<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Harrison Guo</title>
    <description>The latest articles on Forem by Harrison Guo (@harrison_guo_e01b4c8793a0).</description>
    <link>https://forem.com/harrison_guo_e01b4c8793a0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3809272%2Ff7da2c77-d1e2-4b04-8cf4-11c5f274f605.png</url>
      <title>Forem: Harrison Guo</title>
      <link>https://forem.com/harrison_guo_e01b4c8793a0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/harrison_guo_e01b4c8793a0"/>
    <language>en</language>
    <item>
      <title>Testing Real-World Go Backends Isn't What Many People Think</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Sat, 18 Apr 2026 00:18:33 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/testing-real-world-go-backends-isnt-what-many-people-think-12nl</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/testing-real-world-go-backends-isnt-what-many-people-think-12nl</guid>
      <description>&lt;p&gt;I've reviewed enough Go backend test suites to notice a pattern. The services with the most unit tests are often the ones with the most production incidents. Not because unit tests cause incidents — because the teams writing unit tests and calling it a day weren't testing the things that actually broke.&lt;/p&gt;

&lt;p&gt;Production bugs in distributed Go backends don't usually look like "function computed wrong value." They look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The context deadline didn't propagate into the background goroutine, so under load it leaked."&lt;/li&gt;
&lt;li&gt;"Two services agreed on the happy path, but the error-shape contract diverged six months ago, and now one returns &lt;code&gt;status.Code(codes.Unavailable)&lt;/code&gt; where the other expects &lt;code&gt;codes.ResourceExhausted&lt;/code&gt;."&lt;/li&gt;
&lt;li&gt;"The retry logic is race-y. With test-scale traffic it works; at 10x production it double-charges."&lt;/li&gt;
&lt;li&gt;"The database migration works on SQLite (our test DB) but not Postgres 15's stricter planner."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No unit test catches those. A different set of test shapes does.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Stop framing tests as "unit vs integration." That's a level-of-isolation axis, and it's the least interesting one. The axes that matter for production Go: deterministic behavior (controlled clocks, seeded randomness), concurrency correctness (race detector, stress tests), contract fidelity (shared schemas, real downstreams), and environment fidelity (real DBs, real networks). Design your test suite around those; coverage follows.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Wrong Taxonomy
&lt;/h2&gt;

&lt;p&gt;"Unit tests test one function. Integration tests test several. E2E tests test the whole system."&lt;/p&gt;

&lt;p&gt;That framing is a starting point for junior engineers. It stops being useful the moment you're debugging why your Go service silently dropped a message in production. The level of isolation isn't the interesting axis. What is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic vs non-deterministic behavior.&lt;/strong&gt; Do the same inputs produce the same outputs every time?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency correctness.&lt;/strong&gt; When goroutines interleave differently, does the code still behave correctly, and will your tests notice if it stops?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contract fidelity.&lt;/strong&gt; Do your assumptions about downstreams match what they actually do?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment fidelity.&lt;/strong&gt; Does your test environment reproduce the production runtime closely enough to catch real bugs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A test can be "unit" on the isolation axis and still score well on two or three of these. A test can be "integration" and miss all four.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deterministic Behavior: The One Thing Every Test Should Have
&lt;/h2&gt;

&lt;p&gt;If you can't run your test a thousand times and get the same result, you have a flaky test, and flaky tests are worse than no tests — they train the team to ignore failures.&lt;/p&gt;

&lt;p&gt;The three sources of non-determinism in Go test suites, in order of prevalence:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Time
&lt;/h3&gt;

&lt;p&gt;Any test that calls &lt;code&gt;time.Now()&lt;/code&gt;, &lt;code&gt;time.After()&lt;/code&gt;, &lt;code&gt;time.Sleep()&lt;/code&gt;, or depends on wall-clock intervals is a landmine. It works on the developer's laptop and fails in a slow CI runner where GC decided to kick in.&lt;/p&gt;

&lt;p&gt;Fix: inject a clock. A minimal clock interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Clock&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;
    &lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;realClock&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;realClock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;            &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;realClock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;realClock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, &lt;code&gt;realClock&lt;/code&gt;. In tests, a &lt;code&gt;FakeClock&lt;/code&gt; that advances manually. Libraries like &lt;code&gt;github.com/benbjohnson/clock&lt;/code&gt; give you this for free.&lt;/p&gt;

&lt;p&gt;Payoff: a test that verifies "retries happen every 500ms for 3 attempts" becomes deterministic — advance the fake clock 500ms, observe a retry, advance another 500ms, observe again. No sleeping in the test.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Randomness
&lt;/h3&gt;

&lt;p&gt;Anything that shuffles, samples, picks a random ID, or generates random test data needs a seeded random source. The default &lt;code&gt;math/rand&lt;/code&gt; source behind &lt;code&gt;rand.Intn&lt;/code&gt; is process-global shared state; two tests running in parallel draw from the same sequence and perturb each other's results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In tests, pass a known seed. In production, &lt;code&gt;rand.NewSource(time.Now().UnixNano())&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Concurrency ordering
&lt;/h3&gt;

&lt;p&gt;The nasty one. A test that creates goroutines and checks a result has to either (a) synchronize on a deterministic completion signal (a channel, a &lt;code&gt;WaitGroup&lt;/code&gt;) or (b) poll with a timeout — which is back to non-determinism.&lt;/p&gt;

&lt;p&gt;The best habit: design for deterministic completion. If you're testing "five goroutines should all complete and total the result," use &lt;code&gt;sync.WaitGroup.Wait()&lt;/code&gt; or close a channel. Don't sleep. Don't poll.&lt;/p&gt;
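&lt;p&gt;A minimal sketch of that habit (hypothetical worker function, not from any particular codebase): completion is signalled by &lt;code&gt;WaitGroup.Wait&lt;/code&gt;, never by a sleep:&lt;/p&gt;

```go
package main

import "sync"

// sumConcurrently fans work out to one goroutine per input and waits
// for all of them on a WaitGroup. Completion is a deterministic
// signal, so a test asserting on the total never needs to sleep or poll.
func sumConcurrently(inputs []int) int {
	var wg sync.WaitGroup
	results := make(chan int, len(inputs)) // buffered: sends never block

	for _, v := range inputs {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * v // stand-in for real work
		}(v)
	}

	wg.Wait()      // returns exactly when every worker has finished
	close(results) // safe: no sends can happen after Wait returns

	total := 0
	for r := range results {
		total += r
	}
	return total
}
```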

&lt;h2&gt;
  
  
  Concurrency Correctness: The Race Detector Is Not Optional
&lt;/h2&gt;

&lt;p&gt;Go ships with a race detector. Running &lt;code&gt;go test -race&lt;/code&gt; is one flag and it catches an entire category of bugs that will otherwise show up as "works on my machine." In my experience, any production Go service will, on first &lt;code&gt;-race&lt;/code&gt; run, surface at least one real data race that had been silently ignored.&lt;/p&gt;

&lt;p&gt;The race detector adds roughly 2-20x runtime and 5-10x memory overhead, so people skip it on every-save tests. Fine. Run it in CI. Run it on nightly integration tests. Run it on anything touching shared state. Some configurations I've seen work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every PR&lt;/strong&gt;: run unit tests with &lt;code&gt;-race&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nightly&lt;/strong&gt;: run full integration suite with &lt;code&gt;-race&lt;/code&gt; and a longer timeout.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-release&lt;/strong&gt;: run stress tests with &lt;code&gt;-race&lt;/code&gt; against a production-sized dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost of running with &lt;code&gt;-race&lt;/code&gt; is engineering discipline. The payoff is not debugging a data race at 2 AM.&lt;/p&gt;

&lt;p&gt;Beyond the race detector, &lt;strong&gt;stress tests&lt;/strong&gt; are undervalued. A test that runs your concurrent path 1,000 times with different goroutine interleavings catches bugs that a single-iteration test never will.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestConcurrentWorkers_Stress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Short&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"stress test"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"iter%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parallel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="c"&gt;// ... actual test body ...&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;t.Parallel()&lt;/code&gt; + 1,000 iterations + &lt;code&gt;-race&lt;/code&gt; finds race conditions that a single deterministic run happily misses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contract Fidelity: The Bug Class Everyone Misses
&lt;/h2&gt;

&lt;p&gt;Say your service calls a downstream gRPC service for payments. You write a mock that returns a successful response. Your tests pass. The downstream team changes their error code vocabulary. Your service now misinterprets their new error. Production finds out first.&lt;/p&gt;

&lt;p&gt;Contract testing addresses this. Two approaches work in practice:&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared schema, shared types
&lt;/h3&gt;

&lt;p&gt;If the downstream service publishes a protobuf file (they should), your service imports it directly. Your tests use types generated from the real contract. If the downstream changes the proto in a breaking way, your next build fails — loudly, at compile time.&lt;/p&gt;

&lt;p&gt;This is the simplest and often best answer for Go services with gRPC downstreams. The contract is literally the shared protobuf.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consumer-driven contract tests
&lt;/h3&gt;

&lt;p&gt;Each consumer writes tests that capture its expectations of the downstream. Those tests run against the real downstream (or a contract-testing tool like Pact). When the downstream changes, the contract tests catch the break before the written contract and production reality diverge.&lt;/p&gt;

&lt;p&gt;This helps for REST APIs where there's no single source of truth schema. It's more ceremony. For most gRPC Go services, shared protobufs cover it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "mock everything" antipattern
&lt;/h3&gt;

&lt;p&gt;If your test suite consists of mocks that return whatever your test needs, you're not testing integration. You're testing that your code calls your mocks correctly. That's a tautology. Real integration bugs live in the gap between your mock's behavior and the downstream's actual behavior.&lt;/p&gt;

&lt;p&gt;Have at least one test per integration point that hits the real downstream — either in a staging environment or via Testcontainers. Keep the mocks for fast feedback, but don't pretend they're the only tests you need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment Fidelity: Use Real Infra Where It Matters
&lt;/h2&gt;

&lt;p&gt;The sharpest line in my test taxonomy is between "close to production runtime" and "not close."&lt;/p&gt;

&lt;p&gt;Things that matter and are worth running on real infrastructure in tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Databases.&lt;/strong&gt; SQLite is not Postgres is not MySQL. Query planner, isolation levels, and error shapes differ. Test with the DB you ship with.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message brokers.&lt;/strong&gt; Kafka's ordering and offset semantics cannot be faked well. Use a real Kafka (or Redpanda) in tests that exercise ordering or replay.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caches.&lt;/strong&gt; Redis has specific failover and eviction semantics. A fake in-memory map doesn't reproduce them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-sensitive downstream APIs.&lt;/strong&gt; Anything with rate limits or TTLs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Things that rarely matter and are fine with fakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Object storage.&lt;/strong&gt; A local file-system backend usually reproduces S3 well enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics / tracing exporters.&lt;/strong&gt; Tests don't need a real Prometheus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email / SMS.&lt;/strong&gt; A mock recording calls is plenty.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern: &lt;strong&gt;test with real infra for anything where semantic difference is possible&lt;/strong&gt;. Testcontainers (&lt;code&gt;github.com/testcontainers/testcontainers-go&lt;/code&gt;) makes this painless:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;setupPostgres&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RunContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;testcontainers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"postgres:15-alpine"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"testdb"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithUsername&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"testuser"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPassword&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"testpass"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NoError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cleanup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Terminate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"sslmode=disable"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NoError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dsn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slow? Yes — each container takes a few seconds to start. But you can run them once per test package with a &lt;code&gt;TestMain&lt;/code&gt;, and the bugs they catch are the ones most worth catching.&lt;/p&gt;
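&lt;p&gt;The once-per-package pattern can be sketched with &lt;code&gt;sync.Once&lt;/code&gt; (a simplified stand-in: the container start is stubbed with a placeholder DSN, and in a real suite the body would call &lt;code&gt;setupPostgres&lt;/code&gt;-style code from &lt;code&gt;TestMain&lt;/code&gt; or a shared helper):&lt;/p&gt;

```go
package main

import "sync"

var (
	setupOnce  sync.Once
	setupCount int // demo bookkeeping; a real suite wouldn't need it
	sharedDSN  string
)

// dsnForTests hands every test in the package the same DSN, running the
// expensive container start exactly once per process. The stubbed body
// stands in for starting a Testcontainers Postgres and capturing its
// connection string.
func dsnForTests() string {
	setupOnce.Do(func() {
		setupCount++
		// Placeholder DSN; real code would come from the container.
		sharedDSN = "postgres://testuser:testpass@localhost:5432/testdb?sslmode=disable"
	})
	return sharedDSN
}
```

&lt;p&gt;Every test calls &lt;code&gt;dsnForTests()&lt;/code&gt;; only the first call pays the startup cost.&lt;/p&gt;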

&lt;h2&gt;
  
  
  A Real Taxonomy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBGYXN0WyJSdW4gb24gZXZlcnkgc2F2ZSJdCiAgICAgICAgVDFbIkZhc3QgdGVzdHM8YnIvPnB1cmUgZnVuY3Rpb25zIMK3IGFsZ29yaXRobXMiXQogICAgZW5kCgogICAgc3ViZ3JhcGggUFJbIlJ1biBvbiBldmVyeSBQUiJdCiAgICAgICAgVDJbIkNvbmN1cnJlbmN5IHRlc3RzPGJyLz4tcmFjZSDCtyBzdHJlc3MiXQogICAgICAgIFQzWyJEZXRlcm1pbmlzdGljIGludGVncmF0aW9uPGJyLz5mYWtlIGNsb2NrIMK3IGZha2UgZG93bnN0cmVhbSJdCiAgICAgICAgVDRbIlJlYWwtaW5mcmEgaW50ZWdyYXRpb248YnIvPlRlc3Rjb250YWluZXJzIFBvc3RncmVzIC8gUmVkaXMgLyBLYWZrYSJdCiAgICAgICAgVDVbIkNvbnRyYWN0IHRlc3RzPGJyLz5zaGFyZWQgc2NoZW1hcyDCtyBwcm90byB2ZXJzaW9ucyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBOaWdodGx5WyJSdW4gb24gc2NoZWR1bGUiXQogICAgICAgIFQ2WyJTdHJlc3MgdGVzdHM8YnIvPjEwMDAtaXRlciAtcmFjZSJdCiAgICAgICAgVDdbIkVuZC10by1lbmQ8YnIvPnJlYWwgc2VydmljZXMgwrcgc3RhZ2luZyJdCiAgICBlbmQKCiAgICBGYXN0IC0tPiBQUiAtLT4gTmlnaHRseQoKICAgIGNsYXNzRGVmIGZhc3QgZmlsbDojZjBmZmY0LHN0cm9rZTojMmY4NTVhCiAgICBjbGFzc0RlZiBwciBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIG5pZ2h0bHkgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzcyBGYXN0IGZhc3QKICAgIGNsYXNzIFBSIHByCiAgICBjbGFzcyBOaWdodGx5IG5pZ2h0bHk%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBGYXN0WyJSdW4gb24gZXZlcnkgc2F2ZSJdCiAgICAgICAgVDFbIkZhc3QgdGVzdHM8YnIvPnB1cmUgZnVuY3Rpb25zIMK3IGFsZ29yaXRobXMiXQogICAgZW5kCgogICAgc3ViZ3JhcGggUFJbIlJ1biBvbiBldmVyeSBQUiJdCiAgICAgICAgVDJbIkNvbmN1cnJlbmN5IHRlc3RzPGJyLz4tcmFjZSDCtyBzdHJlc3MiXQogICAgICAgIFQzWyJEZXRlcm1pbmlzdGljIGludGVncmF0aW9uPGJyLz5mYWtlIGNsb2NrIMK3IGZha2UgZG93bnN0cmVhbSJdCiAgICAgICAgVDRbIlJlYWwtaW5mcmEgaW50ZWdyYXRpb248YnIvPlRlc3Rjb250YWluZXJzIFBvc3RncmVzIC8gUmVkaXMgLyBLYWZrYSJdCiAgICAgICAgVDVbIkNvbnRyYWN0IHRlc3RzPGJyLz5zaGFyZWQgc2NoZW1hcyDCtyBwcm90byB2ZXJzaW9ucyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBOaWdodGx5WyJSdW4gb24gc2NoZWR1bGUiXQogICAgICAgIFQ2WyJTdHJlc3MgdGVzdHM8YnIvPjEwMDAtaXRlciAtcmFjZSJdCiAgICAgICAgVDdbIkVuZC10by1lbmQ8YnIvPnJlYWwgc2VydmljZXMgwrcgc3RhZ2luZyJdCiAgICBlbmQKCiAgICBGYXN0IC0tPiBQUiAtLT4gTmlnaHRseQoKICAgIGNsYXNzRGVmIGZhc3QgZmlsbDojZjBmZmY0LHN0cm9rZTojMmY4NTVhCiAgICBjbGFzc0RlZiBwciBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIG5pZ2h0bHkgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmCiAgICBjbGFzcyBGYXN0IGZhc3QKICAgIGNsYXNzIFBSIHByCiAgICBjbGFzcyBOaWdodGx5IG5pZ2h0bHk%3D" alt="Flowchart of the test pipeline: fast tests run on every save; concurrency, deterministic integration, real-infra integration, and contract tests run on every PR; stress and end-to-end tests run on a schedule" width="1904" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the taxonomy I actually use when designing a test suite for a Go backend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast tests&lt;/strong&gt; (seconds for the whole file): pure functions, algorithms, small state machines. Run on every save.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency tests&lt;/strong&gt; (seconds to a minute): anything with goroutines. Run with &lt;code&gt;-race&lt;/code&gt;. Run in PR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic integration tests&lt;/strong&gt; (single-digit seconds per test): one module + fakes + fake clock. Fast enough to keep in the main test run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-infra integration tests&lt;/strong&gt; (seconds per test): one module + real DB / Kafka / Redis via Testcontainers. Run in PR, longer timeout.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contract tests&lt;/strong&gt; (milliseconds): verify shared schemas with downstreams. Run on every schema change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stress tests&lt;/strong&gt; (minutes): high-iteration, high-concurrency, with &lt;code&gt;-race&lt;/code&gt;. Run nightly or on schedule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End-to-end tests&lt;/strong&gt; (minutes): real services, real network, against a staging environment. Run pre-release.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you'll notice: "unit" and "integration" don't appear as categories. That's on purpose. The level of isolation is an implementation detail. The purpose of the test is the taxonomy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Small Habits That Pay Off
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;t.Cleanup&lt;/code&gt; over &lt;code&gt;defer&lt;/code&gt;.&lt;/strong&gt; Cleanups run in LIFO order, can be registered from helpers anywhere in the test, and still run after parallel subtests finish, which a &lt;code&gt;defer&lt;/code&gt; in the parent function does not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer table-driven tests.&lt;/strong&gt; Twenty tests as rows in a slice beats twenty nearly-identical test functions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail tests with &lt;code&gt;t.Fatalf&lt;/code&gt;, not &lt;code&gt;t.Errorf&lt;/code&gt;, for setup failures.&lt;/strong&gt; A broken setup should abort; a broken assertion might allow the test to continue collecting more failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Golden files for complex outputs.&lt;/strong&gt; If you're verifying a generated SQL query, a serialized event, or a JSON response, a golden file comparison is more readable than a long string literal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate &lt;code&gt;_test.go&lt;/code&gt; files for slow tests with a build tag.&lt;/strong&gt; &lt;code&gt;//go:build integration&lt;/code&gt; lets you run them explicitly.&lt;/li&gt;
&lt;/ul&gt;
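&lt;p&gt;A golden-file comparison needs only a few lines of helper. This sketch (a hypothetical &lt;code&gt;checkGolden&lt;/code&gt;, including the conventional update mode you'd wire to an &lt;code&gt;-update&lt;/code&gt; flag) shows the shape:&lt;/p&gt;

```go
package main

import (
	"bytes"
	"os"
	"path/filepath"
)

// checkGolden compares got against testdata/<name>.golden. When update
// is true it rewrites the golden file instead (a real suite would wire
// this to a -update flag), so regenerating expectations is one command.
func checkGolden(name string, got []byte, update bool) (bool, error) {
	path := filepath.Join("testdata", name+".golden")
	if update {
		if err := os.MkdirAll("testdata", 0o755); err != nil {
			return false, err
		}
		return true, os.WriteFile(path, got, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, want), nil
}
```

&lt;p&gt;In a test: serialize the output, call the helper, and &lt;code&gt;t.Fatalf&lt;/code&gt; on a mismatch; the diff against the golden file is far easier to read than a failed long string literal.&lt;/p&gt;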

&lt;h2&gt;
  
  
  The Shift That Changed My Testing
&lt;/h2&gt;

&lt;p&gt;Coverage numbers lie. The question is not "what percent of lines are executed by tests" — it's "what percent of the risky behaviors are covered by tests that will actually fail when those behaviors break."&lt;/p&gt;

&lt;p&gt;A codebase with 95% line coverage and zero race tests, zero real-DB tests, and mock-heavy integration tests is brittle. A codebase with 60% line coverage, &lt;code&gt;go test -race&lt;/code&gt; in CI, Testcontainers for the DB, and a stress test for every hot concurrent path is not.&lt;/p&gt;

&lt;p&gt;The single biggest shift I recommend: &lt;strong&gt;stop thinking about tests in terms of isolation level, and start thinking about them in terms of the production failure modes you're actually afraid of&lt;/strong&gt;. Map each failure mode to a test shape. If you don't have a test shape for a failure mode, you don't really have that failure mode covered — you just hope it doesn't happen.&lt;/p&gt;

&lt;p&gt;Production has opinions about what you hope.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-chan-context-structure-not-speed/" rel="noopener noreferrer"&gt;Go's Concurrency Is About Structure, Not Speed&lt;/a&gt; — the concurrency patterns that make production-shape Go possible.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-context-distributed-systems-production/" rel="noopener noreferrer"&gt;Go Context in Distributed Systems: What Actually Works in Production&lt;/a&gt; — the single most common test gap in Go services I review.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/fail-fast-bounded-resilience-distributed-systems/" rel="noopener noreferrer"&gt;Why Your "Fail-Fast" Strategy is Killing Your Distributed System&lt;/a&gt; — a production failure mode that's hard to test unless you design the test for it.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>testing</category>
      <category>backendengineering</category>
    </item>
    <item>
      <title>Scale-Up vs Scale-Out: Why Every Language Wins Somewhere</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Sat, 18 Apr 2026 00:18:32 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/scale-up-vs-scale-out-why-every-language-wins-somewhere-3k6l</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/scale-up-vs-scale-out-why-every-language-wins-somewhere-3k6l</guid>
      <description>&lt;p&gt;I worked with a team that rewrote a critical service from Go to Rust because "performance." Six months later, the service was 30% faster, the team was miserable, and feature velocity had dropped to a crawl. Meanwhile the competitor team, still on Go, had shipped four new features.&lt;/p&gt;

&lt;p&gt;We did the postmortem eventually. The service handled maybe 2,000 requests per second on a 4-core machine. CPU utilization sat around 20%. Rust's extra speed bought us exactly nothing — the bottleneck was downstream database latency. What it cost us was every feature we didn't ship while writing &lt;code&gt;unsafe&lt;/code&gt; blocks, fighting the borrow checker, and nursing the team through the learning curve.&lt;/p&gt;

&lt;p&gt;That incident taught me the question I wish I'd learned earlier: &lt;strong&gt;what are you actually scaling, and does the language buy you the right kind of scale?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Language benchmarks optimize for one axis: per-request performance. Real systems have multiple axes — throughput, latency, concurrency, developer velocity, operational complexity, memory efficiency. Rust, Go, Java, Python aren't competing to be "fastest." They're different answers to different bets about what you're going to scale. Pick by fit, not by leaderboard.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Two Kinds of Scale
&lt;/h2&gt;

&lt;p&gt;At the top level, two strategies dominate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scale-up&lt;/strong&gt;: make one machine do more. Vertical scaling. Faster CPUs, more RAM, specialized hardware, lower per-operation cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale-out&lt;/strong&gt;: add more machines. Horizontal scaling. Cheaper commodity hardware, more concurrency, lots of work running in parallel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just infrastructure decisions. They're reflected in the language and ecosystem you pick. A language optimized for scale-up (Rust, C++) has different priorities than one optimized for scale-out (Go, Elixir) or one optimized for neither but for developer leverage (Python, Ruby).&lt;/p&gt;

&lt;p&gt;The big confusion comes from mixing axes. "Rust is faster than Go" is true on per-op microbenchmarks and irrelevant if your workload is I/O-bound service-to-service traffic. "Python is slow" is true in a compute-bound loop and irrelevant for a 500-QPS API that spends 95% of its time waiting on PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Each Language Actually Wins
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FcXVhZHJhbnRDaGFydAogICAgdGl0bGUgTGFuZ3VhZ2UgZml0IGJ5IHdoYXQgeW91J3JlIHNjYWxpbmcKICAgIHgtYXhpcyBTY2FsZS1vdXQgKG1hbnkgbWFjaGluZXMgLyBjaGVhcCBjb25jdXJyZW5jeSkgLS0-IFNjYWxlLXVwIChvbmUgbWFjaGluZSwgcHVzaGVkIGhhcmQpCiAgICB5LWF4aXMgUHJvdG90eXBlIHZlbG9jaXR5IC0tPiBQcm9kdWN0aW9uIHJpZ29yCiAgICBxdWFkcmFudC0xICJTY2FsZS11cCArIHJpZ29yPGJyLz4oUnVzdCDCtyBDKysgwrcgWmlnKSIKICAgIHF1YWRyYW50LTIgIlNjYWxlLW91dCArIHJpZ29yPGJyLz4oR28gwrcgSmF2YS9Lb3RsaW4pIgogICAgcXVhZHJhbnQtMyAiU2NhbGUtb3V0ICsgdmVsb2NpdHk8YnIvPihQeXRob24gwrcgUnVieSDCtyBOb2RlKSIKICAgIHF1YWRyYW50LTQgIlNjYWxlLXVwICsgdmVsb2NpdHk8YnIvPihuYXJyb3cgbmljaGUpIgogICAgUnVzdDogWzAuODUsIDAuODVdCiAgICAiQysrIjogWzAuOTIsIDAuODhdCiAgICBHbzogWzAuMjUsIDAuNzVdCiAgICAiSmF2YS9Lb3RsaW4iOiBbMC4zMCwgMC44MF0KICAgIFB5dGhvbjogWzAuMjUsIDAuMjVdCiAgICBSdWJ5OiBbMC4yNSwgMC4zMF0KICAgIE5vZGU6IFswLjMwLCAwLjM1XQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FcXVhZHJhbnRDaGFydAogICAgdGl0bGUgTGFuZ3VhZ2UgZml0IGJ5IHdoYXQgeW91J3JlIHNjYWxpbmcKICAgIHgtYXhpcyBTY2FsZS1vdXQgKG1hbnkgbWFjaGluZXMgLyBjaGVhcCBjb25jdXJyZW5jeSkgLS0-IFNjYWxlLXVwIChvbmUgbWFjaGluZSwgcHVzaGVkIGhhcmQpCiAgICB5LWF4aXMgUHJvdG90eXBlIHZlbG9jaXR5IC0tPiBQcm9kdWN0aW9uIHJpZ29yCiAgICBxdWFkcmFudC0xICJTY2FsZS11cCArIHJpZ29yPGJyLz4oUnVzdCDCtyBDKysgwrcgWmlnKSIKICAgIHF1YWRyYW50LTIgIlNjYWxlLW91dCArIHJpZ29yPGJyLz4oR28gwrcgSmF2YS9Lb3RsaW4pIgogICAgcXVhZHJhbnQtMyAiU2NhbGUtb3V0ICsgdmVsb2NpdHk8YnIvPihQeXRob24gwrcgUnVieSDCtyBOb2RlKSIKICAgIHF1YWRyYW50LTQgIlNjYWxlLXVwICsgdmVsb2NpdHk8YnIvPihuYXJyb3cgbmljaGUpIgogICAgUnVzdDogWzAuODUsIDAuODVdCiAgICAiQysrIjogWzAuOTIsIDAuODhdCiAgICBHbzogWzAuMjUsIDAuNzVdCiAgICAiSmF2YS9Lb3RsaW4iOiBbMC4zMCwgMC44MF0KICAgIFB5dGhvbjogWzAuMjUsIDAuMjVdCiAgICBSdWJ5OiBbMC4yNSwgMC4zMF0KICAgIE5vZGU6IFswLjMwLCAwLjM1XQ%3D%3D" alt="title Language fit by what you're scaling" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rough positioning — not a benchmark, a fit map. The language you pick should live near the kind of scaling your system actually demands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rust / C++ / Zig — Scale-up champions
&lt;/h3&gt;

&lt;p&gt;These languages dominate when &lt;strong&gt;per-machine throughput is the bottleneck&lt;/strong&gt; and you can afford the engineering cost. That's a narrower set of problems than Twitter would have you believe, but the problems that exist are real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-frequency trading engines — microseconds matter, GC pauses are unacceptable, every cache line counts.&lt;/li&gt;
&lt;li&gt;Inference engines — llama.cpp, vLLM, mistral.rs. Memory layout, SIMD, custom kernels.&lt;/li&gt;
&lt;li&gt;Databases and storage engines — ScyllaDB, TiKV, FoundationDB internals. State machines that live forever and must not leak.&lt;/li&gt;
&lt;li&gt;Network data planes — Cloudflare's Pingora, proxies at the edge.&lt;/li&gt;
&lt;li&gt;Game engines, audio/video encoding, embedded.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern: &lt;strong&gt;one box, pushed hard, for years&lt;/strong&gt;. Memory safety matters because bugs compound over time. Performance matters because throughput per core is the product.&lt;/p&gt;

&lt;p&gt;The cost: every commit is slower. Refactoring is expensive. Onboarding is measured in months, not weeks. The compile times are what they are. You pay this cost every day the service exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Go — Scale-out champion
&lt;/h3&gt;

&lt;p&gt;Go hits a specific sweet spot: &lt;strong&gt;cheap concurrency, predictable performance, fast-to-ship code, and easy to hire for&lt;/strong&gt;. It's a scale-out language.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thousands of goroutines per core, 2KB stacks, user-space context switching. The "cost of one more waiter" is nearly zero.&lt;/li&gt;
&lt;li&gt;Standard library is enough for 80% of backend work — HTTP server, JSON, SQL, crypto.&lt;/li&gt;
&lt;li&gt;Compilation is fast enough to stay in flow. Iteration loop feels similar to a dynamic language.&lt;/li&gt;
&lt;li&gt;Minimalism is aggressive. One person can read the whole language in a weekend. New hires are productive in days.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it loses: per-op performance. Go's GC is fine but not invisible. Zero-copy generic code is harder to write than in Rust. The type system doesn't prevent the entire class of bugs Rust's does.&lt;/p&gt;

&lt;p&gt;Go's bet: the problem you're most likely to have is "I need to handle 10x the concurrent work with 2x the code." Not "I need this loop to be 5% faster." For most backend services, that bet is right.&lt;/p&gt;

&lt;h3&gt;
  
  
  Java / Kotlin — Mature scale-out with runtime depth
&lt;/h3&gt;

&lt;p&gt;The JVM is what you want when the workload is scale-out but you need runtime flexibility Go doesn't give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A mature JIT that optimizes hot paths beyond what AOT can.&lt;/li&gt;
&lt;li&gt;Rich profiling and monitoring (Java Flight Recorder, async-profiler) that make post-deploy tuning feasible.&lt;/li&gt;
&lt;li&gt;An ecosystem that, after 25 years, has a mature library for basically anything.&lt;/li&gt;
&lt;li&gt;Kotlin on top gives you modern syntax and coroutines without leaving the ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it loses: startup time, memory overhead, operational complexity (GC tuning is a real job), the occasional "it works on my JDK 11 but the prod JDK 17 changed something." Also: hiring is harder than Go now, at least in my corner of the industry.&lt;/p&gt;

&lt;p&gt;Java's bet: "you'll still be running this service in ten years, and you want to be able to tune its runtime when that day comes." For large enterprises with deep infrastructure, that bet pays off. For a startup shipping its first three services, the overhead is not worth it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python / Ruby — Developer-velocity champions
&lt;/h3&gt;

&lt;p&gt;The forgotten-but-dominant answer: languages that optimize neither scale-up nor scale-out, but &lt;strong&gt;scale-the-team&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast to write, fast to read, fast to debug.&lt;/li&gt;
&lt;li&gt;Massive libraries for data, ML, scripting, DSLs.&lt;/li&gt;
&lt;li&gt;Easy to onboard anyone — CS students, data scientists, analysts.&lt;/li&gt;
&lt;li&gt;Prototype-to-production path is shorter than anywhere else.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where they lose: per-core throughput, concurrency (the GIL is real), memory. Python and Ruby are not your language for a 100K QPS service.&lt;/p&gt;

&lt;p&gt;But a lot of real companies don't need a 100K QPS service. They need to get a thing working, put it in front of users, and iterate. If your current problem is "we need to ship the next feature this week," Python might be the right answer even if a Rust version would technically run faster.&lt;/p&gt;

&lt;p&gt;Python's bet: throughput isn't the constraint yet. Time-to-shipped-feature is. For most companies most of the time, that's correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Axes Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Beyond scale-up/scale-out, a few axes decide more projects than raw performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer-velocity per week
&lt;/h3&gt;

&lt;p&gt;"I can ship a feature and have it in production by Friday" beats "this service is 2x faster" most of the time. Measure it. If your current stack requires a two-day ceremony to deploy a one-line change, throughput is not your problem. Velocity is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operational complexity
&lt;/h3&gt;

&lt;p&gt;Scale-up is operationally cheaper than scale-out. One machine, one process, one log. Scale-out gives you better redundancy but also distributed-systems problems — consistency, ordering, partial failure, chaos engineering. If your team is three people, the operational complexity of a 20-node scale-out cluster may eat more time than the language choice saves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory efficiency per dollar
&lt;/h3&gt;

&lt;p&gt;At cloud scale, memory is expensive. A Rust service that fits in 2GB where a Java service needs 8GB is a 4x savings on every instance. Multiply by thousands of instances and "per-op performance" stops being the interesting number — per-GB cost starts to matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hiring pool
&lt;/h3&gt;

&lt;p&gt;The language with the deepest talent pool in your market is usually the right answer for a new system, all else equal. A marginal technical improvement isn't worth a six-month hiring pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Learning curve shape
&lt;/h3&gt;

&lt;p&gt;Some languages have shallow onboarding (Go, Python) and a long tail of depth. Others have steep onboarding (Rust, Haskell) and you're productive only after the ramp. For a senior team on a long-lived system, steep is fine. For a fast-moving team, steep is expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern I See Repeated
&lt;/h2&gt;

&lt;p&gt;A company starts small, picks Python or Ruby, builds the thing, ships to production. Ten employees. One codebase. Life is fast.&lt;/p&gt;

&lt;p&gt;They grow to fifty engineers. The monolith cracks. Some services get rewritten in Go for concurrency and operational simplicity. A few performance-critical ones get written in Rust. Data infra sits on the JVM (Kafka, Spark, Flink). A few internal tools stay in Python because the team knows it and it works.&lt;/p&gt;

&lt;p&gt;Five years in, the stack is polyglot. Nobody regrets it. What they regret is the six months they spent trying to make a single-language stack work past its comfort zone — the Python team pushing for "just async more things," or the Rust team fighting the borrow checker on code that could have been Go, or the Java team explaining to a new hire why the stack trace is 400 lines long.&lt;/p&gt;

&lt;p&gt;The pattern: &lt;strong&gt;pick the language that fits the service, not the service that fits the language&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Ask the Question Now
&lt;/h2&gt;

&lt;p&gt;When someone proposes "let's build this new thing in X," I ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What's the expected traffic profile, and what's the per-request work shape?&lt;/li&gt;
&lt;li&gt;Is this scale-up limited (per-machine throughput) or scale-out limited (concurrent work)?&lt;/li&gt;
&lt;li&gt;Who's going to write this, and how fast do we need them productive?&lt;/li&gt;
&lt;li&gt;Who's going to operate this, and what's their tooling comfort?&lt;/li&gt;
&lt;li&gt;Does this interact with an existing ecosystem (JVM data platform, Rust security infra)?&lt;/li&gt;
&lt;li&gt;How long does it have to live?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The answer to those six questions usually lands me on one of three languages for 80% of systems I see: Go, Rust, or (for data-adjacent work) Kotlin on the JVM. Python still shows up for tools and glue. Everything else is contextual.&lt;/p&gt;

&lt;p&gt;The benchmarks don't help. Per-op microbenchmarks answer questions nobody is actually asking. The right question is which axes matter for &lt;em&gt;this&lt;/em&gt; system, and which language's bet lines up with those axes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Argument I've Stopped Having
&lt;/h2&gt;

&lt;p&gt;I still see engineers argue about whether Rust or Go is "better." Both are good languages. Both are bad choices for problems they weren't designed for. The meaningful question is which kind of scale you're paying for — and the honest answer is almost always a mix, evolving over time.&lt;/p&gt;

&lt;p&gt;The Rust rewrite I opened with wasn't a bad decision because Rust is a bad language. It was a bad decision because we weren't scale-up limited. We were downstream-database limited. No language could help with that.&lt;/p&gt;

&lt;p&gt;Know which scale you're buying, and buy it on purpose.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections: User-Space Context Switching, Explained&lt;/a&gt; — the design decision behind Go's scale-out bet.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-chan-context-structure-not-speed/" rel="noopener noreferrer"&gt;Go's Concurrency Is About Structure, Not Speed&lt;/a&gt; — what you actually get with Go, and what you don't.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/nats-kafka-mqtt-same-category-different-jobs/" rel="noopener noreferrer"&gt;NATS vs Kafka vs MQTT: Same Category, Very Different Jobs&lt;/a&gt; — applying the same fit-vs-benchmark thinking to messaging.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programminglanguages</category>
      <category>systemdesign</category>
      <category>scale</category>
      <category>rust</category>
    </item>
    <item>
      <title>From Locks to Actors: The Four Pillars of Modern Concurrency</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Fri, 17 Apr 2026 05:50:27 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/from-locks-to-actors-the-four-pillars-of-modern-concurrency-3o50</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/from-locks-to-actors-the-four-pillars-of-modern-concurrency-3o50</guid>
      <description>&lt;p&gt;Most working engineers have spent ninety percent of their concurrent-programming life in one model: shared memory protected by locks. Threads that all see the same variables. Mutexes around the critical sections. Hope and care. It's the model every OS textbook teaches, every mainstream language supports, and every senior engineer has a horror story about.&lt;/p&gt;

&lt;p&gt;It's also not the only option. Or even the best one, for many of the problems it gets used for. Three other models — CSP, actors, and software transactional memory — have been around for decades, are mature enough for production, and each solves a class of problems that lock-based designs handle poorly.&lt;/p&gt;

&lt;p&gt;This is a map of all four, from a working backend engineer who uses each of them for different jobs, and a take on when each is the right answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBQMVsiMSDCtyBTaGFyZWQgTWVtb3J5ICsgTG9ja3MiXQogICAgICAgIE0xWyJUaHJlYWRzIHNoYXJlIGFkZHJlc3Mgc3BhY2UiXQogICAgICAgIE0yWyJNdXRleCDCtyBhdG9taWNzIMK3IGNvbmQgdmFyIl0KICAgICAgICBNM1siRGVhZGxvY2tzIMK3IHJhY2VzIMK3IGludmlzaWJsZSBidWdzIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFAyWyIyIMK3IENTUCDigJQgQ29tbXVuaWNhdGluZyBTZXF1ZW50aWFsIFByb2Nlc3NlcyJdCiAgICAgICAgQzFbIkdvcm91dGluZXMgKyBjaGFubmVscyJdCiAgICAgICAgQzJbIk93bmVyc2hpcCBtb3ZlcyB3aXRoIG1lc3NhZ2UiXQogICAgICAgIEMzWyJCYWNrcHJlc3N1cmUgYnVpbHQtaW4iXQogICAgZW5kCgogICAgc3ViZ3JhcGggUDNbIjMgwrcgQWN0b3JzIl0KICAgICAgICBBMVsiTmFtZWQgZW50aXR5ICsgbWFpbGJveCJdCiAgICAgICAgQTJbIlByaXZhdGUgc3RhdGUgwrcgbm8gc2hhcmluZyJdCiAgICAgICAgQTNbIlN1cGVydmlzaW9uIMK3IGxldCBpdCBjcmFzaCJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBQNFsiNCDCtyBTb2Z0d2FyZSBUcmFuc2FjdGlvbmFsIE1lbW9yeSJdCiAgICAgICAgUzFbIk9wdGltaXN0aWMgdHJhbnNhY3Rpb25zIl0KICAgICAgICBTMlsiQ29tcG9zYWJsZSDCtyByZXRyeSBvbiBjb25mbGljdCJdCiAgICAgICAgUzNbIk5vIGxvY2tzLCBubyBkZWFkbG9ja3MiXQogICAgZW5kCgogICAgY2xhc3NEZWYgcGlsbGFyIGZpbGw6I2U4ZjRmOCxzdHJva2U6IzJjNTI4MixzdHJva2Utd2lkdGg6MnB4CiAgICBjbGFzcyBQMSxQMixQMyxQNCBwaWxsYXI%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBQMVsiMSDCtyBTaGFyZWQgTWVtb3J5ICsgTG9ja3MiXQogICAgICAgIE0xWyJUaHJlYWRzIHNoYXJlIGFkZHJlc3Mgc3BhY2UiXQogICAgICAgIE0yWyJNdXRleCDCtyBhdG9taWNzIMK3IGNvbmQgdmFyIl0KICAgICAgICBNM1siRGVhZGxvY2tzIMK3IHJhY2VzIMK3IGludmlzaWJsZSBidWdzIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFAyWyIyIMK3IENTUCDigJQgQ29tbXVuaWNhdGluZyBTZXF1ZW50aWFsIFByb2Nlc3NlcyJdCiAgICAgICAgQzFbIkdvcm91dGluZXMgKyBjaGFubmVscyJdCiAgICAgICAgQzJbIk93bmVyc2hpcCBtb3ZlcyB3aXRoIG1lc3NhZ2UiXQogICAgICAgIEMzWyJCYWNrcHJlc3N1cmUgYnVpbHQtaW4iXQogICAgZW5kCgogICAgc3ViZ3JhcGggUDNbIjMgwrcgQWN0b3JzIl0KICAgICAgICBBMVsiTmFtZWQgZW50aXR5ICsgbWFpbGJveCJdCiAgICAgICAgQTJbIlByaXZhdGUgc3RhdGUgwrcgbm8gc2hhcmluZyJdCiAgICAgICAgQTNbIlN1cGVydmlzaW9uIMK3IGxldCBpdCBjcmFzaCJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBQNFsiNCDCtyBTb2Z0d2FyZSBUcmFuc2FjdGlvbmFsIE1lbW9yeSJdCiAgICAgICAgUzFbIk9wdGltaXN0aWMgdHJhbnNhY3Rpb25zIl0KICAgICAgICBTMlsiQ29tcG9zYWJsZSDCtyByZXRyeSBvbiBjb25mbGljdCJdCiAgICAgICAgUzNbIk5vIGxvY2tzLCBubyBkZWFkbG9ja3MiXQogICAgZW5kCgogICAgY2xhc3NEZWYgcGlsbGFyIGZpbGw6I2U4ZjRmOCxzdHJva2U6IzJjNTI4MixzdHJva2Utd2lkdGg6MnB4CiAgICBjbGFzcyBQMSxQMixQMyxQNCBwaWxsYXI%3D" alt="The four pillars of concurrency: shared memory with locks, CSP, actors, and software transactional memory" width="953" height="754"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — Concurrency has four viable pillars: shared memory + locks (threads, mutexes), CSP (channels, Go), actors (mailboxes, Erlang), and STM (transactional memory, Clojure). None is universally better. Each solves a different problem and has a different failure mode. Senior designs often mix three of them in one system. Mutex-for-everything works until it doesn't — usually at exactly the scale you promised you'd never reach.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pillar 1: Shared Memory + Locks
&lt;/h2&gt;

&lt;p&gt;The default. Threads, mutexes, atomics, condition variables. Every mainstream language has them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;: multiple threads of execution share the same address space. They read and write the same data. Mutexes make sure only one thread touches a critical section at a time. Atomics do the same for single-word operations without a full lock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple shared counters and caches.&lt;/strong&gt; &lt;code&gt;atomic.AddInt64&lt;/code&gt;, &lt;code&gt;sync.Map&lt;/code&gt;, LRU caches. The right tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tight single-process coordination&lt;/strong&gt; where the code is small enough for one person to hold in their head.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance-critical paths&lt;/strong&gt; where the overhead of channel sends or actor dispatches is too much.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deadlocks.&lt;/strong&gt; Two threads acquire locks in opposite order. Happens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority inversion.&lt;/strong&gt; Low-priority thread holds the lock, high-priority thread waits, work piles up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock ordering bugs at scale.&lt;/strong&gt; When N components each take M locks, the reasoning gets exponential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-model weirdness.&lt;/strong&gt; What one thread writes, another may not immediately see. You start caring about happens-before, acquire/release semantics, and why &lt;code&gt;volatile&lt;/code&gt; in Java is not what you thought.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invisible races.&lt;/strong&gt; The worst kind. Tests pass; production fails weirdly twice a month.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use mutexes for small, localized shared state. Once the shared state has three collaborators or more, or a nontrivial invariant across fields, reach for one of the other models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 2: CSP (Communicating Sequential Processes)
&lt;/h2&gt;

&lt;p&gt;Tony Hoare's 1978 paper, popularized by Occam and now Go. The model Rob Pike and the rest of the Go team picked for Go's concurrency, by way of Newsqueak and Limbo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;: processes don't share memory; they send messages on named &lt;strong&gt;channels&lt;/strong&gt;. Senders and receivers rendezvous on the channel. Ownership of data moves with the message. "Do not communicate by sharing memory; share memory by communicating."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pipelines.&lt;/strong&gt; Data flows through stages, each a goroutine, connected by channels. Clean to read.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fan-out / fan-in.&lt;/strong&gt; One producer, many workers, one aggregator. The channel topology &lt;em&gt;is&lt;/em&gt; the architecture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backpressure.&lt;/strong&gt; A bounded channel blocks the producer when full. No extra flow control needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cancellation coordination.&lt;/strong&gt; &lt;code&gt;select&lt;/code&gt; with &lt;code&gt;&amp;lt;-ctx.Done()&lt;/code&gt; is a clean primitive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle control.&lt;/strong&gt; Closing a channel is a broadcast to every listener.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deadlocks remain possible.&lt;/strong&gt; Two goroutines each waiting on the other's channel. Cycles in the channel graph are lethal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory leaks via unclosed channels.&lt;/strong&gt; A goroutine blocked on a send that will never be received lives forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Awkward request/reply.&lt;/strong&gt; You end up passing a reply channel with each request, which works but feels verbose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order isn't free.&lt;/strong&gt; Channel ordering is only per-channel. If you fan out and fan in, the aggregation is unordered unless you sort.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use CSP for coordination-heavy designs. When the structure of "who's alive, who sends to whom, when do things stop" is the architecture, channels make that visible in the code.&lt;/p&gt;

&lt;p&gt;Go is the obvious exemplar, but CSP-style is also available in Rust (&lt;code&gt;crossbeam-channel&lt;/code&gt;, &lt;code&gt;tokio::sync::mpsc&lt;/code&gt;), Kotlin (coroutines with channels), Python (&lt;code&gt;asyncio.Queue&lt;/code&gt;), and C# (&lt;code&gt;System.Threading.Channels&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 3: Actors
&lt;/h2&gt;

&lt;p&gt;Carl Hewitt's 1973 paper. Made practical by Erlang (1986) and later Akka (Scala/Java). The model behind WhatsApp, decades of telecom switching, and most fault-tolerant messaging infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;: an &lt;strong&gt;actor&lt;/strong&gt; is a named entity with private state and a mailbox. Other actors send messages to its address. Messages are processed one at a time from the mailbox. No shared memory. Parent actors supervise children; when a child crashes, the parent decides to restart, escalate, or ignore. Crashes are normal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fault isolation at scale.&lt;/strong&gt; One actor crashing is expected; it doesn't take down the system. Supervision hierarchies make "let it crash" a sensible engineering strategy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateful services.&lt;/strong&gt; Each actor holds its own state. Conceptually clean: no shared global state, no locks around it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location transparency.&lt;/strong&gt; An actor can live in the same process, another process, or another machine. The sender doesn't know. This is where actors shine in distributed systems — the model scales across the network boundary natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Massive concurrency with stateful semantics.&lt;/strong&gt; Erlang routinely runs millions of actors per node. Each is cheap.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mailbox unboundedness.&lt;/strong&gt; If a producer sends faster than the actor can process, the mailbox grows without bound. Bounded mailboxes exist; use them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message-ordering assumptions break across the network.&lt;/strong&gt; Within one node, delivery order is preserved between a given sender and receiver. Across nodes, all bets are off without explicit sequencing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing is harder.&lt;/strong&gt; Actors make their own state opaque; you test behavior through message exchange. Good frameworks help, but the habits needed are different from testing normal code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conceptual mismatch in CRUD-style backends.&lt;/strong&gt; If your business logic is "select some rows, transform them, insert result," actors are overkill. They shine on long-lived stateful entities (a game character, a connected device, a user session), not on stateless request handlers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Erlang and Elixir are the canonical runtimes. Akka brings actors to the JVM. Pony is a rare actor-first typed language. In Go, you can simulate actors with a goroutine + channel-as-mailbox pattern, but you lose Erlang's supervision and "let it crash" semantics unless you build them yourself.&lt;/p&gt;

&lt;p&gt;Use actors when you have &lt;strong&gt;long-lived stateful entities with fault requirements&lt;/strong&gt;. Telecom, messaging, multiplayer game servers, IoT device shadows, any system where "this particular entity has its own state machine, and we really care when it crashes" is the shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 4: Software Transactional Memory (STM)
&lt;/h2&gt;

&lt;p&gt;Imagine database transactions, but for in-memory data. That's STM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;: critical sections are wrapped in transactions. The runtime tracks reads and writes optimistically. On commit, if any data touched was modified by another transaction, the current one rolls back and retries. No explicit locks. And it composes: two transactions can be combined into a larger one without redesigning any locking order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it shines&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Composable concurrent code.&lt;/strong&gt; Combining operations that were individually correct usually stays correct under STM. Lock-based code famously does not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-mostly workloads.&lt;/strong&gt; STM with multi-version concurrency control scales reads without blocking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoiding the lock-ordering bug class.&lt;/strong&gt; No locks, no deadlocks. The failure mode is retry storms, which are easier to reason about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;I/O inside transactions is awful.&lt;/strong&gt; Transactions may retry. If you did I/O, you may have done it multiple times. Either separate I/O from transactional state, or the runtime has to forbid I/O inside transactions (Haskell's STM monad does this at the type level).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry storms under contention.&lt;/strong&gt; Heavy write contention on the same data means constant retries. In the worst case, throughput can be worse than locks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited language support.&lt;/strong&gt; Clojure (built-in), Haskell (&lt;code&gt;STM&lt;/code&gt;), Scala (&lt;code&gt;scala-stm&lt;/code&gt;), Rust (experimental &lt;code&gt;stm&lt;/code&gt; crates). Not a mainstream feature of Go/Java/C#.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clojure is the canonical "STM as a first-class citizen" language — its refs and transactions are idiomatic. Haskell's &lt;code&gt;STM&lt;/code&gt; monad is arguably the cleanest realization. In other ecosystems, STM exists as libraries but hasn't displaced mutexes.&lt;/p&gt;

&lt;p&gt;Use STM when the concurrent state is small-to-medium, the access pattern is read-heavy with occasional writes, and you want the composability. For the rare problems that fit, STM is strictly simpler to reason about than locks. For problems that don't fit (I/O-heavy, write-contention-heavy), STM is worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Real Systems Mix Them
&lt;/h2&gt;

&lt;p&gt;The surprise for engineers who've only used one model: &lt;strong&gt;mature systems mix three of them in one codebase&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A typical backend service I'd build today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mutexes / atomics&lt;/strong&gt; for the inner loops — counters, caches, rate-limiter state, anything performance-critical with one clear owner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channels (CSP)&lt;/strong&gt; for coordination — worker pools, pipelines, cancellation, shutdown signaling, bounded queues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actors (in a sense)&lt;/strong&gt; for long-lived stateful entities — each connected client session, each in-flight request, each background job. In Go I'd model this as "one goroutine per entity, communicating via channels," which isn't formal actors but inherits the useful semantics: isolated state, message-passing, crash-isolation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And I wouldn't use STM in that stack. Not because it's bad, but because the language runtime doesn't make it first-class. If I were writing Clojure, STM would be a natural fit for the in-memory state machines that would otherwise be locked maps.&lt;/p&gt;

&lt;p&gt;The old "pick one concurrency model" debate was always a false choice. The real decision is per-problem: what shape is the concurrent work, what's the state-sharing pattern, and what failure semantics do I want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Guide
&lt;/h2&gt;

&lt;p&gt;Quick map:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;I have a counter that multiple goroutines read and update.&lt;/strong&gt; → atomic or mutex.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have a pipeline of work that flows through stages.&lt;/strong&gt; → channels (CSP).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have a fleet of long-lived sessions, each with its own state and lifetime.&lt;/strong&gt; → actor pattern (goroutine + mailbox channel, or real actor framework).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have a fleet of connected devices each with a state machine that must survive crashes.&lt;/strong&gt; → actor framework with supervision (Erlang, Akka, or Go with explicit crash/restart logic).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have complex shared state with nontrivial invariants across fields, and updates are occasional but important to compose.&lt;/strong&gt; → STM if your language supports it; otherwise, lots of careful mutex discipline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have a request/response flow with fan-out to downstreams.&lt;/strong&gt; → CSP with &lt;code&gt;errgroup.WithContext&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have no idea what I have.&lt;/strong&gt; → Start with mutexes, switch when it hurts. Don't over-engineer the first version.&lt;/li&gt;
&lt;/ul&gt;
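&lt;p&gt;To make the first bullet concrete — a shared counter needs nothing heavier than &lt;code&gt;sync/atomic&lt;/code&gt;. A minimal sketch (&lt;code&gt;countHits&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countHits bumps a shared counter from n goroutines. An atomic is the
// whole solution here: no channel ceremony, no mutex.
func countHits(n int) int64 {
	var hits atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			hits.Add(1)
		}()
	}
	wg.Wait()
	return hits.Load()
}

func main() {
	fmt.Println(countHits(100)) // 100
}
```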

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;Most people who get bitten by concurrency bugs got bitten because they used the wrong model, not because they used it wrong. A mutex-heavy design for a workload that's really a pipeline is fragile. A channels-for-everything design when there's a shared counter underneath ends up with awkward rendezvous. An actors-everywhere design when the business is CRUD requests reads like over-engineering.&lt;/p&gt;

&lt;p&gt;The four pillars aren't competing theories of concurrency. They're four tools, each good at specific jobs. Senior engineers know all four and reach for the right one. Junior engineers reach for the only one they know and force-fit it.&lt;/p&gt;

&lt;p&gt;If your career so far has been mostly mutexes, spend a weekend reading the other three. Write a toy pipeline in Go channels. Read Erlang's supervision documentation. Play with Clojure refs. The investment pays back every time you sit in a design review and someone proposes locking their way out of a structural problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-chan-context-structure-not-speed/" rel="noopener noreferrer"&gt;Go's Concurrency Is About Structure, Not Speed&lt;/a&gt; — CSP applied concretely in Go.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections&lt;/a&gt; — the runtime characteristics that make CSP cheap in Go.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/scale-up-scale-out-every-language-wins-somewhere/" rel="noopener noreferrer"&gt;Scale-Up vs Scale-Out: Why Every Language Wins Somewhere&lt;/a&gt; — the language-level view of the same question.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>concurrency</category>
      <category>systemdesign</category>
      <category>go</category>
      <category>erlang</category>
    </item>
    <item>
      <title>RPC vs NATS: It's Not About Sync vs Async — It's About Who Owns Completion</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Fri, 17 Apr 2026 05:50:26 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/rpc-vs-nats-its-not-about-sync-vs-async-its-about-who-owns-completion-1fi5</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/rpc-vs-nats-its-not-about-sync-vs-async-its-about-who-owns-completion-1fi5</guid>
      <description>&lt;p&gt;A team I worked with once migrated an order-placement path from gRPC to NATS because "it's decoupled and faster." The old flow was simple: the web service called &lt;code&gt;PlaceOrder&lt;/code&gt; via gRPC, got back an order ID, rendered success to the user. The new flow: web service publishes &lt;code&gt;order.place&lt;/code&gt; to NATS, an order-service consumes it and processes asynchronously.&lt;/p&gt;

&lt;p&gt;Within three weeks they had three kinds of incidents on rotation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate orders&lt;/strong&gt; — retry on the publisher side meant the same order was placed twice when the first publish actually succeeded but the ack was slow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost orders&lt;/strong&gt; — consumer crashed mid-process; no ack meant NATS redelivered, but the consumer had already partially committed state, so redelivery was rejected by a dedup check. The order just... disappeared from the user's perspective.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dark-failure support tickets&lt;/strong&gt; — users reported "I clicked buy and nothing happened." From the publisher side, everything looked fine. From the consumer side, processing time had drifted from 50 ms to 45 seconds because a downstream DB had a slow query, and the web team had no telemetry on the consumer side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The retro landed on a single sentence: &lt;em&gt;we thought we were changing the transport; we actually changed who owned the completion of the work&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — RPC and pub/sub messaging look like two points on a sync-vs-async spectrum. They aren't. They're two fundamentally different &lt;strong&gt;ownership contracts&lt;/strong&gt;. In RPC, the caller owns knowing the work finished. In messaging, the receiver owns it. Swapping one for the other without inverting retry, idempotency, ack, and observability is how you turn a clean migration into a three-month incident parade.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Sync-vs-Async Trap
&lt;/h2&gt;

&lt;p&gt;The most common framing I see is this: RPC is synchronous, messaging is asynchronous, pick based on whether you need the answer immediately. That framing is almost useless in practice. It conflates two separate axes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Axis 1: Does the caller wait?&lt;/strong&gt; Sync vs async. This is a latency question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Axis 2: Who is responsible for knowing the work completed?&lt;/strong&gt; Caller or receiver. This is a contract question.&lt;/p&gt;

&lt;p&gt;You can have synchronous messaging (request-reply over NATS with a reply subject — caller waits, but transport is pub/sub). You can have asynchronous RPC (fire-and-forget gRPC — &lt;code&gt;stream.Send&lt;/code&gt; with no ack). What matters isn't how long the caller waits. It's who's on the hook if the work doesn't happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Clean Ownership Models
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBSUENbIlJQQyDigJQgY2FsbGVyIG93bnMgY29tcGxldGlvbiJdCiAgICAgICAgQzFbQ2xpZW50XSAtLT58IjEuIGNhbGwgwrcgd2FpdCJ8IFMxW1NlcnZlcl0KICAgICAgICBTMSAtLT58IjIuIGRpZCB0aGUgdGhpbmcgwrcgcmV0dXJuIHJlc3VsdDxici8-KG9yIHRpbWVvdXQg4oaSIGNsaWVudCBkZWNpZGVzKSJ8IEMxCiAgICBlbmQKCiAgICBzdWJncmFwaCBNc2dbIk1lc3NhZ2luZyDigJQgcmVjZWl2ZXIgb3ducyBjb21wbGV0aW9uIl0KICAgICAgICBQMVtQdWJsaXNoZXJdIC0tPnwiMS4gc2VuZCDCtyBmaXJlIGFuZCBmb3JnZXQifCBCMVtbTWVzc2FnZSBidXNdXQogICAgICAgIEIxIC0tPnwiMi4gZXZlbnR1YWxseSJ8IFIxW0NvbnN1bWVyXQogICAgICAgIFIxIC0tPnwiMy4gYWNrIChvciBOQUNLIMK3IHJlZGVsaXZlcikifCBCMQogICAgZW5kCgogICAgY2xhc3NEZWYgcnBjIGZpbGw6I2U4ZjRmOCxzdHJva2U6IzJjNTI4MgogICAgY2xhc3NEZWYgbXNnIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgUlBDIHJwYwogICAgY2xhc3MgTXNnIG1zZw%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBSUENbIlJQQyDigJQgY2FsbGVyIG93bnMgY29tcGxldGlvbiJdCiAgICAgICAgQzFbQ2xpZW50XSAtLT58IjEuIGNhbGwgwrcgd2FpdCJ8IFMxW1NlcnZlcl0KICAgICAgICBTMSAtLT58IjIuIGRpZCB0aGUgdGhpbmcgwrcgcmV0dXJuIHJlc3VsdDxici8-KG9yIHRpbWVvdXQg4oaSIGNsaWVudCBkZWNpZGVzKSJ8IEMxCiAgICBlbmQKCiAgICBzdWJncmFwaCBNc2dbIk1lc3NhZ2luZyDigJQgcmVjZWl2ZXIgb3ducyBjb21wbGV0aW9uIl0KICAgICAgICBQMVtQdWJsaXNoZXJdIC0tPnwiMS4gc2VuZCDCtyBmaXJlIGFuZCBmb3JnZXQifCBCMVtbTWVzc2FnZSBidXNdXQogICAgICAgIEIxIC0tPnwiMi4gZXZlbnR1YWxseSJ8IFIxW0NvbnN1bWVyXQogICAgICAgIFIxIC0tPnwiMy4gYWNrIChvciBOQUNLIMK3IHJlZGVsaXZlcikifCBCMQogICAgZW5kCgogICAgY2xhc3NEZWYgcnBjIGZpbGw6I2U4ZjRmOCxzdHJva2U6IzJjNTI4MgogICAgY2xhc3NEZWYgbXNnIGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgUlBDIHJwYwogICAgY2xhc3MgTXNnIG1zZw%3D%3D" alt="C1[Client] --&amp;gt;|" width="380" height="840"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two models. Opposite error semantics. Opposite retry semantics. Opposite observability alignment. Swapping one for the other changes every downstream engineering assumption.&lt;/p&gt;

&lt;h3&gt;
  
  
  RPC: caller owns completion
&lt;/h3&gt;

&lt;p&gt;When a client makes an RPC, it holds the connection open until a response comes back. That response is a statement by the server: &lt;em&gt;I did the thing, here's the result&lt;/em&gt;. If the call times out, the client assumes failure (possibly partial) and has to decide what to do about it.&lt;/p&gt;

&lt;p&gt;What this means operationally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retry is a caller decision.&lt;/strong&gt; The caller knows whether the work is idempotent, how important it is, and how much budget is left.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors propagate naturally.&lt;/strong&gt; A gRPC status code goes right back up the call chain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability aligns.&lt;/strong&gt; The caller's span includes the work's duration. If it's slow, the caller sees it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backpressure is immediate.&lt;/strong&gt; Callers block on slow servers, limiting their own rate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why RPC feels "simple" — the ownership contract is tight. The downside: the caller's fate is coupled to the callee's fate. A slow server propagates slowness back to every caller.&lt;/p&gt;

&lt;h3&gt;
  
  
  Messaging: receiver owns completion
&lt;/h3&gt;

&lt;p&gt;When a publisher sends a message, the bus accepts it. The publisher's job is done. Whether the work happens — when, in what order, how many times, whether at all — is now somebody else's problem. Usually the consumer's.&lt;/p&gt;

&lt;p&gt;What this means operationally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retry is a consumer decision.&lt;/strong&gt; The bus may redeliver on no-ack; the consumer has to decide how to handle that (idempotency key, dedup table, upsert).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors are silent on the publisher side.&lt;/strong&gt; A failed consumer doesn't tell the publisher. A dead-letter queue or out-of-band alerting has to be built.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability splits in two.&lt;/strong&gt; Publisher metrics say "I sent it." Consumer metrics say "I processed it." The gap between those — lag — is its own story.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backpressure is decoupled.&lt;/strong&gt; Publishers can happily overwhelm consumers, which means you need consumer-side rate limits or bounded queues.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why pub/sub feels "flexible" — producers and consumers are independent. The downside: &lt;em&gt;nothing is automatic&lt;/em&gt;. Every property that RPC gave you for free (retry policy, error propagation, aligned observability, flow control) is now a thing you have to design and build.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Decision
&lt;/h2&gt;

&lt;p&gt;Once you see it as an ownership question, the decision becomes clearer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Does the caller need the answer to decide what happens next?&lt;/strong&gt; → RPC. Auth check. Balance read. Inventory reservation. Any synchronous business flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is the work a notification that something already happened?&lt;/strong&gt; → Messaging. "Order was placed." "User signed up." Downstream consumers that don't gate the primary flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Can the work tolerate delay and be retried independently?&lt;/strong&gt; → Messaging. Email send. Indexing. Analytics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is the work idempotent by construction, or can it be made so cheaply?&lt;/strong&gt; → Messaging works. If not, RPC's caller-owned retry is simpler to reason about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can mix them. Most mature microservice stacks do. The mistake is picking messaging because "decoupled is better" without doing the consumer-side engineering that decoupling requires.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Has to Change in the Migration
&lt;/h2&gt;

&lt;p&gt;Here's the minimum checklist for every RPC → messaging migration. If any of these is missing, the old code was better.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Idempotency keys, enforced at the consumer
&lt;/h3&gt;

&lt;p&gt;Every message carries an operation ID. The consumer dedup-checks before committing. This is not optional. Without it, any redelivery (and there will be redeliveries) creates duplicate state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OpID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="c"&gt;// idempotent: we already did this&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;doTheWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="c"&gt;// message will be redelivered&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;markProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OpID&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;markProcessed&lt;/code&gt; call has to be in the same transaction as the actual work, or you have a race where the work commits but the dedup record doesn't. Then the next redelivery re-does it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Explicit ack semantics
&lt;/h3&gt;

&lt;p&gt;Know whether your bus is at-most-once (send and forget, messages can be lost), at-least-once (redelivery on no-ack, duplicates possible), or effectively-once (at-least-once plus receiver-side dedup). Most production systems run on at-least-once with dedup. NATS core is at-most-once by default; NATS JetStream is at-least-once. Kafka is at-least-once with offset-based replay. RabbitMQ is configurable — check both sides agree.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Dead-letter path
&lt;/h3&gt;

&lt;p&gt;Messages that fail repeatedly have to go somewhere other than "redelivered forever." A dead-letter queue (or topic, or subject) plus an alert when non-trivial traffic hits it. Without this, a poison message takes a consumer out of service.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Consumer-side observability
&lt;/h3&gt;

&lt;p&gt;At minimum: consumer lag (messages published but not yet processed), processing time per message, error rate, redelivery rate. The publisher's metrics tell you about the bus, not about the work. If you can't see "how fast is the consumer chewing through the queue right now," you're flying blind during the next incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Replay and reprocessing
&lt;/h3&gt;

&lt;p&gt;What happens when the consumer has a bug that corrupts data for a day, you fix the bug, and now you need to reprocess yesterday's messages? In RPC, you'd re-run the caller. In messaging, you need the ability to replay from an offset or from a backup. If the bus doesn't give you that (NATS core doesn't, JetStream does), you need a separate event log.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Specific Pattern I Like: The Request-Reply on a Bus
&lt;/h2&gt;

&lt;p&gt;One thing that confuses the discussion: you &lt;em&gt;can&lt;/em&gt; do synchronous-looking work on a message bus. NATS has request-reply built in (&lt;code&gt;nc.Request(subject, payload, timeout)&lt;/code&gt;), where the publisher gets a correlated reply on a temporary subject. This gives you the RPC ergonomics while using the messaging infrastructure.&lt;/p&gt;

&lt;p&gt;When is this useful?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you want the operational simplicity of RPC (caller waits, caller decides) but your service mesh is the message bus and adding a gRPC stack is overhead.&lt;/li&gt;
&lt;li&gt;When you want transparent failover — multiple consumers can listen, any can reply, and the bus handles the routing.&lt;/li&gt;
&lt;li&gt;When you want unified observability — both "notify" and "ask" flows go through the same substrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Request-reply over NATS gives you back caller-owned completion semantics on messaging infrastructure. It's the "pick ownership model separately from transport" option. Many good designs use it.&lt;/p&gt;

&lt;p&gt;The one that doesn't work: request-reply where the reply is supposed to happen later, via a different message. At that point the caller has moved on, the completion is truly transferred, and you're back in consumer-owned territory. Don't pretend otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Framing I Use in Design Reviews
&lt;/h2&gt;

&lt;p&gt;When someone says "let's use NATS/Kafka/RabbitMQ for this," I ask exactly one question: &lt;em&gt;who is responsible if the work doesn't happen?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the answer is "the caller will notice and retry," they want RPC. If the answer is "the receiver will eventually catch up," they want messaging. If the answer is "I don't know," the design isn't ready.&lt;/p&gt;

&lt;p&gt;Everything else — transport, framing, protocol — is implementation. The ownership contract is the architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/nats-kafka-mqtt-same-category-different-jobs/" rel="noopener noreferrer"&gt;NATS vs Kafka vs MQTT: Same Category, Very Different Jobs&lt;/a&gt; — once you've decided messaging, how to pick among the three.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/fail-fast-bounded-resilience-distributed-systems/" rel="noopener noreferrer"&gt;Why Your "Fail-Fast" Strategy is Killing Your Distributed System&lt;/a&gt; — retry and resilience on the RPC side of the boundary.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-context-distributed-systems-production/" rel="noopener noreferrer"&gt;Go Context in Distributed Systems: What Actually Works in Production&lt;/a&gt; — cancellation propagation in caller-owned flows.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>distributedsystems</category>
      <category>rpc</category>
      <category>nats</category>
      <category>grpc</category>
    </item>
    <item>
      <title>Go Context in Distributed Systems: What Actually Works in Production</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Thu, 16 Apr 2026 15:36:46 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/go-context-in-distributed-systems-what-actually-works-in-production-4d7p</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/go-context-in-distributed-systems-what-actually-works-in-production-4d7p</guid>
      <description>&lt;p&gt;The bug was alive for three weeks. On a normal day it cost nothing. On the day it activated, it nearly took the service down.&lt;/p&gt;

&lt;p&gt;The pattern was simple. An HTTP handler had to fetch data from three downstream gRPC services and merge the results. The team had done the disciplined thing: set a 5-second deadline on the request context, propagate it all the way through to the handler, use &lt;code&gt;errgroup&lt;/code&gt; for parallelism. Except — and you've probably seen this one — the fan-out looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// has a 5-second deadline&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt; &lt;span class="c"&gt;// ← here&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt; &lt;span class="c"&gt;// ← here&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt; &lt;span class="c"&gt;// ← here&lt;/span&gt;

    &lt;span class="c"&gt;// ... some sync wait ...&lt;/span&gt;
    &lt;span class="n"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every day for three weeks, the downstreams responded in 20 ms and everything worked. Then one of them — the slow path — got a planned capacity change that degraded it from 20 ms to 20 seconds. Not a crash. Just slow. And the HTTP handler's 5-second deadline did exactly what it promised: returned a timeout to the client.&lt;/p&gt;

&lt;p&gt;But the three goroutines kept running. They didn't get the memo.&lt;/p&gt;

&lt;p&gt;Within ninety seconds, &lt;strong&gt;goroutines climbed from 2,000 to 80,000&lt;/strong&gt;, connection pools drained, the GC started to choke on the churn, and the entire service had to be restarted twice before someone figured out that &lt;code&gt;context.Background()&lt;/code&gt; inside a handler-scoped goroutine isn't a stylistic choice — it's a goroutine leak with extra steps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — &lt;code&gt;context.Context&lt;/code&gt; is not documentation. It is the runtime boundary between "this work still matters" and "this work should stop." Every time you launch a goroutine from inside a request-scoped context and fail to propagate the parent ctx, you are creating work that outlives its reason to exist. Under load, that's what brings a service down — not CPU, not memory, not the downstream. Goroutines that won't die.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Context Actually Is
&lt;/h2&gt;

&lt;p&gt;The single biggest mistake I see engineers make is treating &lt;code&gt;context.Context&lt;/code&gt; like an argument convention — "the standard library says I should pass one, so I pass one." That's the wrong mental model.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;context.Context&lt;/code&gt; is four things, in order of importance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A cancellation signal.&lt;/strong&gt; When the context is done (cancelled, deadline exceeded), every goroutine holding it is being asked to stop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A deadline.&lt;/strong&gt; How much wall-clock budget this work has before it's considered failed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An error cause.&lt;/strong&gt; Why the context ended (&lt;code&gt;context.Canceled&lt;/code&gt;, &lt;code&gt;context.DeadlineExceeded&lt;/code&gt;, or a custom reason via &lt;code&gt;context.Cause&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A narrow channel for request-scoped metadata.&lt;/strong&gt; Trace ID, deadline, auth principal. That's about it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notice what's not on the list: data transport, DI container, settings object, session store, cache. If you're using context to pass any of those, you've already lost.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Context is control flow, not data.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBIWyJIVFRQIGhhbmRsZXI8YnIvPmN0eCDCtyA1cyBkZWFkbGluZSJdIC0tPiBHMQogICAgSCAtLT4gRzIKICAgIEggLS0-IEczCiAgICBHMVsiZ29yb3V0aW5lIEE8YnIvPmNhbGxBKGdjdHggwrcgcmVxKSJdIC0tPiBEMVsoRG93bnN0cmVhbSBBKV0KICAgIEcyWyJnb3JvdXRpbmUgQjxici8-Y2FsbEIoZ2N0eCDCtyByZXEpIl0gLS0-IEQyWyhEb3duc3RyZWFtIEIpXQogICAgRzNbImdvcm91dGluZSBDPGJyLz5jYWxsQyhnY3R4IMK3IHJlcSkiXSAtLT4gRDNbKERvd25zdHJlYW0gQyldCgogICAgQ2FuY2Vse3siY3R4LkRvbmUoKSBmaXJlczxici8-dGltZW91dCwgY2xpZW50IGdvbmUsPGJyLz5vciBzaWJsaW5nIGVycm9yZWQifX0gLS4tPnxicm9hZGNhc3R8IEcxCiAgICBDYW5jZWwgLS4tPnxicm9hZGNhc3R8IEcyCiAgICBDYW5jZWwgLS4tPnxicm9hZGNhc3R8IEczCiAgICBIIC0uLT4gQ2FuY2VsCgogICAgY2xhc3NEZWYgaGFuZGxlciBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIHdvcmtlciBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIGNhbmNlbCBmaWxsOiNmZWQ3ZDcsc3Ryb2tlOiNjNTMwMzAsc3Ryb2tlLWRhc2hhcnJheTo1IDUKICAgIGNsYXNzIEggaGFuZGxlcgogICAgY2xhc3MgRzEsRzIsRzMgd29ya2VyCiAgICBjbGFzcyBDYW5jZWwgY2FuY2Vs" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBIWyJIVFRQIGhhbmRsZXI8YnIvPmN0eCDCtyA1cyBkZWFkbGluZSJdIC0tPiBHMQogICAgSCAtLT4gRzIKICAgIEggLS0-IEczCiAgICBHMVsiZ29yb3V0aW5lIEE8YnIvPmNhbGxBKGdjdHggwrcgcmVxKSJdIC0tPiBEMVsoRG93bnN0cmVhbSBBKV0KICAgIEcyWyJnb3JvdXRpbmUgQjxici8-Y2FsbEIoZ2N0eCDCtyByZXEpIl0gLS0-IEQyWyhEb3duc3RyZWFtIEIpXQogICAgRzNbImdvcm91dGluZSBDPGJyLz5jYWxsQyhnY3R4IMK3IHJlcSkiXSAtLT4gRDNbKERvd25zdHJlYW0gQyldCgogICAgQ2FuY2Vse3siY3R4LkRvbmUoKSBmaXJlczxici8-dGltZW91dCwgY2xpZW50IGdvbmUsPGJyLz5vciBzaWJsaW5nIGVycm9yZWQifX0gLS4tPnxicm9hZGNhc3R8IEcxCiAgICBDYW5jZWwgLS4tPnxicm9hZGNhc3R8IEcyCiAgICBDYW5jZWwgLS4tPnxicm9hZGNhc3R8IEczCiAgICBIIC0uLT4gQ2FuY2VsCgogICAgY2xhc3NEZWYgaGFuZGxlciBmaWxsOiNlOGY0Zjgsc3Ryb2tlOiMyYzUyODIKICAgIGNsYXNzRGVmIHdvcmtlciBmaWxsOiNmMGZmZjQsc3Ryb2tlOiMyZjg1NWEKICAgIGNsYXNzRGVmIGNhbmNlbCBmaWxsOiNmZWQ3ZDcsc3Ryb2tlOiNjNTMwMzAsc3Ryb2tlLWRhc2hhcnJheTo1IDUKICAgIGNsYXNzIEggaGFuZGxlcgogICAgY2xhc3MgRzEsRzIsRzMgd29ya2VyCiAgICBjbGFzcyBDYW5jZWwgY2FuY2Vs" alt="Context tree: an HTTP handler ctx with a 5s deadline fans out to goroutines A, B, C; ctx.Done() broadcasts cancellation to all three" width="660" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the parent ctx is cancelled, the signal propagates to every goroutine that inherited it. Every spawned call drops the work it was doing and returns. That's the &lt;em&gt;whole&lt;/em&gt; value of context — and the reason &lt;code&gt;context.Background()&lt;/code&gt; inside a spawned goroutine breaks everything: it severs this tree.&lt;/p&gt;

&lt;p&gt;Every correct use of context follows from this. The moment you treat it as something else — a way to pass a config value, a way to smuggle a feature flag, a way to avoid changing a function signature — you start breaking the cancellation semantics that make it useful at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Patterns That Work
&lt;/h2&gt;

&lt;p&gt;After enough production debugging, a small set of patterns covers 95% of cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Always propagate, never replace
&lt;/h3&gt;

&lt;p&gt;The outer context defines the lifetime of the work. Any goroutine spawned to do part of that work must inherit it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// ✗ Wrong: spawned work is unkillable&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;doWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt;

&lt;span class="c"&gt;// ✓ Right: spawned work dies with the parent&lt;/span&gt;
&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;doWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your linter isn't flagging &lt;code&gt;context.Background()&lt;/code&gt; or &lt;code&gt;context.TODO()&lt;/code&gt; inside functions that already have a &lt;code&gt;ctx&lt;/code&gt; in scope, fix your linter. &lt;code&gt;contextcheck&lt;/code&gt; in &lt;code&gt;golangci-lint&lt;/code&gt; catches most of these.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Fan out with errgroup.WithContext
&lt;/h3&gt;

&lt;p&gt;Raw goroutines + &lt;code&gt;sync.WaitGroup&lt;/code&gt; is the wrong primitive for fan-out calls to downstreams. Use &lt;code&gt;golang.org/x/sync/errgroup&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;fanOut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;errgroup&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;

    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Go&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Go&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Go&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two properties that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gctx&lt;/code&gt; inherits the parent's deadline and cancellation.&lt;/strong&gt; The spawned calls die when the caller gives up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The first error cancels the sibling calls.&lt;/strong&gt; If &lt;code&gt;callA&lt;/code&gt; fails fast, the in-flight &lt;code&gt;callB&lt;/code&gt; and &lt;code&gt;callC&lt;/code&gt; stop wasting work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are invisible in the code. That's the point. You get the right behavior without having to think about it per-callsite.&lt;/p&gt;
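&lt;p&gt;If you'd rather see that machinery than take it on faith, here is a stdlib-only sketch of roughly what &lt;code&gt;errgroup.WithContext&lt;/code&gt; wires up for you. &lt;code&gt;miniGroup&lt;/code&gt; and &lt;code&gt;demo&lt;/code&gt; are illustrative names — in real code, just use &lt;code&gt;errgroup&lt;/code&gt;.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// miniGroup sketches errgroup's contract: the first error cancels a shared
// context so in-flight siblings can stop early.
type miniGroup struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
	once   sync.Once
	err    error
}

func withContext(ctx context.Context) (*miniGroup, context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	return &miniGroup{cancel: cancel}, ctx
}

func (g *miniGroup) Go(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(); err != nil {
			g.once.Do(func() {
				g.err = err
				g.cancel() // broadcast: siblings see gctx.Done()
			})
		}
	}()
}

func (g *miniGroup) Wait() error {
	g.wg.Wait()
	g.cancel()
	return g.err
}

// demo returns whether the slow sibling observed cancellation, plus the
// group's first error.
func demo() (bool, error) {
	g, gctx := withContext(context.Background())
	var siblingStopped bool

	g.Go(func() error { return errors.New("callA failed fast") })
	g.Go(func() error {
		select {
		case <-gctx.Done(): // cancelled because a sibling errored
			siblingStopped = true
			return gctx.Err()
		case <-time.After(5 * time.Second):
			return nil // only reached if cancellation never propagated
		}
	})

	return siblingStopped, g.Wait()
}

func main() {
	stopped, err := demo()
	fmt.Println(stopped, err) // the slow call stopped as soon as callA failed
}
```

&lt;p&gt;The slow call returns in microseconds, not five seconds — that's the wasted work the real &lt;code&gt;errgroup&lt;/code&gt; saves you at every fan-out.&lt;/p&gt;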

&lt;h3&gt;
  
  
  3. Cap the subtree with WithTimeout
&lt;/h3&gt;

&lt;p&gt;The parent gives you the outer boundary. Sometimes you want a tighter one for a specific piece of work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;callSlowly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;800&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// ← don't leak the timer&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things people get wrong here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting &lt;code&gt;defer cancel()&lt;/code&gt;&lt;/strong&gt; leaks the timer and keeps the child context (and everything hanging off it) alive until the parent ends. It's small, but it adds up under load — and &lt;code&gt;go vet&lt;/code&gt;'s &lt;code&gt;lostcancel&lt;/code&gt; check flags it for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using &lt;code&gt;WithTimeout&lt;/code&gt; where &lt;code&gt;WithDeadline&lt;/code&gt; makes more sense&lt;/strong&gt; — if your budget is "finish by a fixed wall-clock time," use &lt;code&gt;WithDeadline&lt;/code&gt;. A relative duration and a fixed point in time aren't the same thing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stacking timeouts that exceed the parent.&lt;/strong&gt; A &lt;code&gt;WithTimeout(ctx, 30*time.Second)&lt;/code&gt; on a context that already has a 5-second deadline still times out in 5 seconds: a child can only tighten the parent's deadline, never extend it. If you're setting 30 seconds and expecting 30 seconds, check the parent's budget first.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Make cancellation observable
&lt;/h3&gt;

&lt;p&gt;In a handler loop or polling loop, cancellation must be checked at every iteration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;work&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've debugged a service that looked like it was "stuck" but was actually processing a queue in a tight loop that never checked &lt;code&gt;ctx.Done()&lt;/code&gt;. The cancellation had fired long ago; the code just didn't care.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Return ctx.Err() at the right boundary
&lt;/h3&gt;

&lt;p&gt;When a context ends, &lt;code&gt;ctx.Err()&lt;/code&gt; reports &lt;code&gt;context.Canceled&lt;/code&gt; or &lt;code&gt;context.DeadlineExceeded&lt;/code&gt;, and context-aware standard library calls return errors that wrap them. Your code needs to either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pass it up&lt;/strong&gt;, because the caller asked for cancellation and you're honoring it, or&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translate it&lt;/strong&gt;, because your API surface speaks a different error vocabulary (gRPC codes, HTTP status codes, domain errors).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;downstream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Was this our fault, or theirs?&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Is&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DeadlineExceeded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DeadlineExceeded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"upstream deadline"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Is&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Canceled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Canceled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"caller cancelled"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't do this, the errors that reach your caller will be a mix of "the downstream is broken" and "you asked me to stop, remember?", and your on-call will waste hours separating the two.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;There are a handful of things that look fine and aren't. These are the ones I see most.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;context.Background()&lt;/code&gt; inside a spawned goroutine
&lt;/h3&gt;

&lt;p&gt;The bug that opens this post. You already have a context in scope. Use it. Spawning with &lt;code&gt;context.Background()&lt;/code&gt; breaks the cancellation chain and creates work that outlives the caller. It's the single most common goroutine leak I've seen in production Go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Passing the context by field instead of by argument
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// ✗ Wrong&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;callA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c"&gt;// stale ctx&lt;/span&gt;

&lt;span class="c"&gt;// ✓ Right&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;callA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context is per-call, not per-object. The moment you stash it in a struct, you've made it stale — the context from construction time is not the context from the current call. &lt;code&gt;golangci-lint&lt;/code&gt; with the &lt;code&gt;contextcheck&lt;/code&gt; linter enabled catches most of these. If your CI doesn't run it, add it today.&lt;/p&gt;

&lt;h3&gt;
  
  
  Storing business data in context
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// ✗ Wrong&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"currentUser"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// ✓ Right&lt;/span&gt;
&lt;span class="n"&gt;ProcessOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule is: &lt;strong&gt;if the function needs it to work, it goes in the signature&lt;/strong&gt;. If it's optional metadata that cross-cuts every call (trace ID, request ID, auth principal for logging), context is fine — but keep the key typed (not a raw string) and keep the set small.&lt;/p&gt;

&lt;h3&gt;
  
  
  Blanket rethrow without translating
&lt;/h3&gt;

&lt;p&gt;Returning &lt;code&gt;ctx.Err()&lt;/code&gt; from a library function when the caller doesn't know about context produces baffling errors two layers up. If you're writing something reusable, translate context errors to your own error type at the boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Small Debugging Tool
&lt;/h2&gt;

&lt;p&gt;When you suspect a context-propagation problem, the fastest way to find it is usually a goroutine dump under load. Something like this keeps one around:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// /debug/goroutines — read-only, auth-gated in prod&lt;/span&gt;
&lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandleFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/debug/goroutines"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pprof&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"goroutine"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WriteTo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// 1 = text format with stacks&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ship it behind auth, point a cron or load test at the thing you're trying to exercise, and diff two snapshots 10 seconds apart. Goroutines that persist across snapshots and aren't in &lt;code&gt;netpoll&lt;/code&gt; or &lt;code&gt;runtime.park_m&lt;/code&gt; are your suspects. Nine times out of ten, when I follow the stack traces, the leaked goroutines were spawned from a handler that's already returned — because someone wrote &lt;code&gt;context.Background()&lt;/code&gt; inside a &lt;code&gt;go func()&lt;/code&gt;.&lt;/p&gt;
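&lt;p&gt;The snapshot-diffing step can be mechanized. Here's a sketch that parses the &lt;code&gt;debug=1&lt;/code&gt; profile format — stack headers look like &lt;code&gt;N @ addr addr ...&lt;/code&gt; — and flags stacks whose goroutine count grew between two dumps (&lt;code&gt;parseCounts&lt;/code&gt; and &lt;code&gt;grown&lt;/code&gt; are illustrative helpers):&lt;/p&gt;

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCounts reads the debug=1 goroutine-profile format and returns the
// goroutine count per stack signature (the address list after " @ ").
func parseCounts(dump string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		n, sig, ok := strings.Cut(sc.Text(), " @ ")
		if !ok {
			continue // skip the header and "#"-prefixed frame lines
		}
		if c, err := strconv.Atoi(strings.TrimSpace(n)); err == nil {
			counts[sig] += c
		}
	}
	return counts
}

// grown returns stacks with strictly more goroutines in the later snapshot —
// the leak suspects worth reading frame by frame.
func grown(before, after map[string]int) []string {
	var suspects []string
	for sig, c := range after {
		if c > before[sig] {
			suspects = append(suspects, sig)
		}
	}
	return suspects
}

func main() {
	before := parseCounts("goroutine profile: total 3\n2 @ 0xaaa 0xbbb\n1 @ 0xccc\n")
	after := parseCounts("goroutine profile: total 9\n8 @ 0xaaa 0xbbb\n1 @ 0xccc\n")
	fmt.Println(grown(before, after)) // only the stack that grew shows up
}
```

&lt;p&gt;Feed it two real dumps from &lt;code&gt;/debug/goroutines&lt;/code&gt; taken 10 seconds apart and the output is your shortlist.&lt;/p&gt;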

&lt;h2&gt;
  
  
  Where This Leaves You
&lt;/h2&gt;

&lt;p&gt;The moment you treat &lt;code&gt;context.Context&lt;/code&gt; as decoration — as a parameter you pass because the lint rule told you to — you've already lost the benefit. The entire reason context exists is to be the one shared signal that ties the lifetime of spawned work to the lifetime of its cause. Ignore that and you get goroutine leaks. Honor it and you get a service that drains cleanly under partial failure.&lt;/p&gt;

&lt;p&gt;In a monolith, you can get away with sloppy cancellation because the damage stays local. In a distributed system, where one slow downstream can cascade through three layers of fan-out into a goroutine explosion, you cannot. The cost of sloppy context handling scales with the number of network hops, and modern architectures have many.&lt;/p&gt;

&lt;p&gt;The fix is boring. Use &lt;code&gt;errgroup.WithContext&lt;/code&gt; for fan-out. Never &lt;code&gt;context.Background()&lt;/code&gt; inside a handler-scoped goroutine. Translate context errors at API boundaries. Check &lt;code&gt;&amp;lt;-ctx.Done()&lt;/code&gt; in loops. Add a &lt;code&gt;/debug/goroutines&lt;/code&gt; endpoint and actually look at it.&lt;/p&gt;

&lt;p&gt;There are no clever moves here. There's only the habit of passing context correctly, every time, for years — and the services that outlast the ones that didn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections: User-Space Context Switching, Explained&lt;/a&gt; — the runtime-level counterpart: what makes spawning goroutines cheap in the first place.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/fail-fast-bounded-resilience-distributed-systems/" rel="noopener noreferrer"&gt;Why Your "Fail-Fast" Strategy is Killing Your Distributed System&lt;/a&gt; — a different angle on the same class of problem: behaviour during partial failure.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>context</category>
      <category>concurrency</category>
    </item>
    <item>
      <title>Go's Concurrency Is About Structure, Not Speed: chan and context as Lifecycle Primitives</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Wed, 15 Apr 2026 19:24:53 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/gos-concurrency-is-about-structure-not-speed-chan-and-context-as-lifecycle-primitives-pdi</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/gos-concurrency-is-about-structure-not-speed-chan-and-context-as-lifecycle-primitives-pdi</guid>
      <description>&lt;p&gt;For a while, I thought channels were Go's way of doing message passing. Something like Erlang processes or actors, except with a simpler syntax. That understanding is fine if you're writing tutorials. It is not fine when you've just OOM-killed a pod for the third time in an hour because your worker pool wasn't really a pool.&lt;/p&gt;

&lt;p&gt;The moment it clicked for me was during a production incident. A Kafka consumer service had been humming along for months at about 1,000 messages per second. Then an upstream team replayed twelve hours of events into the topic at once — roughly &lt;strong&gt;1.2 million messages in two minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The consumer code looked like this, more or less:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;kafkaMessages&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// one goroutine per message, fire and forget&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what the runtime tried to do: spawn 1.2 million goroutines as fast as it could. It did. &lt;strong&gt;Heap climbed from 200 MB to 12 GB in about forty seconds.&lt;/strong&gt; GC pauses went from 2 ms to 800 ms. The pod got OOM-killed. Kubernetes restarted it. On restart, it re-read the uncommitted offsets. Repeat. It took forty minutes and manual producer-side rate limiting upstream before the system would stay up.&lt;/p&gt;

&lt;p&gt;The bug wasn't Kafka. It wasn't Go. It was the mental model — treating goroutines as "free" and treating channels as "a way to move data between them." Goroutines are not free under load. And channels are not pipes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — &lt;code&gt;chan&lt;/code&gt; and &lt;code&gt;context&lt;/code&gt; aren't just concurrency utilities. They're the two primitives Go gives you for drawing &lt;strong&gt;the boundaries of aliveness in your program&lt;/strong&gt;. &lt;code&gt;chan&lt;/code&gt; bounds how many things are alive at once (backpressure, ownership). &lt;code&gt;context&lt;/code&gt; bounds when they stop being alive (cancellation, deadline). Use them as the skeleton of your design, not as implementation details bolted on at the end.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Bounded Pool Fix
&lt;/h2&gt;

&lt;p&gt;The fix for the Kafka disaster is the classic bounded worker pool. The shape looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBLYWZrYVsoS2Fma2EgdG9waWMpXSAtLT4gUHJvZHVjZXJbIlByb2R1Y2VyIGdvcm91dGluZTxici8-cmVhZHMgb25lIGF0IGEgdGltZSJdCiAgICBQcm9kdWNlciAtLT58ImJvdW5kZWQgY2hhbm5lbDxici8-Y2FwYWNpdHkgTiJ8IEpvYnN7eyJqb2JzIGNoYW4ifX0KICAgIEpvYnMgLS0-IFcxWyJXb3JrZXIgMSJdCiAgICBKb2JzIC0tPiBXMlsiV29ya2VyIDIiXQogICAgSm9icyAtLT4gVzNbIldvcmtlciAuLi4iXQogICAgSm9icyAtLT4gV25bIldvcmtlciBNPGJyLz4oZml4ZWQgY291bnQpIl0KCiAgICBDdHhbKCJjdHguRG9uZSgpPGJyLz5icm9hZGNhc3QgY2FuY2VsIildIC0uLT4gUHJvZHVjZXIKICAgIEN0eCAtLi0-IFcxCiAgICBDdHggLS4tPiBXMgogICAgQ3R4IC0uLT4gVzMKICAgIEN0eCAtLi0-IFduCgogICAgY2xhc3NEZWYgY2xhbXAgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmLHN0cm9rZS13aWR0aDoycHgKICAgIGNsYXNzRGVmIHNpZ25hbCBmaWxsOiNmYWY1ZmYsc3Ryb2tlOiM2YjQ2YzEsc3Ryb2tlLWRhc2hhcnJheTo1IDUKICAgIGNsYXNzIEpvYnMgY2xhbXAKICAgIGNsYXNzIEN0eCBzaWduYWw%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBLYWZrYVsoS2Fma2EgdG9waWMpXSAtLT4gUHJvZHVjZXJbIlByb2R1Y2VyIGdvcm91dGluZTxici8-cmVhZHMgb25lIGF0IGEgdGltZSJdCiAgICBQcm9kdWNlciAtLT58ImJvdW5kZWQgY2hhbm5lbDxici8-Y2FwYWNpdHkgTiJ8IEpvYnN7eyJqb2JzIGNoYW4ifX0KICAgIEpvYnMgLS0-IFcxWyJXb3JrZXIgMSJdCiAgICBKb2JzIC0tPiBXMlsiV29ya2VyIDIiXQogICAgSm9icyAtLT4gVzNbIldvcmtlciAuLi4iXQogICAgSm9icyAtLT4gV25bIldvcmtlciBNPGJyLz4oZml4ZWQgY291bnQpIl0KCiAgICBDdHhbKCJjdHguRG9uZSgpPGJyLz5icm9hZGNhc3QgY2FuY2VsIildIC0uLT4gUHJvZHVjZXIKICAgIEN0eCAtLi0-IFcxCiAgICBDdHggLS4tPiBXMgogICAgQ3R4IC0uLT4gVzMKICAgIEN0eCAtLi0-IFduCgogICAgY2xhc3NEZWYgY2xhbXAgZmlsbDojZmVmNWU3LHN0cm9rZTojYjc3OTFmLHN0cm9rZS13aWR0aDoycHgKICAgIGNsYXNzRGVmIHNpZ25hbCBmaWxsOiNmYWY1ZmYsc3Ryb2tlOiM2YjQ2YzEsc3Ryb2tlLWRhc2hhcnJheTo1IDUKICAgIGNsYXNzIEpvYnMgY2xhbXAKICAgIGNsYXNzIEN0eCBzaWduYWw%3D" alt="Flowchart: a Kafka topic feeds a producer goroutine that reads one message at a time into a bounded jobs channel (capacity N), consumed by a fixed pool of M workers; ctx.Done() broadcasts cancel to the producer and every worker" width="893" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The bounded channel is the concurrency clamp. The context is the kill switch. Neither alone is enough; together they give you a pipeline that drains cleanly under shutdown and refuses to explode under load.&lt;/p&gt;

&lt;p&gt;Here it is in full, because the code matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;workers&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;              &lt;span class="c"&gt;// fixed pool&lt;/span&gt;
        &lt;span class="n"&gt;bufferSize&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;             &lt;span class="c"&gt;// bounded queue&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bufferSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Spawn fixed workers&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Producer: push into bounded jobs, blocks when full&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// tell workers we're done&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="c"&gt;// enqueued; producer moves on&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// wait for shutdown&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="c"&gt;// producer closed the channel&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at what this code does that the broken version didn't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fixed number of workers.&lt;/strong&gt; 50, period. Never more, regardless of input rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded queue.&lt;/strong&gt; At most 100 in-flight messages between producer and workers. When the queue is full, the producer &lt;em&gt;stops reading from Kafka&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backpressure is implicit.&lt;/strong&gt; The blocking send on &lt;code&gt;jobs &amp;lt;- msg&lt;/code&gt; is the backpressure mechanism. No complicated flow control needed. The channel is the mechanism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cancellation is wired everywhere.&lt;/strong&gt; &lt;code&gt;ctx.Done()&lt;/code&gt; is checked in the producer, in every worker, and inside &lt;code&gt;consumer.ReadMessage&lt;/code&gt;. When the parent context is cancelled, all of them stop.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. No semaphores. No rate limiters. No backoff. The channel semantics do the whole job. &lt;strong&gt;The channel isn't a pipe; it's a clamp.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Channels Are Lifecycle Primitives
&lt;/h2&gt;

&lt;p&gt;This is the insight I wish I'd had earlier: &lt;strong&gt;a channel isn't really about data transfer. It's about ownership and aliveness.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you send on a channel, you're transferring ownership of a value from the sender to the receiver. The sender no longer owns it; the receiver does. That's useful, and it's the "share memory by communicating" idea you've probably read a dozen times.&lt;/p&gt;

&lt;p&gt;But the deeper use is backpressure. A channel with capacity N means "at most N things can be in flight between these two points in the program." When it's full, the producer has to stop. That stop is the entire backpressure signal — no separate rate limiter, no token bucket, no hand-rolled semaphore. The buffer size is the concurrency bound.&lt;/p&gt;

&lt;p&gt;Once you see this, you stop thinking of channels as "fancy queues" and start thinking of them as &lt;strong&gt;structural declarations&lt;/strong&gt;: &lt;em&gt;this is how many things can be happening in this zone of my program&lt;/em&gt;. That's a very different design tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Four patterns that fall out
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bounded pool.&lt;/strong&gt; The Kafka example above. Fixed workers consume from a bounded channel. The channel is the clamp on in-flight work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fan-out, fan-in.&lt;/strong&gt; One producer, N workers, one aggregator. Each stage is a channel. The sizes of those channels are the concurrency limits between stages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  producer  →  [chan N] →  worker pool (M)  →  [chan N']  →  aggregator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
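&lt;p&gt;A minimal runnable sketch of that shape (function and stage names here are illustrative, not a library API): one producer feeds a bounded &lt;code&gt;jobs&lt;/code&gt; channel, a fixed pool of workers squares each input, and a single aggregator sums the results. The channel capacities are the concurrency limits between stages.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutFanIn: one producer, a pool of workers, one aggregator.
// The channel capacities bound how much work is in flight between stages.
func fanOutFanIn(inputs []int, workers int) int {
	jobs := make(chan int, 4)    // producer -> workers: stage-1 clamp
	results := make(chan int, 4) // workers -> aggregator: stage-2 clamp

	// Producer owns jobs, so the producer closes it.
	go func() {
		defer close(jobs)
		for _, v := range inputs {
			jobs <- v
		}
	}()

	// Fan-out: fixed worker pool consumes jobs until it is closed.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range jobs {
				results <- v * v
			}
		}()
	}

	// Close results only after every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Fan-in: a single aggregator drains results.
	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(fanOutFanIn([]int{1, 2, 3, 4}, 3)) // 1+4+9+16 = 30
}
```

&lt;p&gt;Note who closes what: the producer closes &lt;code&gt;jobs&lt;/code&gt;, and a dedicated goroutine closes &lt;code&gt;results&lt;/code&gt; only after the &lt;code&gt;WaitGroup&lt;/code&gt; confirms every sender is done.&lt;/p&gt;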



&lt;p&gt;&lt;strong&gt;Rate-limited writer.&lt;/strong&gt; Want to batch writes to a slow downstream? One channel in, one goroutine that flushes every 100 items or every 500ms, whichever comes first. The channel is the queue; the goroutine is the flush policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graceful shutdown signal.&lt;/strong&gt; A &lt;code&gt;chan struct{}&lt;/code&gt; closed on shutdown is a broadcast to every goroutine listening. Every place that checks &lt;code&gt;case &amp;lt;-done:&lt;/code&gt; gets the signal at the same time, for free.&lt;/p&gt;
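&lt;p&gt;The broadcast works because a receive from a closed channel returns immediately. A tiny sketch (the &lt;code&gt;broadcast&lt;/code&gt; helper is illustrative) parking N goroutines on one &lt;code&gt;done&lt;/code&gt; channel and waking them all with a single &lt;code&gt;close&lt;/code&gt;:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// broadcast parks n goroutines on a done channel, closes it once,
// and reports how many woke up: close is a one-shot broadcast.
func broadcast(n int) int {
	done := make(chan struct{})
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		stopped int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}
	close(done) // one close wakes every receiver
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println(broadcast(5), "goroutines unblocked by one close")
}
```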

&lt;p&gt;None of these need mutexes. Mutexes show up when you have shared mutable state that multiple goroutines &lt;em&gt;read and modify together&lt;/em&gt; — a cache, a counter, a shared map. That's different from "multiple goroutines coordinating their lifecycles," which is what channels are for.&lt;/p&gt;

&lt;p&gt;The rule of thumb I use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coordinating goroutines?&lt;/strong&gt; Channels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sharing a counter or cache?&lt;/strong&gt; &lt;code&gt;sync.Mutex&lt;/code&gt;, &lt;code&gt;sync.RWMutex&lt;/code&gt;, or &lt;code&gt;sync/atomic&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both?&lt;/strong&gt; Channels for the outer shape (who's alive, when to stop), mutex for the inner state (protected data).&lt;/li&gt;
&lt;/ul&gt;
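&lt;p&gt;The "both" case in one small sketch (names are illustrative): channels shape the pool — who's alive, when to stop — while a mutex guards the shared map the workers mutate together.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// countWords: the channel is the outer shape (a bounded job feed into a
// fixed pool); the mutex is the inner state (a shared counts map).
func countWords(words []string, workers int) map[string]int {
	jobs := make(chan string, 8) // channel: coordinates the goroutines
	counts := make(map[string]int)
	var mu sync.Mutex // mutex: protects the shared map

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range jobs {
				mu.Lock()
				counts[w]++
				mu.Unlock()
			}
		}()
	}
	for _, w := range words {
		jobs <- w
	}
	close(jobs) // sender closes; workers drain and exit
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countWords([]string{"a", "b", "a"}, 4)) // map[a:2 b:1]
}
```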

&lt;h2&gt;
  
  
  Context Is the Other Half
&lt;/h2&gt;

&lt;p&gt;If &lt;code&gt;chan&lt;/code&gt; defines "how many alive," &lt;code&gt;context&lt;/code&gt; defines "when to die." You already know the story if you've written any Go: &lt;code&gt;context.Context&lt;/code&gt; carries a cancellation signal, an optional deadline, and a shallow bag of request-scoped metadata. It propagates down through function calls, and when it fires, every goroutine holding it is asked to stop.&lt;/p&gt;

&lt;p&gt;What I want to emphasize is the &lt;em&gt;pairing&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In the bounded-pool example above, look at where &lt;code&gt;ctx&lt;/code&gt; appears:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the worker's &lt;code&gt;select&lt;/code&gt; loop — so a worker can stop mid-wait on &lt;code&gt;&amp;lt;-jobs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the producer's &lt;code&gt;select&lt;/code&gt; — so the producer can stop mid-wait on &lt;code&gt;jobs &amp;lt;- msg&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the call to &lt;code&gt;consumer.ReadMessage(ctx)&lt;/code&gt; — so the Kafka read unblocks immediately on shutdown.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remove any one of those and the shutdown path has a hole. With all three, &lt;code&gt;cancel()&lt;/code&gt; on the parent context makes the entire pipeline drain and stop cleanly in under a second. The channel decides the structure; the context decides the termination.&lt;/p&gt;

&lt;p&gt;Neither primitive alone is enough. A bounded channel without cancellation will keep processing until its queue drains — which can be minutes for a deep queue. A cancelled context without a bounded channel still lets you create unbounded goroutines between now and the moment everyone notices. You need both.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;chan&lt;/code&gt; draws the boundaries in space. &lt;code&gt;context&lt;/code&gt; draws the boundary in time. Together they describe the shape and the lifetime of concurrent work.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What "Structure, Not Speed" Actually Means
&lt;/h2&gt;

&lt;p&gt;Go's concurrency model is often sold as fast. Sometimes it is. Per-request throughput in well-written Go is solidly middle of the pack — beaten by Rust and C++, comparable to Java and C#. You do not pick Go because it's fast.&lt;/p&gt;

&lt;p&gt;You pick Go because &lt;em&gt;the design of a concurrent program becomes tractable&lt;/em&gt;. A senior engineer reading a goroutine-and-channels design understands what's alive and what bounds it. A junior engineer reading the same code doesn't have to know about monitors, condition variables, or lock ordering. The shape of the program is visible in the channel declarations.&lt;/p&gt;

&lt;p&gt;That's the "structure" pitch. And it works because the primitives compose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bounded channels compose into pipelines with known concurrency at each stage.&lt;/li&gt;
&lt;li&gt;Contexts compose into a lifetime tree where cancelling any subtree stops everything below it.&lt;/li&gt;
&lt;li&gt;Select statements compose cancellation, timeouts, and channel operations into a single readable switch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The failure mode — the one that gave me the Kafka outage — is treating these primitives as optional utilities you reach for when a standard pattern doesn't fit. They aren't. They're the &lt;strong&gt;first-class design vocabulary&lt;/strong&gt; of concurrent Go. The moment you're writing concurrent code without thinking in channels and contexts, you've left the paved road.&lt;/p&gt;

&lt;h2&gt;
  
  
  Small Things That Matter
&lt;/h2&gt;

&lt;p&gt;A few tactical points I've learned the expensive way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always document channel ownership.&lt;/strong&gt; Who closes it? Who sends? Who receives? A closed channel panics on send. A nil channel blocks forever in select. These are easy to reason about if ownership is clear, and confusing if it isn't. I use comments right at the declaration site: &lt;code&gt;// jobs: producer sends and closes; workers receive&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Close from the sender, not the receiver.&lt;/strong&gt; There's exactly one sender and it owns the lifecycle. Multiple senders? Use a separate &lt;code&gt;done&lt;/code&gt; channel or a &lt;code&gt;sync.Once&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;select&lt;/code&gt; with &lt;code&gt;default&lt;/code&gt; is not backpressure, it's drop.&lt;/strong&gt; &lt;code&gt;select { case ch &amp;lt;- x: default: }&lt;/code&gt; drops the message if the channel is full. Sometimes that's what you want (metrics sampling). Often it's a bug disguised as a performance optimization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unbuffered channels are rendezvous, not pipes.&lt;/strong&gt; An unbuffered send completes the instant a receiver is ready, not before. This is sometimes exactly the synchronization you want (handoff semantics) and sometimes a deadlock waiting to happen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test under load, not just logic.&lt;/strong&gt; The Kafka incident would have been caught by any realistic load test. Unit tests happily ran the &lt;code&gt;go process(msg)&lt;/code&gt; version and passed. Load is what reveals structural bugs.&lt;/li&gt;
&lt;/ul&gt;
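&lt;p&gt;The drop-vs-backpressure point is worth seeing concretely. A sketch of the non-blocking send (the &lt;code&gt;trySend&lt;/code&gt; name is illustrative): with no receiver and a full buffer, every further message is silently lost.&lt;/p&gt;

```go
package main

import "fmt"

// trySend drops x when ch is full. Fine for sampled metrics;
// silent data loss anywhere correctness matters.
func trySend(ch chan int, x int) bool {
	select {
	case ch <- x:
		return true
	default:
		return false // channel full: message dropped, no backpressure
	}
}

func main() {
	ch := make(chan int, 2) // capacity 2, and nobody is receiving
	sent := 0
	for i := 0; i < 5; i++ {
		if trySend(ch, i) {
			sent++
		}
	}
	fmt.Println("sent", sent, "dropped", 5-sent) // sent 2 dropped 3
}
```

&lt;p&gt;Swap the &lt;code&gt;default&lt;/code&gt; for a &lt;code&gt;case &amp;lt;-ctx.Done():&lt;/code&gt; and you get blocking backpressure with a clean shutdown path instead of drops.&lt;/p&gt;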

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;Three years ago I'd have written "use goroutines for parallelism, channels for communication, context for cancellation" and considered that good advice. I don't think it's wrong, but it misses the point.&lt;/p&gt;

&lt;p&gt;The better framing is: &lt;strong&gt;chan and context are the two primitives for drawing boundaries around concurrent work.&lt;/strong&gt; One draws the boundary of "how many alive." The other draws "when to die." Everything else — the pools, the pipelines, the cancellation trees — is built by composing these two.&lt;/p&gt;

&lt;p&gt;A design that doesn't specify those boundaries isn't really a design. It's just code that happens to spawn goroutines. Sometimes it works. Sometimes it eats a pod's memory in forty seconds.&lt;/p&gt;

&lt;p&gt;The Kafka incident fixed itself the day we stopped writing &lt;code&gt;go process(msg)&lt;/code&gt; and started writing &lt;code&gt;jobs &amp;lt;- msg&lt;/code&gt;. The second version is longer. It's also the version that doesn't page us at 3 AM.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-millions-connections-user-space-context-switching/" rel="noopener noreferrer"&gt;Why Go Handles Millions of Connections: User-Space Context Switching, Explained&lt;/a&gt; — why spawning goroutines is cheap in the first place. The foundation that lets you do any of this.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/go-context-distributed-systems-production/" rel="noopener noreferrer"&gt;Go Context in Distributed Systems: What Actually Works in Production&lt;/a&gt; — the sibling post on context propagation patterns and the &lt;code&gt;context.Background()&lt;/code&gt; trap.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://harrisonsec.com/blog/fail-fast-bounded-resilience-distributed-systems/" rel="noopener noreferrer"&gt;Why Your "Fail-Fast" Strategy is Killing Your Distributed System&lt;/a&gt; — a different lens on the same underlying question: what should your program do when the world gets slow?&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>concurrency</category>
      <category>channels</category>
    </item>
    <item>
      <title>Why Go Handles Millions of Connections: User-Space Context Switching, Explained</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:43:03 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/why-go-handles-millions-of-connections-user-space-context-switching-explained-kf3</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/why-go-handles-millions-of-connections-user-space-context-switching-explained-kf3</guid>
      <description>&lt;p&gt;Somewhere around 40,000 concurrent connections, your Java service falls over. Not from CPU, not from network — from memory, because every connection is a thread and every thread wants its own megabyte of stack. By the time you've finished Googling whether this is a &lt;code&gt;-Xss&lt;/code&gt; problem or a &lt;code&gt;ulimit&lt;/code&gt; problem, Ops has already bumped the box to 64 GB and you've pushed the wall back another 20,000 connections. Linear in RAM. It never ends.&lt;/p&gt;

&lt;p&gt;A Go service on half that box can hold 200,000 connections without noticing. People assume it's because Go is faster. It isn't. Per-request, Go and Java are roughly the same — sometimes Java wins. What Go does differently is more fundamental: &lt;strong&gt;it stops asking the kernel to help.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; — High-concurrency isn't about raw CPU. It's about how cheaply you can hold an idle connection open. Go's 2KB goroutine stacks and user-space M:N scheduler push the marginal cost of a connection close to zero. The kernel only gets involved when there's real I/O to do. This is the same principle HFT engines chase with DPDK and io_uring — Go just hands it to you for free.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Wrong Mental Model
&lt;/h2&gt;

&lt;p&gt;Most engineers I talk to think "threads are expensive because threading is hard." That's not wrong, but it misses the more mechanical reason.&lt;/p&gt;

&lt;p&gt;Every time a traditional language (Java pre-Loom, C# pre-async everywhere, classic Python) parks a thread waiting for I/O, it pays two concrete costs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stack memory&lt;/strong&gt;: Default JVM thread stack is 1 MB. 40,000 threads = 40 GB of stack, most of which is unused.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context-switch cost&lt;/strong&gt;: When the OS swaps the thread, it traps into the kernel, saves the full register set, swaps page tables if there's an address-space change, flushes TLB entries, and walks the scheduler's runqueue. Measured on modern x86, that's &lt;strong&gt;1–5 microseconds per switch&lt;/strong&gt;, plus the less visible cost of instruction-cache pollution.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Multiply that by tens of thousands of waiters and you're paying the kernel a rent that has nothing to do with your actual workload.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Go Does Instead
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRCCiAgICBzdWJncmFwaCBKYXZhWyJKYXZhIMK3IG9uZSB0aHJlYWQgcGVyIGNvbm5lY3Rpb24iXQogICAgICAgIEpUMVsiVGhyZWFkIDE8YnIvPnN0YWNrIOKJiCAxIE1CIl0KICAgICAgICBKVDJbIlRocmVhZCAyPGJyLz5zdGFjayDiiYggMSBNQiJdCiAgICAgICAgSlQzWyJUaHJlYWQgLi4uPGJyLz5zdGFjayDiiYggMSBNQiJdCiAgICAgICAgSlQxIC0uLT58a2VybmVsIGNvbnRleHQgc3dpdGNoPGJyLz5UTEIgZmx1c2ggwrcgcmVnIHNhdmV8IEtlcm5lbDFbKEtlcm5lbCBzY2hlZHVsZXIpXQogICAgICAgIEpUMiAtLi0-IEtlcm5lbDEKICAgICAgICBKVDMgLS4tPiBLZXJuZWwxCiAgICBlbmQKCiAgICBzdWJncmFwaCBHb1siR28gwrcgZ29yb3V0aW5lcyBvbiBhIHNtYWxsIHBvb2wgb2YgT1MgdGhyZWFkcyJdCiAgICAgICAgRzFbIkdvcm91dGluZSAxPGJyLz5zdGFjayAyIEtCIl0KICAgICAgICBHMlsiR29yb3V0aW5lIDI8YnIvPnN0YWNrIDIgS0IiXQogICAgICAgIEczWyJHb3JvdXRpbmUgLi4uPGJyLz5zdGFjayAyIEtCIl0KICAgICAgICBHNFsiR29yb3V0aW5lIE48YnIvPnN0YWNrIDIgS0IiXQogICAgICAgIFJ1bnRpbWVbIkdvIHJ1bnRpbWUgc2NoZWR1bGVyPGJyLz5NOk4gwrcgdXNlciBzcGFjZSJdCiAgICAgICAgRzEgLS0-IFJ1bnRpbWUKICAgICAgICBHMiAtLT4gUnVudGltZQogICAgICAgIEczIC0tPiBSdW50aW1lCiAgICAgICAgRzQgLS0-IFJ1bnRpbWUKICAgICAgICBSdW50aW1lIC0tPnxydW5zIG9ufCBPU1RbIk9TIHRocmVhZCAxIl0KICAgICAgICBSdW50aW1lIC0tPnxydW5zIG9ufCBPU1QyWyJPUyB0aHJlYWQgLi4uIl0KICAgICAgICBSdW50aW1lIC0tPnxydW5zIG9ufCBPU1RuWyJPUyB0aHJlYWQgR09NQVhQUk9DUyJdCiAgICBlbmQKCiAgICBjbGFzc0RlZiBoZWF2eSBmaWxsOiNmZWQ3ZDcsc3Ryb2tlOiNjNTMwMzAKICAgIGNsYXNzRGVmIGxpZ2h0IGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgSmF2YSBoZWF2eQogICAgY2xhc3MgR28gbGlnaHQ%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRCCiAgICBzdWJncmFwaCBKYXZhWyJKYXZhIMK3IG9uZSB0aHJlYWQgcGVyIGNvbm5lY3Rpb24iXQogICAgICAgIEpUMVsiVGhyZWFkIDE8YnIvPnN0YWNrIOKJiCAxIE1CIl0KICAgICAgICBKVDJbIlRocmVhZCAyPGJyLz5zdGFjayDiiYggMSBNQiJdCiAgICAgICAgSlQzWyJUaHJlYWQgLi4uPGJyLz5zdGFjayDiiYggMSBNQiJdCiAgICAgICAgSlQxIC0uLT58a2VybmVsIGNvbnRleHQgc3dpdGNoPGJyLz5UTEIgZmx1c2ggwrcgcmVnIHNhdmV8IEtlcm5lbDFbKEtlcm5lbCBzY2hlZHVsZXIpXQogICAgICAgIEpUMiAtLi0-IEtlcm5lbDEKICAgICAgICBKVDMgLS4tPiBLZXJuZWwxCiAgICBlbmQKCiAgICBzdWJncmFwaCBHb1siR28gwrcgZ29yb3V0aW5lcyBvbiBhIHNtYWxsIHBvb2wgb2YgT1MgdGhyZWFkcyJdCiAgICAgICAgRzFbIkdvcm91dGluZSAxPGJyLz5zdGFjayAyIEtCIl0KICAgICAgICBHMlsiR29yb3V0aW5lIDI8YnIvPnN0YWNrIDIgS0IiXQogICAgICAgIEczWyJHb3JvdXRpbmUgLi4uPGJyLz5zdGFjayAyIEtCIl0KICAgICAgICBHNFsiR29yb3V0aW5lIE48YnIvPnN0YWNrIDIgS0IiXQogICAgICAgIFJ1bnRpbWVbIkdvIHJ1bnRpbWUgc2NoZWR1bGVyPGJyLz5NOk4gwrcgdXNlciBzcGFjZSJdCiAgICAgICAgRzEgLS0-IFJ1bnRpbWUKICAgICAgICBHMiAtLT4gUnVudGltZQogICAgICAgIEczIC0tPiBSdW50aW1lCiAgICAgICAgRzQgLS0-IFJ1bnRpbWUKICAgICAgICBSdW50aW1lIC0tPnxydW5zIG9ufCBPU1RbIk9TIHRocmVhZCAxIl0KICAgICAgICBSdW50aW1lIC0tPnxydW5zIG9ufCBPU1QyWyJPUyB0aHJlYWQgLi4uIl0KICAgICAgICBSdW50aW1lIC0tPnxydW5zIG9ufCBPU1RuWyJPUyB0aHJlYWQgR09NQVhQUk9DUyJdCiAgICBlbmQKCiAgICBjbGFzc0RlZiBoZWF2eSBmaWxsOiNmZWQ3ZDcsc3Ryb2tlOiNjNTMwMzAKICAgIGNsYXNzRGVmIGxpZ2h0IGZpbGw6I2YwZmZmNCxzdHJva2U6IzJmODU1YQogICAgY2xhc3MgSmF2YSBoZWF2eQogICAgY2xhc3MgR28gbGlnaHQ%3D" alt="JT1[" width="1545" height="548"&gt;&lt;/a&gt;stack ≈ 1 MB"]"/&amp;gt;&lt;/p&gt;

&lt;p&gt;Go's concurrency is built on an &lt;strong&gt;M:N scheduler&lt;/strong&gt;. You have many goroutines (N) multiplexed onto a small number of OS threads (M, typically &lt;code&gt;GOMAXPROCS&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Here's the part that matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A goroutine starts with a &lt;strong&gt;2 KB stack&lt;/strong&gt;, not a megabyte. Growth is copy-and-resize in user space, triggered by the function prologue when it detects a near-overflow.&lt;/li&gt;
&lt;li&gt;Switching between goroutines happens &lt;strong&gt;entirely in the Go runtime&lt;/strong&gt;. No syscall. No TLB flush. No register-set save-and-restore at OS cost. Roughly a couple hundred nanoseconds in microbenchmarks — an order of magnitude cheaper than an OS-level context switch. The exact number moves around with workload, scheduler contention, and Go version; what's stable is the order of magnitude.&lt;/li&gt;
&lt;li&gt;When a goroutine blocks on network I/O, the runtime parks it and flips the underlying OS thread to run a different goroutine. The goroutine's state lives in Go's own scheduler, not in a kernel wait queue.&lt;/li&gt;
&lt;/ul&gt;
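&lt;p&gt;You can get a rough feel for that switch cost yourself by bouncing a token between two goroutines over unbuffered channels — every hop forces a goroutine handoff inside the runtime. The exact figure depends on your machine and Go version; only the order of magnitude is stable, which is why no expected number is printed here. A sketch:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// pingPong bounces a token between two goroutines n times over
// unbuffered channels; each hop is a user-space goroutine switch.
func pingPong(n int) time.Duration {
	ping, pong := make(chan struct{}), make(chan struct{})
	go func() {
		for i := 0; i < n; i++ {
			<-ping
			pong <- struct{}{}
		}
	}()
	start := time.Now()
	for i := 0; i < n; i++ {
		ping <- struct{}{} // handoff to the other goroutine
		<-pong             // and back again
	}
	return time.Since(start)
}

func main() {
	const n = 1_000_000
	d := pingPong(n)
	// Each round trip is two handoffs; the per-switch cost is
	// machine-dependent, typically in the hundreds of nanoseconds.
	fmt.Printf("%d round trips in %v (%.0f ns/switch)\n",
		n, d, float64(d.Nanoseconds())/float64(2*n))
}
```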

&lt;p&gt;This is the actual answer to "why Go scales to millions of connections": &lt;strong&gt;the runtime refuses to hand idle work back to the kernel&lt;/strong&gt;. The kernel still does the real I/O — Go uses &lt;code&gt;epoll&lt;/code&gt; on Linux, &lt;code&gt;kqueue&lt;/code&gt; on BSD, IOCP on Windows — but it only involves the kernel when there's &lt;em&gt;actual&lt;/em&gt; work, not when a goroutine is just sitting around.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Small Benchmark That Tells the Whole Story
&lt;/h2&gt;

&lt;p&gt;Here's a stripped-down Go program that spins up N goroutines, each one holds a channel read, and prints the total RSS when they're all parked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"runtime"&lt;/span&gt;
    &lt;span class="s"&gt;"sync"&lt;/span&gt;
    &lt;span class="s"&gt;"syscall"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sscanf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="c"&gt;// park forever&lt;/span&gt;
        &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Let the runtime settle&lt;/span&gt;
    &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GC&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rusage&lt;/span&gt;
    &lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getrusage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUSAGE_SELF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"goroutines=%d  rss=%d KB  (%.1f KB/goroutine)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Maxrss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Maxrss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On my laptop (M1, Go 1.22, macOS):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;goroutines=10000    rss=28672 KB   (2.9 KB/goroutine)
goroutines=100000   rss=263168 KB  (2.6 KB/goroutine)
goroutines=1000000  rss=2600960 KB (2.6 KB/goroutine)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2.6 KB per parked goroutine&lt;/strong&gt;, flat, all the way to a million. That's the story. Not 1 MB. Not 256 KB. Two and a half KB.&lt;/p&gt;

&lt;p&gt;Try the equivalent program with Java platform threads — &lt;code&gt;new Thread(() -&amp;gt; ...).start()&lt;/code&gt; — and you will run out of memory well before 100,000. The comparison isn't even close, and it isn't about execution speed — it's about what an idle waiter costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Parallel in Finance: Same Problem, Opposite Extreme
&lt;/h2&gt;

&lt;p&gt;The part that made this click for me is noticing where else this principle shows up. High-frequency trading engines and exchange colocation boxes have the same bottleneck — kernel context switches are expensive — and they solve it the other way: &lt;strong&gt;skip the kernel entirely&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DPDK&lt;/strong&gt; gives userspace direct access to the NIC. Packets bypass the kernel network stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel-bypass sockets&lt;/strong&gt; (Solarflare Onload, AWS Nitro enhanced networking) push the TCP/IP stack into userspace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;io_uring&lt;/strong&gt; on modern Linux brings the same idea to general-purpose code — a shared memory ring buffer between app and kernel, batched, with minimal syscalls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RDMA&lt;/strong&gt; lets network cards write directly into another machine's memory. No kernel on either end.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Different tools, same target: &lt;strong&gt;syscalls and context switches are expensive; keep them off the hot path&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Go arrives at the same destination with a completely different route. Instead of bypassing the kernel, it hides the kernel behind a user-space scheduler and only calls in when absolutely necessary. HFT says "the kernel is slow, route around it." Go says "the kernel is slow, so we'll handle most of the state ourselves and only ring the kernel's doorbell when we have real work." The principle is identical.&lt;/p&gt;

&lt;p&gt;Once you see this pattern, you start seeing it everywhere. V8 Isolates. Erlang processes. Rust async runtimes. The details differ but the bet is the same: &lt;strong&gt;keep concurrency cheap by keeping it out of the kernel&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Go Actually Breaks Under Load
&lt;/h2&gt;

&lt;p&gt;None of this means Go scales forever. When I've seen Go services crack at scale, it's usually not the runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File descriptors&lt;/strong&gt;: Default &lt;code&gt;ulimit -n&lt;/code&gt; is 1024 on most systems. You'll hit this before you stress the scheduler. Push it to 1M if you're actually building a long-poll service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral ports&lt;/strong&gt;: If your service fans out to a downstream with lots of short-lived outbound connections, the 28K-ish default ephemeral port range bites before anything else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conntrack tables&lt;/strong&gt;: Linux's &lt;code&gt;nf_conntrack_max&lt;/code&gt; default is laughably small for a real service. Tune it or turn it off on high-throughput paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GC pressure from allocation-heavy handlers&lt;/strong&gt;: The scheduler is cheap; the garbage collector is not. &lt;code&gt;sync.Pool&lt;/code&gt;, stack-allocated buffers, and careful escape analysis still matter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The load balancer&lt;/strong&gt;: Your L4/L7 LB probably caps out before Go does.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've watched a Go service sit happily at 400K connections on a single pod while the upstream Envoy bled under its own CPU budget. The Go process was the calm one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concurrency Isn't a Speed Contest
&lt;/h2&gt;

&lt;p&gt;It's a cost-of-idleness contest.&lt;/p&gt;

&lt;p&gt;If you're building anything with long-lived connections — streaming APIs, WebSocket fan-out, server-sent events, message brokers, pub/sub gateways, anything with more connections than cores — the question isn't "is my language fast?" It's "&lt;strong&gt;how much does one idle waiter cost me?&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;Go's answer is 2.6 KB and 200 nanoseconds. That's why it scales.&lt;/p&gt;

&lt;p&gt;If you come from a world where "high concurrency" means "we bought a bigger box," Go can feel like cheating. It isn't. It's just a careful, decade-old design decision that says: the kernel is a system call you should make as rarely as possible, and when you must, do it in bulk.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://golang.org/src/runtime/HACKING.md" rel="noopener noreferrer"&gt;The Go runtime's &lt;code&gt;HACKING.md&lt;/code&gt;&lt;/a&gt; — runtime internals from the Go team; pair it with Dmitry Vyukov's "Scalable Go Scheduler Design Doc" for the scheduler's rationale&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;runtime/proc.go&lt;/code&gt; in the Go source tree — the actual M/P/G logic, shorter and more readable than you'd expect&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://people.freebsd.org/~jlemon/papers/kqueue.pdf" rel="noopener noreferrer"&gt;FreeBSD's &lt;code&gt;kqueue&lt;/code&gt; paper (Jonathan Lemon)&lt;/a&gt; — where &lt;code&gt;epoll&lt;/code&gt; got many of its ideas&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://kernel.dk/io_uring.pdf" rel="noopener noreferrer"&gt;io_uring introduction (Jens Axboe)&lt;/a&gt; — the modern-kernel answer to the same problem Go solved in user space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to understand why the decade-old Go scheduler still holds up, read &lt;code&gt;runtime/proc.go&lt;/code&gt; once. The comments alone are worth an afternoon.&lt;/p&gt;

</description>
      <category>go</category>
      <category>concurrency</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Claude Code Deep Dive Part 4: Why It Uses Markdown Files Instead of Vector DBs</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Wed, 08 Apr 2026 05:19:40 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-deep-dive-part-4-why-it-uses-markdown-files-instead-of-vector-dbs-1hf6</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-deep-dive-part-4-why-it-uses-markdown-files-instead-of-vector-dbs-1hf6</guid>
      <description>&lt;p&gt;&lt;em&gt;This is Part 4 of our Claude Code Architecture Deep Dive series. &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;Part 1: 5 Hidden Features&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Part 2: The 1,421-Line While Loop&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-context-engineering-compression-pipeline/" rel="noopener noreferrer"&gt;Part 3: Context Engineering — 5-Level Compression Pipeline&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article replaces and deepens our earlier analysis, &lt;a href="https://harrisonsec.com/blog/claude-code-memory-simpler-than-you-think/" rel="noopener noreferrer"&gt;Claude Code's Memory Is Simpler Than You Think&lt;/a&gt;. The original focused on limitations. This one focuses on the &lt;strong&gt;why&lt;/strong&gt; — the first-principles tradeoffs behind every design choice.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Principle: Only Record What Cannot Be Derived
&lt;/h2&gt;

&lt;p&gt;This single constraint governs every decision in Claude Code's memory system:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Don't save code patterns — read the current code. Don't save git history — run &lt;code&gt;git log&lt;/code&gt;. Don't save file paths — glob the project. Don't save past bug fixes — they're in commits.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't about saving storage. It's about &lt;strong&gt;preventing memory drift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If a memory says "auth module lives in &lt;code&gt;src/auth/&lt;/code&gt;", one refactor makes that memory a lie. But the model doesn't know it's a lie — it trusts specific references by default. A stale memory is worse than no memory at all, because the model acts on it with confidence.&lt;/p&gt;

&lt;p&gt;Code is self-describing. The source of truth is always the current state of the project, not a snapshot from three weeks ago. Memory should store &lt;strong&gt;meta-information&lt;/strong&gt; — who the user is, what they prefer, what decisions were made and why — not facts that the codebase already expresses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Types, Closed Taxonomy
&lt;/h2&gt;

&lt;p&gt;Claude Code enforces exactly four memory types. Not tags. Not categories. Four types with hard boundaries:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What to Store&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;user&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Identity, preferences, expertise&lt;/td&gt;
&lt;td&gt;"Data scientist, focused on observability"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;feedback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Behavioral corrections AND confirmations&lt;/td&gt;
&lt;td&gt;"Don't summarize after code changes — user reads diffs"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;project&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decisions, deadlines, stakeholder context&lt;/td&gt;
&lt;td&gt;"Merge freeze after 2026-03-05 for mobile release"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;reference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pointers to external systems&lt;/td&gt;
&lt;td&gt;"Pipeline bugs tracked in Linear INGEST project"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why closed taxonomy beats open tagging:&lt;/strong&gt; Free-form tags cause label explosion. A model tagging memories freely might produce "coding-style", "code-style", "style-preference", "formatting" — four labels for the same concept. Closed taxonomy forces an explicit semantic choice. Each type has different storage structure (feedback requires &lt;code&gt;Why&lt;/code&gt; + &lt;code&gt;How to apply&lt;/code&gt; fields) and different retrieval behavior. The constraint buys clarity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Positive Feedback Matters More Than Corrections
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;feedback&lt;/code&gt; type stores both failures AND successes. The source code explains why:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If you only save corrections, you will avoid past mistakes but drift away from approaches the user has already validated, and may grow overly cautious."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Imagine the user says "this code style is great, keep doing this." If you don't save that, next session the model might "improve" the style — moving away from what the user explicitly liked. Positive feedback &lt;strong&gt;anchors&lt;/strong&gt; the model to known-good patterns. Without anchors, corrections alone push the model toward progressively safer (blander) output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Type: Relative Dates Kill You
&lt;/h3&gt;

&lt;p&gt;When a user says "merge freeze after Thursday", the memory must store "merge freeze after 2026-03-05." A memory read three weeks later has no idea what "Thursday" meant. This seems obvious, but it's an explicit rule in the source code because models default to storing user language verbatim.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Sonnet Side-Query Instead of Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;This is the design choice that draws the most criticism. Claude Code uses a live LLM call (Sonnet) to pick relevant memories instead of vector similarity search. Here's the actual tradeoff:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBRdWVyeVsiVXNlciBxdWVyeSJdIC0tPiBTY2FuWyJTY2FuIG1lbW9yeSBkaXJcblJlYWQgZnJvbnRtYXR0ZXIgb25seVxuTWF4IDIwMCBmaWxlcyJdCiAgICBTY2FuIC0tPiBNYW5pZmVzdFsiRm9ybWF0IG1hbmlmZXN0XG50eXBlICsgZmlsZW5hbWUgKyB0aW1lc3RhbXBcbisgZGVzY3JpcHRpb24iXQogICAgTWFuaWZlc3QgLS0-IFNvbm5ldFsiU29ubmV0IHNpZGUtcXVlcnlcbn4yNTBtcywgMjU2IHRva2Vuc1xuU2VsZWN0IHRvcCA1Il0KICAgIFNvbm5ldCAtLT4gRmlsdGVyWyJEZWR1cGxpY2F0ZVxuUmVtb3ZlIGFscmVhZHktc3VyZmFjZWQiXQogICAgRmlsdGVyIC0tPiBJbmplY3RbIkluamVjdCBhcyBzeXN0ZW0tcmVtaW5kZXJcbldpdGggZnJlc2huZXNzIHdhcm5pbmciXQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBRdWVyeVsiVXNlciBxdWVyeSJdIC0tPiBTY2FuWyJTY2FuIG1lbW9yeSBkaXJcblJlYWQgZnJvbnRtYXR0ZXIgb25seVxuTWF4IDIwMCBmaWxlcyJdCiAgICBTY2FuIC0tPiBNYW5pZmVzdFsiRm9ybWF0IG1hbmlmZXN0XG50eXBlICsgZmlsZW5hbWUgKyB0aW1lc3RhbXBcbisgZGVzY3JpcHRpb24iXQogICAgTWFuaWZlc3QgLS0-IFNvbm5ldFsiU29ubmV0IHNpZGUtcXVlcnlcbn4yNTBtcywgMjU2IHRva2Vuc1xuU2VsZWN0IHRvcCA1Il0KICAgIFNvbm5ldCAtLT4gRmlsdGVyWyJEZWR1cGxpY2F0ZVxuUmVtb3ZlIGFscmVhZHktc3VyZmFjZWQiXQogICAgRmlsdGVyIC0tPiBJbmplY3RbIkluamVjdCBhcyBzeXN0ZW0tcmVtaW5kZXJcbldpdGggZnJlc2huZXNzIHdhcm5pbmciXQ%3D%3D" alt="flowchart LR" width="1704" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sonnet reads descriptions (not full content), evaluates semantic relevance, and returns up to 5 filenames. The call costs ~250ms and 256 output tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this over vector embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Sonnet Side-Query&lt;/th&gt;
&lt;th&gt;Vector Embeddings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Semantic depth&lt;/td&gt;
&lt;td&gt;Full language understanding — "deployment" matches "CI/CD"&lt;/td&gt;
&lt;td&gt;Cosine similarity — good but shallow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Zero — one API call&lt;/td&gt;
&lt;td&gt;Requires embedding model + vector store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparency&lt;/td&gt;
&lt;td&gt;Can inspect WHY a memory was selected&lt;/td&gt;
&lt;td&gt;Opaque similarity scores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per query&lt;/td&gt;
&lt;td&gt;~250ms + 256 tokens (shared prompt cache)&lt;/td&gt;
&lt;td&gt;Embedding call + search latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaling&lt;/td&gt;
&lt;td&gt;Degrades past ~200 files&lt;/td&gt;
&lt;td&gt;Scales to millions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tradeoff is deliberate: for a &lt;strong&gt;session-based CLI tool&lt;/strong&gt; where users typically have 20-100 memories, Sonnet's semantic understanding beats vector search's scale. The 250ms latency is hidden entirely through &lt;strong&gt;async prefetch&lt;/strong&gt; — the search runs in parallel while the model generates its response. For the user, memory recall is "free."&lt;/p&gt;

&lt;h3&gt;
  
  
  The 5-File Cap: Constraint as Design
&lt;/h3&gt;

&lt;p&gt;Why limit to 5 memories when a user might have 200?&lt;/p&gt;

&lt;p&gt;This is not a technical limitation. It's a &lt;strong&gt;behavioral nudge&lt;/strong&gt;. If the system scaled to inject 50 memories, users would never clean up stale ones. The 5-file cap pushes users to write better descriptions (so the right memories get selected) and consolidate outdated entries (so slots aren't wasted on stale info).&lt;/p&gt;

&lt;p&gt;Design principle: &lt;strong&gt;constraints that change user behavior beat constraints that scale infrastructure.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Background Extraction: The Invisible Agent
&lt;/h2&gt;

&lt;p&gt;Claude Code doesn't just save memories when you say &lt;code&gt;/remember&lt;/code&gt;. After every conversation turn where the main agent stops (no more tool calls), a &lt;strong&gt;forked background agent&lt;/strong&gt; runs to extract memorable information.&lt;/p&gt;

&lt;p&gt;Key design details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mutual exclusion&lt;/strong&gt;: If the main agent already wrote a memory in this turn, the extractor skips. No duplicate memories from the same conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trailing runs&lt;/strong&gt;: If extraction is still running when the next turn ends, the new request queues as &lt;code&gt;pendingContext&lt;/code&gt;. When the current extraction finishes, it picks up the pending work. No concurrent writes to the memory directory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5-turn hard deadline&lt;/strong&gt;: The extractor gets at most 5 tool-call turns. Efficiency over completeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal permissions&lt;/strong&gt;: Read/Grep/Glob unlimited. Write &lt;strong&gt;only&lt;/strong&gt; to the memory directory. Cannot modify project files, execute code, or call external services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared prompt cache&lt;/strong&gt;: The forked agent reuses the parent's cached system prompt — near-zero additional token overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The execution strategy is prescribed in the prompt: "Turn 1: parallel reads of all existing memories. Turn 2: parallel writes of new memories." Two turns for the common case. The 5-turn budget handles edge cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust but Verify: The Eval That Proved It
&lt;/h2&gt;

&lt;p&gt;The most impactful section in Claude Code's memory prompt is &lt;code&gt;TRUSTING_RECALL_SECTION&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A memory that names a specific function, file, or flag is a claim that it existed &lt;em&gt;when the memory was written&lt;/em&gt;. It may have been renamed, removed, or never merged."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The rule: before acting on a memory that references a file path, verify the file exists (Glob). Before trusting a function name, confirm it's still there (Grep).&lt;/p&gt;

&lt;p&gt;This section's value was proven empirically: &lt;strong&gt;without it, eval pass rate was 0/2. With it, 3/3.&lt;/strong&gt; Models default to trusting specific references in memory. They'll confidently say "as stored in memory, the auth module is at &lt;code&gt;src/auth/&lt;/code&gt;" — even when that path was renamed weeks ago. The verification requirement breaks this default behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydwixv55gmz3gnjza4wg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydwixv55gmz3gnjza4wg.png" alt="Three Architectures, Three Tradeoffs" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Architectures, Three Tradeoffs
&lt;/h2&gt;

&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; a ranking. I'm using OpenClaw and Hermes as contrasts because they represent the two obvious alternative bets: scale and autonomy. Claude Code, OpenClaw, and Hermes Agent made different choices for different deployment models.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Markdown files (flat)&lt;/td&gt;
&lt;td&gt;MD + SQLite (FTS + vector)&lt;/td&gt;
&lt;td&gt;SQLite + FTS + MEMORY.md&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sonnet side-query (semantic)&lt;/td&gt;
&lt;td&gt;Embedding cosine + FTS fusion&lt;/td&gt;
&lt;td&gt;Full-text search + structured queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero (filesystem only)&lt;/td&gt;
&lt;td&gt;SQLite + embedding model&lt;/td&gt;
&lt;td&gt;SQLite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transparency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full (plain text, human-readable)&lt;/td&gt;
&lt;td&gt;Partial (vector scores opaque)&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None (static after write)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Self-evolving (auto-generates skills)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Session model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Session-based, stateless between sessions&lt;/td&gt;
&lt;td&gt;Persistent, cross-session&lt;/td&gt;
&lt;td&gt;Persistent, self-improving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale ceiling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200 files by design&lt;/td&gt;
&lt;td&gt;Scales with SQLite&lt;/td&gt;
&lt;td&gt;Scales with SQLite&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Claude Code's Bet
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Optimize for zero infrastructure and full transparency. Accept a scale ceiling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For a CLI tool that runs on a developer's laptop, requiring SQLite or an embedding service is friction. Plain Markdown files are human-readable, git-trackable, and editable with any text editor. The 200-file ceiling is intentional — if you need more, you should be consolidating, not scaling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When this breaks:&lt;/strong&gt; Teams with hundreds of shared memories. Long-running projects where memory accumulation outpaces cleanup. Multi-user scenarios where memory needs to be queried across team members.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenClaw's Bet
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Accept infrastructure overhead for persistent cross-session scale.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw stores memories in SQLite with both full-text search and vector embeddings. This enables fuzzy semantic matching across thousands of memories, weighted fusion of multiple retrieval signals, and persistent state that survives across sessions indefinitely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When this breaks:&lt;/strong&gt; Setup complexity. Users must configure embedding models. Vector similarity scores are opaque — when the wrong memory is recalled, debugging why is harder than inspecting a Sonnet side-query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hermes Agent's Bet
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Accept complexity for a self-evolving learning loop.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Hermes doesn't just store memories — it generates &lt;strong&gt;skills&lt;/strong&gt; from completed tasks. After a complex task (5+ tool calls), the agent distills the entire process into a structured skill document. Next time it encounters a similar task, it loads the skill instead of solving from scratch. Skills self-iterate: if the agent finds a better approach during execution, it updates the skill automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When this breaks:&lt;/strong&gt; Skill quality is unverified. A bad skill propagated through the learning loop compounds errors. The self-evolving mechanism needs guardrails that don't exist yet — there's no eval framework for auto-generated skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Right Choice Depends on Your Deployment Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session-based, single user, zero setup → Claude Code's approach
Persistent, multi-user, cross-session  → OpenClaw's approach  
Autonomous, self-improving, research    → Hermes's approach
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no universal "best." The first-principles question is: &lt;strong&gt;what are you optimizing for — simplicity, scale, or autonomy?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Teaches About Agent Design
&lt;/h2&gt;

&lt;p&gt;Three principles that transfer beyond memory systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Constraints that change user behavior &amp;gt; constraints that scale infrastructure.&lt;/strong&gt; The 5-file cap is more effective than unlimited vector search, because it forces better memory hygiene. Don't build capacity for a mess — design incentives for cleanliness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Eval data beats intuition for prompt engineering.&lt;/strong&gt; The trust-verification section wasn't added because someone thought it was a good idea. It was added because evals went from 0/2 to 3/3. If you can't measure it, you're guessing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use the model's own reasoning for retrieval when latency allows.&lt;/strong&gt; Sonnet understanding that "deployment" relates to "CI/CD" is something no keyword match or embedding similarity can reliably do. When your retrieval budget allows a model call, the quality ceiling is higher than any static index.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Previous: &lt;a href="https://harrisonsec.com/blog/claude-code-context-engineering-compression-pipeline/" rel="noopener noreferrer"&gt;Part 3: Context Engineering — 5-Level Compression Pipeline&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Part 2: The 1,421-Line While Loop&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;Part 1: 5 Hidden Features&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;See also: &lt;a href="https://harrisonsec.com/blog/claude-code-codex-plugin-two-brains/" rel="noopener noreferrer"&gt;Claude Code + Codex: Two Brains&lt;/a&gt; for how dual-AI workflows complement the memory system.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>memory</category>
      <category>agents</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>Claude Code Deep Dive Part 3: The 5-Level Compression Pipeline Behind 1M Tokens</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Wed, 08 Apr 2026 05:19:22 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-deep-dive-part-3-the-5-level-compression-pipeline-behind-200k-tokens-4im1</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-deep-dive-part-3-the-5-level-compression-pipeline-behind-200k-tokens-4im1</guid>
      <description>&lt;p&gt;&lt;em&gt;This is Part 3 of our Claude Code Architecture Deep Dive series. &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;Part 1: 5 Hidden Features&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Part 2: The 1,421-Line While Loop&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-memory-first-principles-tradeoffs/" rel="noopener noreferrer"&gt;Part 4: Memory Tradeoffs&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Context Engineering Is the Real Moat
&lt;/h2&gt;

&lt;p&gt;Every AI agent has the same fundamental constraint: a fixed-size context window. Claude's is now up to 1M tokens. That sounds massive — until you realize a real coding session can easily generate multiples of that. Dozens of file reads, hundreds of tool calls, thousands of lines of output.&lt;/p&gt;

&lt;p&gt;The model's decision quality depends entirely on what it sees. Get the tradeoff wrong, and it forgets which files it just edited, re-reads content it already saw, or contradicts its own earlier decisions.&lt;/p&gt;

&lt;p&gt;Think of the context window as an office desk. Limited surface area. You need the most important documents within arm's reach, everything else filed in drawers — retrievable, but not cluttering your workspace.&lt;/p&gt;

&lt;p&gt;Claude Code's context engineering is that filing system. And it's far more sophisticated than most people expect. In &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt;, we covered the 4-stage compression overview as part of the loop's survival mechanism. Here, we zoom into the internal engineering — revealing a 5th level most sessions never trigger, a dual-path algorithm that adapts to cache state, and a security blind spot in the summarizer.&lt;/p&gt;

&lt;p&gt;The compression pipeline alone lives in &lt;code&gt;src/services/compact/&lt;/code&gt; — over 3,960 lines of TypeScript across 5 files.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5-Level Compression Pipeline
&lt;/h2&gt;

&lt;p&gt;The design philosophy is &lt;strong&gt;progressive compression&lt;/strong&gt;: cheapest first, heaviest last. Each level is more expensive than the previous one — consuming more compute or discarding more context detail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8d9fzli4uyblkyxux3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8d9fzli4uyblkyxux3e.png" alt="The 5-Level Compression Pipeline" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBJbnB1dFsiTWVzc2FnZSBIaXN0b3J5Il0gLS0-IEwxWyJMZXZlbCAxOiBUb29sIFJlc3VsdCBCdWRnZXRcbjUwSyBjaGFyIHRocmVzaG9sZCDihpIgZGlzayArIDJLQiBwcmV2aWV3XG7wn5KwIENvc3Q6IFplcm8iXQogICAgTDEgLS0-IEwyWyJMZXZlbCAyOiBIaXN0b3J5IFNuaXBcbkZlYXR1cmUtZ2F0ZWQgdG9rZW4gcmVsZWFzZVxu8J-SsCBDb3N0OiBaZXJvIl0KICAgIEwyIC0tPiBMM1siTGV2ZWwgMzogTWljcm9jb21wYWN0XG5EdWFsIHBhdGg6IHRpbWUtYmFzZWQgT1IgY2FjaGUtZWRpdFxu8J-SsCBDb3N0OiBaZXJvIEFQSSBjYWxscyJdCiAgICBMMyAtLT4gTDRbIkxldmVsIDQ6IENvbnRleHQgQ29sbGFwc2VcblByb2plY3Rpb24tYmFzZWQgZm9sZGluZyB-OTAlXG7wn5KwIENvc3Q6IFplcm8gKG5vbi1kZXN0cnVjdGl2ZSkiXQogICAgTDQgLS0-IEw1WyJMZXZlbCA1OiBBdXRvY29tcGFjdFxuRm9yayBjaGlsZCBhZ2VudCBmb3IgZnVsbCBzdW1tYXJ5XG7wn5KwIENvc3Q6IE9uZSBBUEkgY2FsbCAoaXJyZXZlcnNpYmxlKSJd" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBJbnB1dFsiTWVzc2FnZSBIaXN0b3J5Il0gLS0-IEwxWyJMZXZlbCAxOiBUb29sIFJlc3VsdCBCdWRnZXRcbjUwSyBjaGFyIHRocmVzaG9sZCDihpIgZGlzayArIDJLQiBwcmV2aWV3XG7wn5KwIENvc3Q6IFplcm8iXQogICAgTDEgLS0-IEwyWyJMZXZlbCAyOiBIaXN0b3J5IFNuaXBcbkZlYXR1cmUtZ2F0ZWQgdG9rZW4gcmVsZWFzZVxu8J-SsCBDb3N0OiBaZXJvIl0KICAgIEwyIC0tPiBMM1siTGV2ZWwgMzogTWljcm9jb21wYWN0XG5EdWFsIHBhdGg6IHRpbWUtYmFzZWQgT1IgY2FjaGUtZWRpdFxu8J-SsCBDb3N0OiBaZXJvIEFQSSBjYWxscyJdCiAgICBMMyAtLT4gTDRbIkxldmVsIDQ6IENvbnRleHQgQ29sbGFwc2VcblByb2plY3Rpb24tYmFzZWQgZm9sZGluZyB-OTAlXG7wn5KwIENvc3Q6IFplcm8gKG5vbi1kZXN0cnVjdGl2ZSkiXQogICAgTDQgLS0-IEw1WyJMZXZlbCA1OiBBdXRvY29tcGFjdFxuRm9yayBjaGlsZCBhZ2VudCBmb3IgZnVsbCBzdW1tYXJ5XG7wn5KwIENvc3Q6IE9uZSBBUEkgY2FsbCAoaXJyZXZlcnNpYmxlKSJd" alt="flowchart TD" width="276" height="950"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most conversations never reach Level 5. That's the point.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 1 — Tool Result Budget (Zero Cost)
&lt;/h3&gt;

&lt;p&gt;Problem: A single &lt;code&gt;FileReadTool&lt;/code&gt; call on a 10,000-line file dumps the entire thing into context. A &lt;code&gt;BashTool&lt;/code&gt; running &lt;code&gt;find&lt;/code&gt; returns thousands of paths.&lt;/p&gt;

&lt;p&gt;Solution: When a tool result exceeds 50,000 characters (&lt;code&gt;DEFAULT_MAX_RESULT_SIZE_CHARS&lt;/code&gt;), Claude Code doesn't truncate it — it &lt;strong&gt;persists the full output to disk&lt;/strong&gt; and keeps only a 2KB preview in context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;persisted-output&amp;gt;&lt;/span&gt;
Output too large (2.3 MB). Full output saved to:
/tmp/.claude/session-xxx/tool-results/toolu_abc123.txt

Preview (first 2.0 KB):
[first 2000 bytes of content]
...
&lt;span class="nt"&gt;&amp;lt;/persisted-output&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why persist instead of truncate? Truncation means permanent loss. If the model later needs line 500 of that output — maybe that's where the bug is — it can use the &lt;code&gt;Read&lt;/code&gt; tool to access the full file from disk. The 2KB preview gives enough context to decide whether that's necessary.&lt;/p&gt;
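&lt;p&gt;A minimal sketch of that budgeting step — the constant names and the 50K/2KB figures come from the article, but the function shape and wrapper text are assumptions, not the actual source:&lt;/p&gt;

```typescript
// Illustrative sketch of persist-instead-of-truncate; not the real implementation.
const DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000;
const PREVIEW_SIZE_CHARS = 2_000;

interface BudgetedResult {
  inContext: string;  // what stays in the context window
  diskPath?: string;  // where the full output lives, if persisted
}

function budgetToolResult(
  output: string,
  persist: (content: string) => string, // writes full output to disk, returns its path
): BudgetedResult {
  if (output.length <= DEFAULT_MAX_RESULT_SIZE_CHARS) {
    return { inContext: output }; // small results pass through untouched
  }
  const diskPath = persist(output);
  const preview = output.slice(0, PREVIEW_SIZE_CHARS);
  return {
    inContext:
      `<persisted-output>\nOutput too large. Full output saved to:\n${diskPath}\n\n` +
      `Preview (first 2.0 KB):\n${preview}\n...\n</persisted-output>`,
    diskPath,
  };
}
```

&lt;p&gt;The preview plus path costs roughly 2KB of context regardless of the original size, while the &lt;code&gt;Read&lt;/code&gt; tool keeps the full output one call away.&lt;/p&gt;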

&lt;h3&gt;
  
  
  Level 2 — History Snip
&lt;/h3&gt;

&lt;p&gt;Think of History Snip as garbage collection for stale conversation scaffolding. If the session contains repetitive assistant wrappers, redundant bookkeeping, or older spans that no longer affect the next decision, this layer can cut them before heavier compression starts.&lt;/p&gt;

&lt;p&gt;Its real importance is accounting correctness. It feeds &lt;code&gt;snipTokensFreed&lt;/code&gt; into the autocompact threshold calculation. Without that correction, the last assistant message's &lt;code&gt;usage&lt;/code&gt; data still reflects the pre-snip context size, so autocompact can fire even after tokens were already freed.&lt;/p&gt;
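&lt;p&gt;The correction is easy to sketch. Everything here except &lt;code&gt;snipTokensFreed&lt;/code&gt; is an illustrative assumption rather than the real source:&lt;/p&gt;

```typescript
// Why the snip correction matters: the last usage report predates the snip,
// so the freed tokens must be subtracted before comparing to the threshold.
interface UsageAnchor {
  inputTokens: number; // server-reported context size at the last API response
}

function shouldAutocompact(
  anchor: UsageAnchor,
  snipTokensFreed: number, // tokens released by History Snip since that response
  thresholdTokens: number,
): boolean {
  const effectiveTokens = anchor.inputTokens - snipTokensFreed;
  return effectiveTokens >= thresholdTokens;
}
```

&lt;p&gt;Without the subtraction, the stale usage figure can sit above the threshold and fire an expensive autocompact for tokens that were already released.&lt;/p&gt;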

&lt;h3&gt;
  
  
  Level 3 — Microcompact (The Dual-Path Design)
&lt;/h3&gt;

&lt;p&gt;This is where it gets clever. Microcompact cleans up old tool results that are no longer useful — that file you read 30 minutes ago is probably irrelevant now, but it's still eating thousands of tokens.&lt;/p&gt;

&lt;p&gt;The twist: &lt;strong&gt;Microcompact has two completely different code paths&lt;/strong&gt;, selected based on cache state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path A — Cache Cold (Time-Based)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the user was away long enough for the prompt cache to expire (default 5-minute TTL), the cache is already dead. Rebuilding is inevitable. So Microcompact goes ahead and &lt;strong&gt;directly modifies message content&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// microCompact.ts — cold path&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[Old tool result content cleared]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, brutal, effective. Keep only the N most recent compactable tool results, replace everything else with a placeholder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path B — Cache Hot (Cache-Editing)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the user is actively chatting and the prompt cache is warm — holding 100K+ tokens of cached prefix — directly modifying messages would &lt;strong&gt;invalidate the entire cache&lt;/strong&gt;. That's a massive cost hit.&lt;/p&gt;

&lt;p&gt;Instead, the hot path uses an API-level mechanism called &lt;code&gt;cache_edits&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tag tool result blocks with &lt;code&gt;cache_reference: tool_use_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Construct &lt;code&gt;cache_edits&lt;/code&gt; blocks telling the server to delete those references in-place&lt;/li&gt;
&lt;li&gt;Server-side deletion preserves cache warmth — no client re-upload needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The messages themselves are returned &lt;strong&gt;unchanged&lt;/strong&gt;. The edit happens at the API layer, invisible to the local conversation state.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Time-Based (Cold)&lt;/th&gt;
&lt;th&gt;Cache-Edit (Hot)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trigger&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Time gap exceeds threshold&lt;/td&gt;
&lt;td&gt;Tool count exceeds threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct message modification&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cache_edits&lt;/code&gt; API blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cache Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cache rebuilds anyway&lt;/td&gt;
&lt;td&gt;Preserves 100K+ cached prefix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Calls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;td&gt;Zero (edits piggyback on next request)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two paths are mutually exclusive. Time-based takes priority — if the cache is already cold, using &lt;code&gt;cache_edits&lt;/code&gt; is pointless.&lt;/p&gt;
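&lt;p&gt;The selection logic reduces to a few lines. The 5-minute TTL comes from the article; the tool-count threshold is a placeholder, not a value from the source:&lt;/p&gt;

```typescript
// Sketch of the mutually exclusive path selection (illustrative, not the real code).
type MicrocompactPath = "time-based" | "cache-edit" | "none";

function selectMicrocompactPath(
  msSinceLastRequest: number,
  compactableToolResults: number,
  cacheTtlMs = 5 * 60 * 1000, // default prompt-cache TTL from the article
  toolCountThreshold = 20,    // placeholder value, not from the source
): MicrocompactPath {
  // Time-based takes priority: once the cache has expired,
  // server-side cache_edits would have nothing left to preserve.
  if (msSinceLastRequest > cacheTtlMs) return "time-based";
  if (compactableToolResults > toolCountThreshold) return "cache-edit";
  return "none";
}
```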

&lt;h3&gt;
  
  
  Level 4 — Context Collapse (Non-Destructive)
&lt;/h3&gt;

&lt;p&gt;Think of this as a database &lt;strong&gt;View&lt;/strong&gt; — the underlying table (message array) stays unchanged, but queries (API requests) see a filtered, summarized projection.&lt;/p&gt;

&lt;p&gt;Context Collapse triggers at ~90% utilization. Unlike autocompact, it's &lt;strong&gt;reversible&lt;/strong&gt; — original messages are never deleted, and the collapse can be rolled back if needed. The summaries live in a separate collapse store, and &lt;code&gt;projectView()&lt;/code&gt; overlays them onto the original messages at query time.&lt;/p&gt;
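&lt;p&gt;The View analogy maps onto a small projection function. The name &lt;code&gt;projectView()&lt;/code&gt; comes from the article; the message and store shapes are assumptions:&lt;/p&gt;

```typescript
// Non-destructive collapse: originals stay in the array,
// summaries overlay them only at query time.
interface Message {
  id: string;
  content: string;
}

function projectView(
  messages: Message[],
  collapseStore: Map<string, string>, // message id -> collapsed summary
): Message[] {
  return messages.map((m) =>
    collapseStore.has(m.id) ? { ...m, content: collapseStore.get(m.id)! } : m,
  );
}
```

&lt;p&gt;Rolling a collapse back is just clearing entries from the store — the originals were never touched.&lt;/p&gt;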

&lt;p&gt;Critical interaction: when Context Collapse is active, &lt;strong&gt;Autocompact is suppressed&lt;/strong&gt;. Both compete for the same token space — autocompact at ~87%, collapse at ~90% — and autocompact would destroy the fine-grained context that collapse is trying to preserve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 5 — Autocompact (The Last Resort)
&lt;/h3&gt;

&lt;p&gt;When everything else fails to keep tokens under control, the system forks a child agent to summarize the entire conversation. This is expensive and irreversible.&lt;/p&gt;

&lt;p&gt;The compression prompt uses a two-phase &lt;strong&gt;Chain-of-Thought Scratchpad&lt;/strong&gt; technique:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;lt;analysis&amp;gt;&lt;/code&gt; block&lt;/strong&gt; — the model walks through every message chronologically: user intent, approaches taken, key decisions, filenames, code snippets, errors, fixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt; block&lt;/strong&gt; — a structured summary with 9 standardized sections (Primary Request, Key Technical Concepts, Files and Code, Errors and Fixes, Problem Solving, All User Messages, Pending Tasks, Current Work, Optional Next Step)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The critical design: &lt;code&gt;formatCompactSummary()&lt;/code&gt; &lt;strong&gt;strips the &lt;code&gt;&amp;lt;analysis&amp;gt;&lt;/code&gt; block&lt;/strong&gt; and keeps only the &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt;. Chain-of-thought reasoning improves summary quality dramatically, but the reasoning itself would waste tokens if kept in context. Discard the work, keep the conclusion.&lt;/p&gt;
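&lt;p&gt;A simplified take on that stripping step — the real &lt;code&gt;formatCompactSummary()&lt;/code&gt; surely does more, but this shows the discard-the-work, keep-the-conclusion move:&lt;/p&gt;

```typescript
// Keep only the <summary> block; the <analysis> scratchpad improved the
// summary's quality but would waste tokens if retained in context.
function formatCompactSummary(modelOutput: string): string {
  const match = modelOutput.match(/<summary>([\s\S]*?)<\/summary>/);
  return match ? match[1].trim() : modelOutput.trim(); // fall back to raw output
}
```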

&lt;p&gt;&lt;strong&gt;Post-Compression Recovery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Autocompact's biggest risk: the model "forgets" files it just edited. The system automatically runs &lt;code&gt;runPostCompactCleanup()&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restore last 5 recently-read files (≤5K tokens each)&lt;/li&gt;
&lt;li&gt;Restore all activated skills (≤25K tokens total)&lt;/li&gt;
&lt;li&gt;Re-announce deferred tools, agent lists, MCP directives&lt;/li&gt;
&lt;li&gt;Reset Context Collapse state&lt;/li&gt;
&lt;li&gt;Restore Plan mode state if active&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this recovery step, the model would start re-reading files it just edited — or worse, make contradictory changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Circuit Breaker Story&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On March 10, 2026, Anthropic's telemetry showed 1,279 sessions with 50+ consecutive autocompact failures. The worst session hit 3,272 consecutive failures. Globally, this wasted approximately 250,000 API calls per day.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt;, we mentioned the circuit breaker as a single boolean (&lt;code&gt;hasAttemptedReactiveCompact&lt;/code&gt;). Here's the production story behind it.&lt;/p&gt;

&lt;p&gt;The fix was three lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 3 consecutive failures, stop trying. The context is irrecoverably over-limit — burning more API calls won't help. This is a textbook circuit breaker: detect a failure loop, break it early, fail gracefully.&lt;/p&gt;
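&lt;p&gt;The state machine around that constant is nearly as small as the constant itself. This is a generic circuit-breaker sketch, not the actual wiring:&lt;/p&gt;

```typescript
// Textbook circuit breaker: count consecutive failures,
// open after the limit, reset on any success.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

class AutocompactBreaker {
  private consecutiveFailures = 0;

  canAttempt(): boolean {
    return this.consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
  }
  recordFailure(): void {
    this.consecutiveFailures += 1;
  }
  recordSuccess(): void {
    this.consecutiveFailures = 0; // one success closes the breaker again
  }
}
```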

&lt;p&gt;Three adjacent systems make this pipeline viable in production: accurate token estimation, prompt-cache boundaries, and the summarizer's security assumptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Estimation Without API Calls
&lt;/h2&gt;

&lt;p&gt;Most agents estimate context size by counting tokens on the client. This typically has 30%+ error — enough to trigger compression too early or too late.&lt;/p&gt;

&lt;p&gt;Claude Code uses a smarter approach. Think of it as a morning weigh-in: you step on the scale at 75kg, then eat lunch. You don't need the scale again — estimating 75.5kg is good enough.&lt;/p&gt;

&lt;p&gt;The "scale" is the &lt;code&gt;usage&lt;/code&gt; data returned by every API response — server-side precise token counts. The "lunch" is the few messages added since then.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;tokenCountWithEstimation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Find the most recent message with server-reported usage&lt;/span&gt;
  &lt;span class="c1"&gt;// Use that as the anchor point&lt;/span&gt;
  &lt;span class="c1"&gt;// Estimate only the delta (new messages since anchor)&lt;/span&gt;
  &lt;span class="c1"&gt;// Result: &amp;lt;5% error vs 30%+ from pure client estimation&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This eliminates the need for tokenizer API calls while maintaining accuracy that's good enough for compression timing decisions.&lt;/p&gt;
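&lt;p&gt;Filling in the comment-only sketch above with a runnable version: the anchor-plus-delta idea is from the article, but the ~4-characters-per-token heuristic and the field names are assumptions:&lt;/p&gt;

```typescript
// Anchor on the most recent server-reported usage; client-estimate only the delta.
interface Msg {
  content: string;
  usageInputTokens?: number; // present when the server reported usage for this turn
}

function tokenCountWithEstimation(messages: Msg[]): number {
  // Walk backwards to find the anchor point.
  let anchorIdx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].usageInputTokens !== undefined) {
      anchorIdx = i;
      break;
    }
  }
  const anchorTokens = anchorIdx >= 0 ? messages[anchorIdx].usageInputTokens! : 0;
  // Rough client-side estimate for the few messages added since the anchor.
  const deltaChars = messages
    .slice(anchorIdx + 1)
    .reduce((sum, m) => sum + m.content.length, 0);
  return anchorTokens + Math.ceil(deltaChars / 4);
}
```

&lt;p&gt;The estimation error is confined to the small delta, which is why the overall figure stays far more accurate than estimating the whole history client-side.&lt;/p&gt;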

&lt;h2&gt;
  
  
  The Prompt Cache Architecture
&lt;/h2&gt;

&lt;p&gt;Claude Code's system prompt can be 50-100K tokens. Without caching, every API call would re-process this from scratch.&lt;/p&gt;

&lt;p&gt;The key innovation: &lt;code&gt;SYSTEM_PROMPT_DYNAMIC_BOUNDARY&lt;/code&gt; — a sentinel string that splits the system prompt into static and dynamic halves.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before the boundary&lt;/strong&gt;: core instructions, tool descriptions, security rules — identical for ALL users globally → cached with &lt;code&gt;scope: 'global'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After the boundary&lt;/strong&gt;: MCP tool instructions, output preferences, language settings — varies per user → not cached globally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means millions of Claude Code users &lt;strong&gt;share the same cached system prompt prefix&lt;/strong&gt;. One cache hit saves compute for everyone. But change one byte before the boundary, and the global cache breaks for all users.&lt;/p&gt;

&lt;p&gt;To protect this, Claude Code implements &lt;strong&gt;sticky-on latching&lt;/strong&gt; for beta headers: once a header is sent in a session, it persists for all subsequent requests — even if the feature flag is turned off mid-session. Flexibility sacrificed for cache stability.&lt;/p&gt;
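&lt;p&gt;The boundary split itself is straightforward. The sentinel's actual value isn't shown in the source, so it's a placeholder here, and the block shape is illustrative rather than the real API payload:&lt;/p&gt;

```typescript
// Split the system prompt at the sentinel: the static prefix is globally
// cacheable, the per-user tail is not.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<<dynamic-boundary>>"; // placeholder value

interface PromptBlock {
  text: string;
  cacheScope: "global" | "none";
}

function splitSystemPrompt(prompt: string): PromptBlock[] {
  const idx = prompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  if (idx === -1) return [{ text: prompt, cacheScope: "global" }];
  return [
    // Identical for all users -> one shared cache entry worldwide.
    { text: prompt.slice(0, idx), cacheScope: "global" },
    // MCP instructions, output preferences, language settings -> per user.
    {
      text: prompt.slice(idx + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length),
      cacheScope: "none",
    },
  ];
}
```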

&lt;h2&gt;
  
  
  The Security Blind Spot
&lt;/h2&gt;

&lt;p&gt;Here's something the compression pipeline gets wrong: &lt;strong&gt;it treats all content equally&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The autocompact summarizer processes user instructions and tool results through the same pipeline. If an attacker plants malicious instructions inside a project file — and the model reads that file — those instructions survive compression. They become part of the summary, indistinguishable from legitimate context.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;analysis&amp;gt;&lt;/code&gt; scratchpad that makes summaries so good also faithfully preserves injected instructions. There's no classification step that distinguishes "user said this" from "this was in a file the model read."&lt;/p&gt;

&lt;p&gt;Additionally, &lt;code&gt;truncateHeadForPTLRetry()&lt;/code&gt; reveals another edge: when the conversation is so long that the compression request itself triggers a Prompt-Too-Long error, the system recursively drops the oldest turns to make the compression fit. An attacker could craft inputs that survive this truncation — instructions placed strategically in the middle of conversations, not at the edges.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Designs Worth Stealing
&lt;/h2&gt;

&lt;p&gt;If you're building your own agent, these patterns transfer directly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Progressive compression (cheapest first)&lt;/strong&gt; — Don't jump to expensive summarization. Try zero-cost approaches first. Most sessions will never need the heavy option.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cache-aware dual paths&lt;/strong&gt; — Let infrastructure state drive algorithm selection. When cache is cold, optimize for simplicity. When cache is hot, optimize for preservation. Same goal, different strategies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Circuit breakers on automated recovery&lt;/strong&gt; — Never let a fix become a new failure mode. If compression fails 3 times, it will fail a 4th time. Stop. The 250K API calls wasted per day before this fix landed are a cautionary tale for any self-healing system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Next: &lt;a href="https://harrisonsec.com/blog/claude-code-memory-first-principles-tradeoffs/" rel="noopener noreferrer"&gt;Part 4: Memory — First-Principles Tradeoffs in Agent Persistence&lt;/a&gt; — why Anthropic chose Markdown files over vector databases, and when that's the wrong call.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previous: &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Part 2: The 1,421-Line While Loop&lt;/a&gt; | &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;Part 1: 5 Hidden Features&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>contextengineering</category>
      <category>agents</category>
      <category>compression</category>
    </item>
    <item>
      <title>Claude Code + Codex Plugin: Two AI Brains, One Terminal</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 07 Apr 2026 14:47:24 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-codex-plugin-two-ai-brains-one-terminal-k31</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-codex-plugin-two-ai-brains-one-terminal-k31</guid>
      <description>&lt;p&gt;You're debugging a gnarly race condition. Claude Code has been going at it for 10 minutes — reading files, forming theories, running tests. Then it hits a wall. Same hypothesis, same failed fix, third attempt.&lt;/p&gt;

&lt;p&gt;What if you could call in a second brain — a completely different model with fresh eyes — without leaving your terminal?&lt;/p&gt;

&lt;p&gt;That's what the &lt;strong&gt;Codex plugin for Claude Code&lt;/strong&gt; does. It puts OpenAI's Codex (powered by GPT-5.4) inside your Claude Code session as a callable rescue agent. Two models. Two reasoning styles. One shared codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is It, Exactly?
&lt;/h2&gt;

&lt;p&gt;The Codex plugin is a &lt;strong&gt;Claude Code plugin&lt;/strong&gt; — not a standalone tool. It lives inside your Claude Code session and gives you slash commands to dispatch tasks to OpenAI's Codex CLI.&lt;/p&gt;

&lt;p&gt;Think of it as a second engineer sitting next to you. Claude (Opus) is your primary — it has the full conversation context, knows your project, runs your tools. Codex is your specialist — you hand it a focused task, it works in a sandboxed environment, and returns results.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;they don't compete. They complement.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude sees the big picture. It orchestrates, reads files, runs tools, manages state.&lt;/li&gt;
&lt;li&gt;Codex gets a sharp, scoped task. It reasons deeply on that one problem and comes back with an answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup: 3 Minutes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install the Codex CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Authenticate
&lt;/h3&gt;

&lt;p&gt;Inside Claude Code, type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!codex login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a browser for OpenAI authentication. Once done, your token is stored locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Verify
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex:setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code will check that the Codex CLI is installed, authenticated, and ready.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoze1uyqa3jdtsphmnxs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoze1uyqa3jdtsphmnxs.jpg" alt="Codex setup — ready, authenticated, review gate available" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Commands
&lt;/h2&gt;

&lt;p&gt;The plugin adds 7 slash commands to Claude Code:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check installation and auth status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:rescue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hand a task to Codex (the main one you'll use)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run a Codex code review on your local git changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:adversarial-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Same, but Codex actively challenges your design choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check running/recent Codex jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:result&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get the output of a finished background job&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/codex:cancel&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kill an active background Codex job&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Rescue Workflow: When Claude Gets Stuck
&lt;/h2&gt;

&lt;p&gt;This is where the plugin shines. Claude Code will &lt;strong&gt;proactively&lt;/strong&gt; spawn the Codex rescue agent when it detects it's stuck — same hypothesis loop, repeated failures, or a task that needs a second implementation pass.&lt;/p&gt;

&lt;p&gt;You can also trigger it manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex:rescue fix the race condition in src/worker.ts — tests pass locally but fail in CI under parallel execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens behind the scenes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Claude takes your request and shapes it into a structured prompt optimized for GPT-5.4&lt;/li&gt;
&lt;li&gt;The plugin invokes &lt;code&gt;codex-companion.mjs task&lt;/code&gt; with that prompt&lt;/li&gt;
&lt;li&gt;Codex works in the shared repository — reading files, reasoning, writing code&lt;/li&gt;
&lt;li&gt;Results come back into your Claude Code session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvonf3oosq27oo2p89zj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvonf3oosq27oo2p89zj.jpg" alt="Codex rescue in action — dispatching task to GPT-5.4 via codex-companion" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Foreground vs Background
&lt;/h3&gt;

&lt;p&gt;Small, focused rescues run in the foreground — you wait and get the result immediately.&lt;/p&gt;

&lt;p&gt;Big, multi-step investigations can run in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex:rescue --background investigate why the build is 3x slower since the last merge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check on it later with &lt;code&gt;/codex:status&lt;/code&gt; and grab results with &lt;code&gt;/codex:result&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Review: A Second Opinion That Actually Pushes Back
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex:review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sends your local git diff to Codex for review. It checks against your working tree or branch changes.&lt;/p&gt;

&lt;p&gt;But the real power is the adversarial review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex:adversarial-review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't "looks good to me." Codex will actively challenge your implementation approach, question design decisions, and flag things a polite reviewer wouldn't mention. It's the code review you &lt;em&gt;need&lt;/em&gt;, not the one you &lt;em&gt;want&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x63jksxnw8i6dp99hff.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x63jksxnw8i6dp99hff.jpg" alt="Codex review — checking git working tree for code review" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Which Brain
&lt;/h2&gt;

&lt;p&gt;After a month of daily use, here's my mental model:&lt;/p&gt;

&lt;h3&gt;
  
  
  Let Claude (Opus) Handle:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration&lt;/strong&gt; — multi-file changes, refactors across the codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context-heavy tasks&lt;/strong&gt; — "fix this bug" when you've been discussing it for 20 messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-heavy workflows&lt;/strong&gt; — file reads, grep, test runs, build commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation continuity&lt;/strong&gt; — anything that builds on prior context&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Call in Codex (GPT-5.4) For:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fresh eyes&lt;/strong&gt; — when Claude is circling the same hypothesis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep single-problem reasoning&lt;/strong&gt; — "why does this specific test fail under these exact conditions"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial review&lt;/strong&gt; — challenge assumptions Claude might share with you&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel investigation&lt;/strong&gt; — background a research task while Claude keeps working&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Pattern That Works Best
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Claude does the initial investigation — reads files, forms a theory&lt;/li&gt;
&lt;li&gt;If the theory doesn't pan out in 2-3 attempts, &lt;strong&gt;rescue to Codex&lt;/strong&gt; with the full context of what was tried&lt;/li&gt;
&lt;li&gt;Codex returns a diagnosis or fix&lt;/li&gt;
&lt;li&gt;Claude applies it in context, runs tests, iterates&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two models. Two reasoning paths. Converging on the same answer faster than either alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced: Prompt Shaping
&lt;/h2&gt;

&lt;p&gt;The plugin includes a &lt;code&gt;gpt-5-4-prompting&lt;/code&gt; skill that automatically structures your rescue requests into Codex-optimized prompts using XML tags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;task&amp;gt;&lt;/code&gt; — the concrete job&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;verification_loop&amp;gt;&lt;/code&gt; — how to confirm the fix works&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;grounding_rules&amp;gt;&lt;/code&gt; — stay anchored to evidence, not guesses&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;action_safety&amp;gt;&lt;/code&gt; — don't refactor unrelated code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to write these yourself. Claude does it automatically when it hands off to Codex. But knowing they exist explains why Codex rescue results are usually sharper than raw Codex CLI usage.&lt;/p&gt;
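&lt;p&gt;The shaping step amounts to wrapping the request in those tags. The tag names come from the plugin; this builder is a hypothetical illustration of what the skill produces, not its actual code:&lt;/p&gt;

```typescript
// Hypothetical sketch of structuring a rescue request into the XML tags
// the gpt-5-4-prompting skill uses.
function shapeRescuePrompt(task: string, verification: string): string {
  return [
    `<task>${task}</task>`,
    `<verification_loop>${verification}</verification_loop>`,
    `<grounding_rules>Stay anchored to evidence in the repository; do not guess.</grounding_rules>`,
    `<action_safety>Do not refactor unrelated code.</action_safety>`,
  ].join("\n");
}
```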

&lt;h2&gt;
  
  
  Advanced: The Review Gate
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex:setup --enable-review-gate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When enabled, every &lt;code&gt;git commit&lt;/code&gt; in the repo triggers an automatic Codex review before the commit completes. It's a pre-commit hook powered by a second AI brain.&lt;/p&gt;

&lt;p&gt;This is aggressive — I only enable it on critical branches or before releases. But when you want zero-trust code quality, it's unmatched.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The Codex plugin doesn't replace Claude Code. It makes Claude Code &lt;strong&gt;anti-fragile&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Every AI agent has blind spots — reasoning loops it can't escape, patterns it over-fits to, assumptions it shares with its user. A second model with a different training distribution breaks those loops.&lt;/p&gt;

&lt;p&gt;The dual-brain setup isn't about which model is "better." It's about &lt;strong&gt;coverage&lt;/strong&gt;. Two independent reasoning paths catch more bugs than one brilliant path run twice.&lt;/p&gt;

&lt;p&gt;If you're using Claude Code daily, install the Codex plugin. It's 3 minutes of setup and it will save you hours of "why is Claude stuck on this?"&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;Claude Code Architecture Deep Dive&lt;/a&gt; series. Previous: &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-query-loop/" rel="noopener noreferrer"&gt;The 1,421-Line While Loop That Runs Everything&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>codex</category>
      <category>agents</category>
      <category>openai</category>
    </item>
    <item>
      <title>Claude Code Deep Dive Part 2: The 1,421-Line While Loop That Runs Everything</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Fri, 03 Apr 2026 17:24:18 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-deep-dive-part-2-the-1421-line-while-loop-that-runs-everything-121</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-deep-dive-part-2-the-1421-line-while-loop-that-runs-everything-121</guid>
      <description>&lt;p&gt;&lt;em&gt;This is Part 2 of our Claude Code Architecture Deep Dive series. &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;Part 1: 5 Hidden Features&lt;/a&gt; covered the surface-level discoveries. Now we go deeper.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Heart of Claude Code
&lt;/h2&gt;

&lt;p&gt;Every AI coding agent — Claude Code, Cursor, Copilot — runs some version of the same loop: send context to an LLM, get back text and tool calls, execute tools, feed results back, repeat. We called this &lt;a href="https://harrisonsec.com/blog/ai-stack-explained-llm-talks-program-walks/" rel="noopener noreferrer"&gt;LLM talks, program walks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But Claude Code's implementation of this loop is anything but simple. It lives in &lt;code&gt;query.ts&lt;/code&gt;, a 1,729-line async generator. The &lt;code&gt;while(true)&lt;/code&gt; starts at line 307 and ends at line 1728 — a single loop body spanning 1,421 lines of production code.&lt;/p&gt;

&lt;p&gt;This is not a toy. This is the engine that processes every keystroke, every tool call, every error recovery, every context compression decision for millions of users.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// query.ts — line 307&lt;/span&gt;
&lt;span class="c1"&gt;// eslint-disable-next-line no-constant-condition&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;toolUseContext&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;
    &lt;span class="c1"&gt;// ... 1,421 lines of state machine logic ...&lt;/span&gt;
    &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;// while (true)  — line 1728&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why a State Machine, Not Recursion
&lt;/h2&gt;

&lt;p&gt;Early versions of Claude Code used recursion — the query function called itself. But recursion has a fatal flaw: in long conversations with hundreds of tool calls, the call stack keeps growing until it overflows.&lt;/p&gt;
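&lt;p&gt;The difference can be sketched in a few lines; the state shape and step function here are stand-ins, not the real &lt;code&gt;query.ts&lt;/code&gt; code:&lt;/p&gt;

```typescript
// Illustrative contrast, not the actual query.ts code: a recursive loop
// grows the call stack with every turn, while an iterative state machine
// reuses one frame. The State shape here is a stand-in.
type State = { turnCount: number; done: boolean };

function stepOnce(state: State): State {
  // Stand-in for one model call plus tool execution.
  const turnCount = state.turnCount + 1;
  return { turnCount, done: turnCount >= 10_000 };
}

// Recursive style: one stacked frame per turn until it returns.
function runRecursive(state: State): State {
  return state.done ? state : runRecursive(stepOnce(state));
}

// Iterative style: constant stack depth regardless of turn count.
function runLoop(state: State): State {
  while (!state.done) {
    state = stepOnce(state); // each assignment is a state transition
  }
  return state;
}

console.log(runLoop({ turnCount: 0, done: false }).turnCount); // 10000
```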

&lt;p&gt;The current design uses &lt;code&gt;while(true)&lt;/code&gt; with a &lt;code&gt;state&lt;/code&gt; object that carries context between iterations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// query.ts — lines 207-215 (State type, partial)&lt;/span&gt;
&lt;span class="nx"&gt;autoCompactTracking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AutoCompactTrackingState&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;
&lt;span class="nx"&gt;maxOutputTokensRecoveryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
&lt;span class="nx"&gt;hasAttemptedReactiveCompact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;       &lt;span class="c1"&gt;// circuit breaker for 413 recovery&lt;/span&gt;
&lt;span class="nx"&gt;stopHookActive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;
&lt;span class="nx"&gt;turnCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
&lt;span class="nx"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="c1"&gt;// why we continued&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;continue&lt;/code&gt; statement is a state transition. There are &lt;strong&gt;nine distinct &lt;code&gt;continue&lt;/code&gt; points&lt;/strong&gt; in the code (at lines including 950, 1115, 1165, 1220, 1251, 1305, 1316, and 1340), each representing a different reason to run another turn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next tool call needed&lt;/li&gt;
&lt;li&gt;Reactive compact triggered after 413&lt;/li&gt;
&lt;li&gt;Max output tokens recovery&lt;/li&gt;
&lt;li&gt;Stop hook interrupted&lt;/li&gt;
&lt;li&gt;Token budget continuation&lt;/li&gt;
&lt;li&gt;And more&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Loop at a Glance
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBBWyLikaAgQ29tcHJlc3MgQ29udGV4dDxici8-KDQgc3RhZ2VzKSJdIC0tPiBCWyLikaEgVG9rZW4gQnVkZ2V0IENoZWNrIl0KICAgIEIgLS0-IENbIuKRoiBDYWxsIE1vZGVsIEFQSTxici8-KHN0cmVhbWluZykiXQogICAgQyAtLT4gRFsi4pGjIFN0cmVhbSBUb29sIEV4ZWN1dGlvbjxici8-KHBhcmFsbGVsIHdpdGggZ2VuZXJhdGlvbikiXQogICAgRCAtLT4gRVsi4pGkIEVycm9yIFJlY292ZXJ5PGJyLz4oNDEzIOKGkiByZWFjdGl2ZSBjb21wYWN0KSJdCiAgICBFIC0tPiBGWyLikaUgU3RvcCBIb29rcyJdCiAgICBGIC0tPiBHWyLikaYgVG9rZW4gQnVkZ2V0IENoZWNrICMyIl0KICAgIEcgLS0-IEhbIuKRpyBFeGVjdXRlIFRvb2xzPGJyLz4oMTQtc3RlcCBwaXBlbGluZSkiXQogICAgSCAtLT4gSVsi4pGoIEluamVjdCBBdHRhY2htZW50czxici8-KG1lbW9yeSwgc2tpbGxzLCBxdWV1ZWQgY21kcykiXQogICAgSSAtLT4gSlsi4pGpIEFzc2VtYmxlIE1lc3NhZ2VzIl0KICAgIEogLS0-fCJuZXh0IHR1cm4ifCBBCgogICAgc3R5bGUgQSBmaWxsOiMxYTRkMmUsc3Ryb2tlOiMyMmM1NWUsY29sb3I6I2ZmZgogICAgc3R5bGUgQyBmaWxsOiMxYTNhNWMsc3Ryb2tlOiMzYjgyZjYsY29sb3I6I2ZmZgogICAgc3R5bGUgRCBmaWxsOiM0YTM1MjAsc3Ryb2tlOiNmNTllMGIsY29sb3I6I2ZmZgogICAgc3R5bGUgRSBmaWxsOiM0YTIwMjAsc3Ryb2tlOiNlZjQ0NDQsY29sb3I6I2ZmZgogICAgc3R5bGUgSCBmaWxsOiMzYTIwNTAsc3Ryb2tlOiM4YjVjZjYsY29sb3I6I2ZmZg%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBBWyLikaAgQ29tcHJlc3MgQ29udGV4dDxici8-KDQgc3RhZ2VzKSJdIC0tPiBCWyLikaEgVG9rZW4gQnVkZ2V0IENoZWNrIl0KICAgIEIgLS0-IENbIuKRoiBDYWxsIE1vZGVsIEFQSTxici8-KHN0cmVhbWluZykiXQogICAgQyAtLT4gRFsi4pGjIFN0cmVhbSBUb29sIEV4ZWN1dGlvbjxici8-KHBhcmFsbGVsIHdpdGggZ2VuZXJhdGlvbikiXQogICAgRCAtLT4gRVsi4pGkIEVycm9yIFJlY292ZXJ5PGJyLz4oNDEzIOKGkiByZWFjdGl2ZSBjb21wYWN0KSJdCiAgICBFIC0tPiBGWyLikaUgU3RvcCBIb29rcyJdCiAgICBGIC0tPiBHWyLikaYgVG9rZW4gQnVkZ2V0IENoZWNrICMyIl0KICAgIEcgLS0-IEhbIuKRpyBFeGVjdXRlIFRvb2xzPGJyLz4oMTQtc3RlcCBwaXBlbGluZSkiXQogICAgSCAtLT4gSVsi4pGoIEluamVjdCBBdHRhY2htZW50czxici8-KG1lbW9yeSwgc2tpbGxzLCBxdWV1ZWQgY21kcykiXQogICAgSSAtLT4gSlsi4pGpIEFzc2VtYmxlIE1lc3NhZ2VzIl0KICAgIEogLS0-fCJuZXh0IHR1cm4ifCBBCgogICAgc3R5bGUgQSBmaWxsOiMxYTRkMmUsc3Ryb2tlOiMyMmM1NWUsY29sb3I6I2ZmZgogICAgc3R5bGUgQyBmaWxsOiMxYTNhNWMsc3Ryb2tlOiMzYjgyZjYsY29sb3I6I2ZmZgogICAgc3R5bGUgRCBmaWxsOiM0YTM1MjAsc3Ryb2tlOiNmNTllMGIsY29sb3I6I2ZmZgogICAgc3R5bGUgRSBmaWxsOiM0YTIwMjAsc3Ryb2tlOiNlZjQ0NDQsY29sb3I6I2ZmZgogICAgc3R5bGUgSCBmaWxsOiMzYTIwNTAsc3Ryb2tlOiM4YjVjZjYsY29sb3I6I2ZmZg%3D%3D" alt="flowchart TD" width="342" height="1198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  10 Steps Per Iteration
&lt;/h2&gt;

&lt;p&gt;Each time the loop runs, it does these 10 things in order. Every step has real source code behind it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Context Compression (4 stages)
&lt;/h3&gt;

&lt;p&gt;Before calling the API, the system tries to fit everything into the context window. Four compression mechanisms fire in priority order (imports at lines 12-16, 115-116):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Snip Compact&lt;/strong&gt; — trims overly long individual messages in history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Micro Compact&lt;/strong&gt; — finer-grained editing based on &lt;code&gt;tool_use_id&lt;/code&gt;, cache-friendly (line 370: "microcompact operates purely by tool_use_id")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Collapse&lt;/strong&gt; — folds inactive context regions into summaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto Compact&lt;/strong&gt; — when total tokens approach the threshold, triggers full compression&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are not mutually exclusive — they run in priority order:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBBWyJTbmlwIENvbXBhY3Q8YnIvPjxpPnRyaW0gbG9uZyBtZXNzYWdlczwvaT4iXSAtLT58InN0aWxsIHRvbyBiaWc_InwgQlsiTWljcm8gQ29tcGFjdDxici8-PGk-dG9vbF91c2VfaWQgZWRpdHM8L2k-Il0KICAgIEIgLS0-fCJzdGlsbCB0b28gYmlnPyJ8IENbIkNvbnRleHQgQ29sbGFwc2U8YnIvPjxpPmZvbGQgaW5hY3RpdmUgcmVnaW9uczwvaT4iXQogICAgQyAtLT58InN0aWxsIHRvbyBiaWc_InwgRFsiQXV0byBDb21wYWN0PGJyLz48aT5mdWxsIGNvbXByZXNzaW9uPC9pPiJdCiAgICBEIC0tPnwiQVBJIHJldHVybnMgNDEzInwgRVsiUmVhY3RpdmUgQ29tcGFjdDxici8-PGk-ZW1lcmdlbmN5LCBvbmNlIG9ubHk8L2k-Il0KCiAgICBzdHlsZSBBIGZpbGw6IzFhM2EyZSxzdHJva2U6IzRhZGU4MCxjb2xvcjojZmZmCiAgICBzdHlsZSBCIGZpbGw6IzFhM2EyZSxzdHJva2U6IzRhZGU4MCxjb2xvcjojZmZmCiAgICBzdHlsZSBDIGZpbGw6IzNhMzUyMCxzdHJva2U6I2ZiYmYyNCxjb2xvcjojZmZmCiAgICBzdHlsZSBEIGZpbGw6IzRhMjAyMCxzdHJva2U6I2VmNDQ0NCxjb2xvcjojZmZmCiAgICBzdHlsZSBFIGZpbGw6IzRhMTAyMCxzdHJva2U6I2RjMjYyNixjb2xvcjojZmZm" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBBWyJTbmlwIENvbXBhY3Q8YnIvPjxpPnRyaW0gbG9uZyBtZXNzYWdlczwvaT4iXSAtLT58InN0aWxsIHRvbyBiaWc_InwgQlsiTWljcm8gQ29tcGFjdDxici8-PGk-dG9vbF91c2VfaWQgZWRpdHM8L2k-Il0KICAgIEIgLS0-fCJzdGlsbCB0b28gYmlnPyJ8IENbIkNvbnRleHQgQ29sbGFwc2U8YnIvPjxpPmZvbGQgaW5hY3RpdmUgcmVnaW9uczwvaT4iXQogICAgQyAtLT58InN0aWxsIHRvbyBiaWc_InwgRFsiQXV0byBDb21wYWN0PGJyLz48aT5mdWxsIGNvbXByZXNzaW9uPC9pPiJdCiAgICBEIC0tPnwiQVBJIHJldHVybnMgNDEzInwgRVsiUmVhY3RpdmUgQ29tcGFjdDxici8-PGk-ZW1lcmdlbmN5LCBvbmNlIG9ubHk8L2k-Il0KCiAgICBzdHlsZSBBIGZpbGw6IzFhM2EyZSxzdHJva2U6IzRhZGU4MCxjb2xvcjojZmZmCiAgICBzdHlsZSBCIGZpbGw6IzFhM2EyZSxzdHJva2U6IzRhZGU4MCxjb2xvcjojZmZmCiAgICBzdHlsZSBDIGZpbGw6IzNhMzUyMCxzdHJva2U6I2ZiYmYyNCxjb2xvcjojZmZmCiAgICBzdHlsZSBEIGZpbGw6IzRhMjAyMCxzdHJva2U6I2VmNDQ0NCxjb2xvcjojZmZmCiAgICBzdHlsZSBFIGZpbGw6IzRhMTAyMCxzdHJva2U6I2RjMjYyNixjb2xvcjojZmZm" alt="flowchart LR" width="1552" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The system tries lightweight options first. If snip + micro bring tokens under the limit, the heavy compressors never run.&lt;/p&gt;
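&lt;p&gt;The priority-order behavior reduces to a simple loop over stages. The stage names mirror the list above, but the token math and the &lt;code&gt;compress&lt;/code&gt; helper are toy stand-ins for the real message-history compressors:&lt;/p&gt;

```typescript
// Toy sketch of the priority order. The stage names mirror the list above;
// the token math and compress() helper are stand-ins for the real
// message-history compressors.
type Compressor = { name: string; run: (tokens: number) => number };

const pipeline: Compressor[] = [
  { name: "snip",     run: (t) => Math.floor(t * 0.9) },  // trim long messages
  { name: "micro",    run: (t) => Math.floor(t * 0.8) },  // tool_use_id edits
  { name: "collapse", run: (t) => Math.floor(t * 0.5) },  // fold inactive regions
  { name: "auto",     run: (t) => Math.floor(t * 0.25) }, // full compression
];

function compress(tokens: number, limit: number): { tokens: number; used: string[] } {
  const used: string[] = [];
  for (const stage of pipeline) {
    if (tokens <= limit) break; // lightweight stages may already be enough
    tokens = stage.run(tokens);
    used.push(stage.name);
  }
  return { tokens, used };
}

console.log(compress(85_000, 80_000).used); // ["snip"]: the heavy stages never ran
```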

&lt;h3&gt;
  
  
  Step 2: Token Budget Check
&lt;/h3&gt;

&lt;p&gt;If a token budget is active (&lt;code&gt;feature('TOKEN_BUDGET')&lt;/code&gt;, line 280), the system checks whether to continue. Users can specify targets like "+500k", and the system tracks cumulative output tokens per turn, injecting nudge messages near the goal to keep the model working.&lt;/p&gt;
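&lt;p&gt;A rough sketch of the budget mechanics, assuming a "+500k"-style spec and a simple near-goal nudge rule; both helpers are hypothetical, not the actual &lt;code&gt;query.ts&lt;/code&gt; implementation:&lt;/p&gt;

```typescript
// Hypothetical sketch of the budget mechanics: parse a "+500k" style target
// and decide when to inject a nudge. Both helpers are illustrative, not the
// actual query.ts implementation.
function parseBudget(spec: string): number {
  const m = spec.match(/^\+(\d+)(k?)$/i);
  if (!m) throw new Error(`bad budget spec: ${spec}`);
  return Number(m[1]) * (m[2] ? 1_000 : 1);
}

function shouldNudge(cumulativeTokens: number, target: number): boolean {
  // Nudge once output is within 10% of the goal but has not yet reached it.
  return cumulativeTokens >= target * 0.9 && cumulativeTokens < target;
}

console.log(parseBudget("+500k"));          // 500000
console.log(shouldNudge(460_000, 500_000)); // true
```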

&lt;h3&gt;
  
  
  Step 3: Call Model API
&lt;/h3&gt;

&lt;p&gt;Line 659 — the actual API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for await (const message of deps.callModel({
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a streaming call. The response arrives token by token, and the system processes it incrementally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Streaming Tool Execution
&lt;/h3&gt;

&lt;p&gt;This is a critical optimization. Traditional agents wait for the model to finish generating all output, then execute tools. Claude Code uses &lt;code&gt;StreamingToolExecutor&lt;/code&gt; (imported at line 96):&lt;/p&gt;

&lt;p&gt;When the model is still generating its second tool call, the first one is already running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional Agent (sequential):
┌─────────────────────────┐┌───┐┌───┐┌───┐┌───┐┌───┐
│  LLM generates 5 calls  ││ T1││ T2││ T3││ T4││ T5│  ← 30s total
└─────────────────────────┘└───┘└───┘└───┘└───┘└───┘

Claude Code (streaming):
┌─────────────────────────┐
│  LLM generates 5 calls  │
├──┬──┬──┬──┬─────────────┘
│T1│T2│T3│T4│T5│                                       ← 18s total
└──┴──┴──┴──┴──┘
↑ tools start while LLM is still generating
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a turn with 5 tool calls, the traditional agent waits 30 seconds; the streaming version finishes in 18 — a &lt;strong&gt;40% speedup&lt;/strong&gt; from architecture alone, not model improvements.&lt;/p&gt;

&lt;p&gt;Lines 554-555 reveal an interesting detail: &lt;code&gt;stop_reason === 'tool_use'&lt;/code&gt; is unreliable — "it's not always set correctly." The system detects tool calls by watching for &lt;code&gt;tool_use&lt;/code&gt; blocks during streaming instead.&lt;/p&gt;
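&lt;p&gt;The overlap can be modeled with an async generator: tool calls start the moment they are parsed, not after the whole response has been generated. The generator and executor below are toy stand-ins for &lt;code&gt;StreamingToolExecutor&lt;/code&gt;:&lt;/p&gt;

```typescript
// Toy model of the overlap, not StreamingToolExecutor itself: tool calls
// start the moment they are parsed from the stream, instead of after the
// whole response has been generated.
async function* generateToolCalls(): AsyncGenerator<string> {
  for (const name of ["read", "grep", "edit"]) {
    await new Promise((r) => setTimeout(r, 10)); // the model is still "generating"
    yield name;
  }
}

async function runTool(name: string): Promise<string> {
  await new Promise((r) => setTimeout(r, 20)); // the tool does real work
  return `${name}:done`;
}

async function runStreaming(): Promise<string[]> {
  const running: Promise<string>[] = [];
  for await (const call of generateToolCalls()) {
    running.push(runTool(call)); // start immediately; do not await here
  }
  return Promise.all(running); // tools overlap generation and each other
}

runStreaming().then((results) => console.log(results)); // ["read:done", "grep:done", "edit:done"]
```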

&lt;h3&gt;
  
  
  Step 5: Error Recovery
&lt;/h3&gt;

&lt;p&gt;What happens when the prompt is too long? The loop first tries a context-collapse drain; if that fails, it falls back to reactive compact (lines 15-16). If the API returns 413 (prompt too long), it triggers emergency compression and retries.&lt;/p&gt;

&lt;p&gt;But there's a circuit breaker: &lt;code&gt;hasAttemptedReactiveCompact&lt;/code&gt; (line 209, initialized &lt;code&gt;false&lt;/code&gt; at line 275) ensures each turn only attempts reactive compact once. Without this, a genuinely oversized conversation would loop forever.&lt;/p&gt;
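&lt;p&gt;The breaker reduces to a one-shot flag. A minimal sketch, with the surrounding loop and &lt;code&gt;compact()&lt;/code&gt; as stand-ins:&lt;/p&gt;

```typescript
// Minimal sketch of the one-shot breaker behind hasAttemptedReactiveCompact.
// The surrounding loop and compact() are stand-ins.
type TurnState = { hasAttemptedReactiveCompact: boolean };

function handlePromptTooLong(state: TurnState, compact: () => boolean): "retry" | "fail" {
  if (state.hasAttemptedReactiveCompact) {
    return "fail"; // already tried once this turn, so never loop forever
  }
  state.hasAttemptedReactiveCompact = true;
  return compact() ? "retry" : "fail";
}

const turn: TurnState = { hasAttemptedReactiveCompact: false };
console.log(handlePromptTooLong(turn, () => true)); // "retry"
console.log(handlePromptTooLong(turn, () => true)); // "fail": the breaker tripped
```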

&lt;p&gt;The system also handles model degradation — if the primary model fails, it can fall back to a different model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Stop Hooks
&lt;/h3&gt;

&lt;p&gt;After the model stops outputting, the system runs registered stop hooks. These can inspect the output and decide whether to let the model continue. This is where external governance plugs in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Token Budget Check (Again)
&lt;/h3&gt;

&lt;p&gt;Yes, checked twice — once before calling the model (should we even start?) and once after (did we exceed the budget?). The second check decides whether to inject a "keep going" nudge or stop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Tool Execution
&lt;/h3&gt;

&lt;p&gt;If the response contains &lt;code&gt;tool_use&lt;/code&gt; blocks, execute them. Two paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;runTools()&lt;/code&gt; (from &lt;code&gt;toolOrchestration.ts&lt;/code&gt;, line 98) — batch execution&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;StreamingToolExecutor&lt;/code&gt; (line 96) — streaming execution, gated by &lt;code&gt;config.gates.streamingToolExecution&lt;/code&gt; (line 561)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool call goes through the 14-step execution pipeline in &lt;code&gt;toolExecution.ts&lt;/code&gt; (1,745 lines) — validation, permission checks, hooks, actual execution, analytics. That's a story for &lt;a href="https://harrisonsec.com/blog/claude-code-deep-dive-tool-pipeline/" rel="noopener noreferrer"&gt;Part 3&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 9: Attachment Injection
&lt;/h3&gt;

&lt;p&gt;After tools finish, the system injects additional context before the next turn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory attachments&lt;/strong&gt; — relevant memories from the &lt;code&gt;memdir/&lt;/code&gt; system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill discovery&lt;/strong&gt; — matching skills based on the current task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queued commands&lt;/strong&gt; — any commands that were waiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This happens after tool execution but before the next API call, ensuring the model has fresh context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: Assemble and Loop
&lt;/h3&gt;

&lt;p&gt;Build the new message list from all the pieces — original conversation, tool results, attachments, system reminders — and go back to step 1.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Architecture Matters
&lt;/h2&gt;

&lt;p&gt;Most open-source AI agents implement the loop as 50 lines of pseudocode: call model, parse tool calls, execute, repeat. Claude Code's 1,421-line version exists because production reality is messy:&lt;/p&gt;
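&lt;p&gt;That naive loop, boiled down even further, looks something like this. It is a caricature rather than any real agent's code, and it is useful mostly for what it leaves out: compression, budget tracking, error recovery, streaming.&lt;/p&gt;

```typescript
// A caricature of the naive loop, not any real agent's code. Notice what is
// missing: compression, budget tracking, error recovery, streaming.
type Turn = { toolCalls: string[] };

function naiveAgentLoop(
  callModel: () => Turn,
  execute: (tool: string) => string
): string[] {
  const transcript: string[] = [];
  for (;;) {
    const turn = callModel();               // 1. send context, get response
    if (turn.toolCalls.length === 0) break; // 2. no tools requested: done
    for (const call of turn.toolCalls) {
      transcript.push(execute(call));       // 3. run tools sequentially
    }                                       // 4. feed results back, repeat
  }
  return transcript;
}

let turns = 0;
const result = naiveAgentLoop(
  () => (turns++ === 0 ? { toolCalls: ["ls"] } : { toolCalls: [] }),
  (tool) => `${tool}: ok`
);
console.log(result); // ["ls: ok"]
```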

&lt;p&gt;&lt;strong&gt;Context doesn't fit.&lt;/strong&gt; A real coding session easily hits 200K tokens. Without the 4-stage compression pipeline, the agent dies on every long conversation. Most agents just truncate and lose context. Claude Code compresses intelligently — lightweight first, heavy only when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models fail.&lt;/strong&gt; APIs return 413, connections drop, rate limits hit. The 9 continue points aren't over-engineering — they're the minimum number of recovery paths needed for reliable operation. The &lt;code&gt;hasAttemptedReactiveCompact&lt;/code&gt; circuit breaker is the kind of detail that separates a demo from a product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed matters more than strict execution order.&lt;/strong&gt; Streaming tool execution — starting the first tool while the model is still generating the third — is a user experience decision backed by architecture. Traditional agents feel slow because they are: they serialize everything. Claude Code parallelizes at the loop level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tokens cost money.&lt;/strong&gt; The &lt;code&gt;SYSTEM_PROMPT_DYNAMIC_BOUNDARY&lt;/code&gt; marker in &lt;code&gt;prompts.ts&lt;/code&gt; (914 lines) splits the system prompt into static (cacheable) and dynamic sections. If two requests share the same static prefix byte-for-byte, the API caches it. Source comment: "don't modify content before the boundary, or you'll destroy the cache." This is prompt cache economics — saving Anthropic real compute costs at scale.&lt;/p&gt;
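&lt;p&gt;The caching contract can be sketched in a few lines. The boundary marker name comes from the source; &lt;code&gt;assemblePrompt&lt;/code&gt; and the section contents are hypothetical:&lt;/p&gt;

```typescript
// Sketch of the cache-boundary contract. The marker name comes from the
// source; assemblePrompt and the section contents are hypothetical.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<!-- dynamic-boundary -->";

function assemblePrompt(staticSections: string[], dynamicSections: string[]): string {
  // Never edit or reorder anything before the boundary: that would change
  // the cached prefix and force a full re-read on the next request.
  return [
    staticSections.join("\n"),
    SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
    dynamicSections.join("\n"),
  ].join("\n");
}

const a = assemblePrompt(["You are a coding agent."], ["cwd: /tmp/project-a"]);
const b = assemblePrompt(["You are a coding agent."], ["cwd: /tmp/project-b"]);
const prefixA = a.split(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)[0];
const prefixB = b.split(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)[0];
console.log(prefixA === prefixB); // true: the cacheable part is byte-identical
```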

&lt;h2&gt;
  
  
  The Behavioral Constitution
&lt;/h2&gt;

&lt;p&gt;Buried inside the prompt assembly, &lt;code&gt;getSimpleDoingTasksSection()&lt;/code&gt; may be the most valuable function in the entire codebase. It encodes hard-won rules about what the model should NOT do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't add features the user didn't ask for&lt;/li&gt;
&lt;li&gt;Don't over-abstract — three duplicate lines beat a premature abstraction&lt;/li&gt;
&lt;li&gt;Don't add comments to code you didn't change&lt;/li&gt;
&lt;li&gt;Don't add unnecessary error handling&lt;/li&gt;
&lt;li&gt;Read code before modifying it&lt;/li&gt;
&lt;li&gt;If a method fails, diagnose before retrying&lt;/li&gt;
&lt;li&gt;Report honestly — don't say you ran something you didn't&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anyone who has used Claude Code recognizes these rules. I've personally watched the system refuse to add "helpful" abstractions and stick to minimal changes. That's not the model being disciplined — it's the prompt constraining the model. The takeaway: &lt;strong&gt;don't trust model self-discipline. Codify the behavior.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Other Agents Compare
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Typical OSS Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Loop complexity&lt;/td&gt;
&lt;td&gt;1,421 lines, 9 continue points&lt;/td&gt;
&lt;td&gt;Unknown (closed source)&lt;/td&gt;
&lt;td&gt;~50-200 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression&lt;/td&gt;
&lt;td&gt;4-stage pipeline + reactive 413 recovery&lt;/td&gt;
&lt;td&gt;Tab-level context pruning&lt;/td&gt;
&lt;td&gt;Truncate or fail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool execution&lt;/td&gt;
&lt;td&gt;Streaming (parallel with generation)&lt;/td&gt;
&lt;td&gt;Sequential&lt;/td&gt;
&lt;td&gt;Sequential&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error recovery&lt;/td&gt;
&lt;td&gt;Circuit breakers, model fallback, emergency compact&lt;/td&gt;
&lt;td&gt;Basic retry&lt;/td&gt;
&lt;td&gt;Crash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching&lt;/td&gt;
&lt;td&gt;Static/dynamic boundary, section registry&lt;/td&gt;
&lt;td&gt;Unknown&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap between Claude Code and most open-source agents is not model quality — it's the program layer. The model is the same Opus or Sonnet for everyone. What makes Claude Code feel different is 1,421 lines of careful engineering around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The query loop is where "LLM talks, program walks" becomes concrete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The LLM outputs text and tool call JSON. That's it.&lt;/li&gt;
&lt;li&gt;The program handles compression, budget tracking, error recovery, streaming, permissions, memory injection, and 14-step tool validation.&lt;/li&gt;
&lt;li&gt;The 1,421 lines are not the model being smart. They're the program being careful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building an AI agent and your main loop is under 100 lines, you're not handling the cases that matter. Production is not about the happy path. It's about what happens when context overflows, the API returns 413, the user's conversation hits 500 turns, and three tools need to run while the model is still thinking.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: Part 3 — The 14-Step Tool Execution Pipeline (coming soon) — what happens between "model says call this tool" and the tool actually running.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previous: &lt;a href="https://harrisonsec.com/blog/claude-code-source-leaked-hidden-features/" rel="noopener noreferrer"&gt;Part 1 — 5 Hidden Features Found in 510K Lines&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Video: &lt;a href="https://youtu.be/giNERYV-X7k" rel="noopener noreferrer"&gt;The AI Stack Explained — LLM Talks, Program Walks&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>agents</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Claude Code Source Leaked: 5 Hidden Features Found in 510K Lines of Code</title>
      <dc:creator>Harrison Guo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 22:02:07 +0000</pubDate>
      <link>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-source-leaked-5-hidden-features-found-in-510k-lines-of-code-3mbn</link>
      <guid>https://forem.com/harrison_guo_e01b4c8793a0/claude-code-source-leaked-5-hidden-features-found-in-510k-lines-of-code-3mbn</guid>
      <description>&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;p&gt;Anthropic shipped Claude Code v2.1.88 to npm with a 60MB source map still attached. That single file contained 1,906 source files and 510,000 lines of fully readable TypeScript. No minification. No obfuscation. Just the raw codebase, sitting in a public registry for anyone to download.&lt;/p&gt;
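&lt;p&gt;This is why a shipped source map is equivalent to shipping the code: its &lt;code&gt;sourcesContent&lt;/code&gt; field embeds every original file verbatim. A toy payload (hypothetical contents) shows how little work recovery takes:&lt;/p&gt;

```typescript
// A toy source map payload (hypothetical contents) showing why an attached
// map is equivalent to shipping the code: sourcesContent embeds every
// original file verbatim, and recovery is a JSON parse away.
const sourceMap = JSON.stringify({
  version: 3,
  sources: ["src/query.ts", "src/buddy/types.ts"],
  sourcesContent: ["// query loop source", "// buddy pet types"],
  mappings: "AAAA",
});

function recoverSources(mapJson: string): Map<string, string> {
  const map = JSON.parse(mapJson);
  const out = new Map<string, string>();
  map.sources.forEach((path: string, i: number) => {
    out.set(path, map.sourcesContent[i]); // the original file, fully readable
  });
  return out;
}

console.log(recoverSources(sourceMap).size); // 2
```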

&lt;p&gt;Within hours, backup repositories appeared on GitHub. One of them — &lt;a href="https://github.com/instructkr/claude-code" rel="noopener noreferrer"&gt;instructkr/claude-code&lt;/a&gt; — racked up 20,000+ stars almost instantly. Anthropic pulled the package, but the code was already mirrored everywhere. The cat was out of the bag, and it had opinions about AI safety.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 Hidden Features Found in the Source
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Buddy Pet System
&lt;/h3&gt;

&lt;p&gt;Deep in &lt;code&gt;buddy/types.ts&lt;/code&gt;, there is a complete virtual pet system. Eighteen species, five rarity tiers, shiny variants, hats, custom eyes, and stat blocks. This was clearly planned as an April Fools' easter egg.&lt;/p&gt;

&lt;p&gt;The species list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SPECIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;duck&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;goose&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;blob&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dragon&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;octopus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;owl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;penguin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;turtle&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;snail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ghost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axolotl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;capybara&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cactus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;robot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rabbit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mushroom&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chonk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rarity weights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;RARITY_WEIGHTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;common&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 60%&lt;/span&gt;
  &lt;span class="na"&gt;uncommon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 25%&lt;/span&gt;
  &lt;span class="na"&gt;rare&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 10%&lt;/span&gt;
  &lt;span class="na"&gt;epic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;//  4%&lt;/span&gt;
  &lt;span class="na"&gt;legendary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;   &lt;span class="c1"&gt;//  1%&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each buddy gets a hat, eyes, and stats:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;none&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tophat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;propeller&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;halo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wizard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;beanie&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tinyduck&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Eye&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;·&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;✦&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;×&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;◉&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;°&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Stat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DEBUGGING&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PATIENCE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CHAOS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;WISDOM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SNARK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your buddy is generated deterministically from &lt;code&gt;hash(userId)&lt;/code&gt;, so every account gets the same pet every time. There is also a &lt;code&gt;shiny&lt;/code&gt; boolean variant — presumably the rare version you brag about in team Slack.&lt;/p&gt;
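&lt;p&gt;The actual mapping function isn't in the quoted snippet — only the union types are — but a deterministic &lt;code&gt;hash(userId)&lt;/code&gt; picker would look something like this sketch (the hash choice and shiny odds are my invention):&lt;/p&gt;

```typescript
// Hypothetical sketch of the hash(userId) -> buddy mapping. The Hat/Eye
// unions mirror the leaked types; the hash and odds are stand-ins.
type Hat = 'none' | 'crown' | 'tophat' | 'propeller' | 'halo' | 'wizard' | 'beanie' | 'tinyduck';
type Eye = '·' | '✦' | '×' | '◉' | '@' | '°';

const HATS: Hat[] = ['none', 'crown', 'tophat', 'propeller', 'halo', 'wizard', 'beanie', 'tinyduck'];
const EYES: Eye[] = ['·', '✦', '×', '◉', '@', '°'];

// FNV-1a string hash — a simple stand-in for whatever hash the source uses.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

interface Buddy { hat: Hat; eye: Eye; shiny: boolean; }

function buddyFor(userId: string): Buddy {
  const h = fnv1a(userId);
  return {
    hat: HATS[h % HATS.length],
    eye: EYES[(h >>> 3) % EYES.length],
    shiny: h % 128 === 0, // rare variant — these odds are invented
  };
}
```

&lt;p&gt;The point of hashing rather than random rolling: the same user sees the same pet across sessions and machines, with no stored state.&lt;/p&gt;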

&lt;p&gt;This was 100% an April 1st drop. The leak killed the surprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Undercover Mode
&lt;/h3&gt;

&lt;p&gt;This one raised eyebrows. In &lt;code&gt;utils/undercover.ts&lt;/code&gt;, there is a mode that makes Claude pretend to be a human developer.&lt;/p&gt;

&lt;p&gt;When active, undercover mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removes &lt;strong&gt;all&lt;/strong&gt; Anthropic traces from commits and pull requests&lt;/li&gt;
&lt;li&gt;Strips &lt;code&gt;Co-Authored-By&lt;/code&gt; headers&lt;/li&gt;
&lt;li&gt;Removes any mention of "Claude Code" or model names&lt;/li&gt;
&lt;li&gt;Writes commit messages "as a human developer would"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// UNDERCOVER MODE — CRITICAL&lt;/span&gt;
&lt;span class="c1"&gt;// You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository.&lt;/span&gt;
&lt;span class="c1"&gt;// Your commit messages, PR titles, and PR bodies MUST NOT contain&lt;/span&gt;
&lt;span class="c1"&gt;// ANY Anthropic-internal information. Do not blow your cover.&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;// NEVER include in commit messages or PR descriptions:&lt;/span&gt;
&lt;span class="c1"&gt;// - Internal model codenames (animal names like Capybara, Tengu, etc.)&lt;/span&gt;
&lt;span class="c1"&gt;// - Unreleased model version numbers (e.g., opus-4-7, sonnet-4-8)&lt;/span&gt;
&lt;span class="c1"&gt;// - The phrase "Claude Code" or any mention that you are an AI&lt;/span&gt;
&lt;span class="c1"&gt;// - Co-Authored-By lines or any other attribution&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;// Write commit messages as a human developer would.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
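&lt;p&gt;The stripping behavior that comment describes amounts to a commit-message sanitizer. A minimal sketch — function and pattern names are mine, not from the leaked source:&lt;/p&gt;

```typescript
// Hypothetical attribution-stripping step for undercover mode.
const BANNED_LINES = [
  /^Co-Authored-By:.*$/gim,
  /^.*Generated with \[?Claude Code\]?.*$/gim,
];

function sanitizeCommitMessage(msg: string): string {
  let out = msg;
  for (const re of BANNED_LINES) out = out.replace(re, '');
  // Collapse the blank lines left behind by removed trailers.
  return out.replace(/\n{3,}/g, '\n\n').trim();
}
```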



&lt;p&gt;It auto-activates for public and open-source repositories. You can force it on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CLAUDE_CODE_UNDERCOVER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no force-OFF override. The safety default is undercover-on for public repos.&lt;/p&gt;

&lt;p&gt;The kicker: this feature is gated to internal Anthropic employees only (&lt;code&gt;USER_TYPE === 'ant'&lt;/code&gt;). Anthropic's own developers use Claude Code on open-source projects, and this mode ensures their contributions look human-written. Draw your own conclusions about what that means for AI attribution norms.&lt;/p&gt;
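&lt;p&gt;Pieced together from the article's details — internal-only gating, the &lt;code&gt;CLAUDE_CODE_UNDERCOVER&lt;/code&gt; force-on flag, default-on for public repos, no force-off — the activation logic presumably resembles this (the exact composition is my guess):&lt;/p&gt;

```typescript
// Hypothetical reconstruction of the undercover-mode activation check.
// Flag and USER_TYPE names appear in the article; the wiring is a guess.
interface RepoInfo { isPublic: boolean; isOpenSource: boolean; }

function undercoverActive(
  env: Record<string, string | undefined>,
  userType: string,
  repo: RepoInfo,
): boolean {
  if (userType !== 'ant') return false;                   // internal employees only
  if (env['CLAUDE_CODE_UNDERCOVER'] === '1') return true; // force-on override
  return repo.isPublic || repo.isOpenSource;              // default-on for public repos
}
```

&lt;p&gt;Note there is no branch that turns it off for a public repo — consistent with the "no force-OFF override" observation above.&lt;/p&gt;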

&lt;h3&gt;
  
  
  3. Kairos — Permanent Memory
&lt;/h3&gt;

&lt;p&gt;Behind the feature flag &lt;code&gt;KAIROS&lt;/code&gt; in &lt;code&gt;main.tsx&lt;/code&gt; and the &lt;code&gt;memdir/&lt;/code&gt; directory, there is a persistent memory system that survives across sessions.&lt;/p&gt;

&lt;p&gt;This is not the &lt;code&gt;.claude/&lt;/code&gt; project memory you already know. Kairos is a four-stage memory consolidation pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Orient&lt;/strong&gt; — scan context, identify what matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collect&lt;/strong&gt; — gather facts, decisions, patterns from the session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidate&lt;/strong&gt; — merge new memories with existing long-term store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prune&lt;/strong&gt; — discard stale or low-value memories&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system runs automatically when you are not actively using Claude Code. It tracks memory age, performs periodic scans, and supports team memory paths — meaning shared memory across a team's Claude Code instances.&lt;/p&gt;
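&lt;p&gt;The consolidate and prune stages can be sketched in a few lines. The stage names come from the article; the data shapes, dedup strategy, and pruning horizon here are all my assumptions:&lt;/p&gt;

```typescript
// Hypothetical sketch of Kairos's consolidate + prune stages.
interface Memory { text: string; createdAt: number; value: number; }

const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000; // pruning horizon — invented

function consolidate(store: Memory[], session: Memory[], now: number): Memory[] {
  // Orient + Collect are assumed to have produced `session` already.
  const seen = new Set(store.map(m => m.text));
  const merged = [...store, ...session.filter(m => !seen.has(m.text))];      // Consolidate
  return merged.filter(m => now - m.createdAt < MAX_AGE_MS && m.value > 0);  // Prune
}
```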

&lt;p&gt;This turns Claude Code from a stateless tool into a persistent assistant that learns your codebase, your patterns, and your preferences over time. It is the most architecturally significant hidden feature in the leak.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Ultraplan — Deep Task Planning
&lt;/h3&gt;

&lt;p&gt;The feature flag &lt;code&gt;ULTRAPLAN&lt;/code&gt; in &lt;code&gt;commands.ts&lt;/code&gt; enables a deep planning mode that can run for up to 30 minutes on a single task. It uses remote agent execution — meaning the heavy thinking happens server-side, not in your terminal.&lt;/p&gt;

&lt;p&gt;Ultraplan is listed under &lt;code&gt;INTERNAL_ONLY_COMMANDS&lt;/code&gt;. Anthropic's engineers apparently have access to a planning mode that goes far beyond what ships to paying customers. This is the kind of feature that separates "AI autocomplete" from "AI architect."&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Multi-Agent, Voice, and Daemon Modes
&lt;/h3&gt;

&lt;p&gt;The source reveals several execution modes that are not publicly documented:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coordinator mode&lt;/strong&gt; — orchestrates multiple Claude instances running in parallel, each working on a subtask&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice mode&lt;/strong&gt; (&lt;code&gt;VOICE_MODE&lt;/code&gt; flag) — voice input/output for Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bridge mode&lt;/strong&gt; (&lt;code&gt;BRIDGE_MODE&lt;/code&gt;) — remote control of a Claude Code instance from another process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daemon mode&lt;/strong&gt; (&lt;code&gt;DAEMON&lt;/code&gt;) — runs Claude Code as a background process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UDS inbox&lt;/strong&gt; (&lt;code&gt;UDS_INBOX&lt;/code&gt;) — Unix domain socket for inter-process communication between Claude instances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these paint a picture of Claude Code evolving from a single-user CLI into a multi-agent orchestration platform. The daemon + UDS architecture means Claude Code instances can message each other, coordinate work, and run without a terminal attached.&lt;/p&gt;
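&lt;p&gt;An inbox over a Unix domain socket typically means framed messages on a byte stream. The article doesn't show the wire format, but a plausible minimal framing is newline-delimited JSON — here as a sketch with an invented message shape:&lt;/p&gt;

```typescript
// Hypothetical framing for an inter-instance inbox: newline-delimited JSON
// over a Unix domain socket. The message shape is invented for illustration.
interface InboxMessage { from: string; kind: 'task' | 'result' | 'ping'; body: string; }

function encode(msg: InboxMessage): string {
  return JSON.stringify(msg) + '\n';
}

// Accumulates a socket's byte stream and yields complete messages; the
// trailing partial line is carried over to the next chunk.
function decodeChunk(buffer: string, chunk: string): { buffer: string; messages: InboxMessage[] } {
  const data = buffer + chunk;
  const lines = data.split('\n');
  const rest = lines.pop() ?? '';
  return { buffer: rest, messages: lines.filter(Boolean).map(l => JSON.parse(l)) };
}
```

&lt;p&gt;With framing like this, any Claude Code instance (or the daemon) can write to another instance's socket without the two sharing a terminal.&lt;/p&gt;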

&lt;h2&gt;
  
  
  The Core Architecture
&lt;/h2&gt;

&lt;p&gt;The entire Claude Code engine lives in &lt;code&gt;queryLoop()&lt;/code&gt; at &lt;code&gt;query.ts&lt;/code&gt; line 241. At line 307, there is a &lt;code&gt;while(true)&lt;/code&gt; loop that drives everything:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;callModel()&lt;/code&gt; sends the conversation to the LLM&lt;/li&gt;
&lt;li&gt;The LLM returns text and &lt;code&gt;tool_use&lt;/code&gt; JSON blocks&lt;/li&gt;
&lt;li&gt;The program parses each &lt;code&gt;tool_use&lt;/code&gt;, checks permissions, executes the tool&lt;/li&gt;
&lt;li&gt;Results feed back into the conversation&lt;/li&gt;
&lt;li&gt;Loop continues until the LLM stops requesting tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the "LLM talks, program walks" pattern I wrote about &lt;a href="https://harrisonsec.com/blog/ai-stack-explained-llm-talks-program-walks/" rel="noopener noreferrer"&gt;previously&lt;/a&gt;. The LLM decides what to do. The program decides whether to allow it, then does it. Seeing it confirmed in 510K lines of production code is satisfying.&lt;/p&gt;
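&lt;p&gt;The five steps above reduce to a loop you can sketch in a screenful. This is a simplification of the shape described, not the actual &lt;code&gt;query.ts&lt;/code&gt; code — the types and helper signatures are stand-ins:&lt;/p&gt;

```typescript
// Minimal sketch of the queryLoop() pattern: model proposes tool calls,
// program gates and executes them, results feed back in, repeat.
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; name: string; input: unknown };

async function queryLoop(
  callModel: (history: Block[]) => Promise<Block[]>,
  runTool: (name: string, input: unknown) => Promise<string>,
  canUse: (name: string) => boolean,
): Promise<Block[]> {
  const history: Block[] = [];
  while (true) {
    const reply = await callModel(history);       // 1. send conversation
    history.push(...reply);                       // 2. text + tool_use blocks
    const toolCalls = reply.filter(b => b.type === 'tool_use');
    if (toolCalls.length === 0) break;            // 5. no tools requested: done
    for (const call of toolCalls) {
      if (call.type !== 'tool_use') continue;
      const result = canUse(call.name)            // 3. permission check
        ? await runTool(call.name, call.input)    //    then execute
        : 'permission denied';
      history.push({ type: 'text', text: result }); // 4. feed result back
    }
  }
  return history;
}
```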

&lt;h2&gt;
  
  
  Security Architecture
&lt;/h2&gt;

&lt;p&gt;Claude Code's permission system is the most carefully engineered part of the codebase. Every tool call passes through six layers, implemented in &lt;code&gt;useCanUseTool.tsx&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Config allowlist&lt;/strong&gt; — checks project and user configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-mode classifier&lt;/strong&gt; — determines if the tool is safe for autonomous execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordinator gate&lt;/strong&gt; — validates against the orchestration layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swarm worker gate&lt;/strong&gt; — checks permissions for sub-agent execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bash classifier&lt;/strong&gt; — analyzes shell commands for safety&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive user prompt&lt;/strong&gt; — final human confirmation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;External commands run in a sandbox. This is defense-in-depth done right. The irony is that the company that built this careful permission model forgot to strip a source map from their npm package.&lt;/p&gt;
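&lt;p&gt;Structurally, six layers like these compose naturally as a short-circuiting chain: each gate either decides outright or defers to the next, and only a full fall-through reaches the human. The layer names come from the article; the verdict type and signatures below are mine:&lt;/p&gt;

```typescript
// Hypothetical sketch of chaining permission gates. 'ask' means
// "no opinion — escalate"; a definitive verdict short-circuits.
type Verdict = 'allow' | 'deny' | 'ask';
type Gate = (tool: string, input: unknown) => Verdict;

function decide(gates: Gate[], tool: string, input: unknown): Verdict {
  for (const gate of gates) {
    const v = gate(tool, input);
    if (v !== 'ask') return v;
  }
  return 'ask'; // fell through every layer: prompt the user interactively
}
```

&lt;p&gt;The design property worth noting: a deny anywhere in the chain wins, and the interactive prompt is the fallback rather than the first resort — which is what makes autonomous execution tolerable.&lt;/p&gt;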

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;The moat for AI coding tools is not the CLI. It is the model. Anyone can read this source code and understand the architecture, but nobody can replicate Sonnet or Opus. The &lt;code&gt;queryLoop()&lt;/code&gt; pattern is elegant but simple — the magic is in what &lt;code&gt;callModel()&lt;/code&gt; returns. That said, the product roadmap is now public. Competitors know about Kairos, Ultraplan, multi-agent coordination, and voice mode. That is real strategic damage.&lt;/p&gt;

&lt;p&gt;For a company that positions itself as the responsible AI lab — the one that takes safety seriously — shipping a fully readable source map to a public registry is a notable operational security failure. The six-layer permission system in the code is impressive. The process that let a 60MB source map slip through CI/CD is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch the Deep Dive
&lt;/h2&gt;

&lt;p&gt;I broke down the full AI agent architecture — the same query loop that Claude Code uses — in a 15-minute video: &lt;a href="https://youtu.be/giNERYV-X7k" rel="noopener noreferrer"&gt;Watch on YouTube&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For background on the "LLM talks, program walks" pattern: &lt;a href="https://harrisonsec.com/blog/ai-stack-explained-llm-talks-program-walks/" rel="noopener noreferrer"&gt;Read: The AI Stack Explained — LLM Talks, Program Walks&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Coming next: a deep dive into Claude Code's 6-layer permission system and the Kairos memory architecture — with full code walkthroughs. Subscribe to catch it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>anthropic</category>
      <category>agents</category>
      <category>security</category>
    </item>
  </channel>
</rss>
