<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ayush Kumar Anand</title>
    <description>The latest articles on Forem by Ayush Kumar Anand (@ayush-k-anand).</description>
    <link>https://forem.com/ayush-k-anand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3649778%2Fb4e0be58-16dc-4dee-bbfc-6e699432a743.jpeg</url>
      <title>Forem: Ayush Kumar Anand</title>
      <link>https://forem.com/ayush-k-anand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ayush-k-anand"/>
    <language>en</language>
    <item>
      <title>I Finally Understood Ford–Fulkerson by Solving Splitwise’s “Simplify Debts”</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 18 Jan 2026 09:50:04 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/i-finally-understood-ford-fulkerson-by-solving-splitwises-simplify-debts-2dnp</link>
      <guid>https://forem.com/ayush-k-anand/i-finally-understood-ford-fulkerson-by-solving-splitwises-simplify-debts-2dnp</guid>
      <description>&lt;p&gt;For a long time, &lt;strong&gt;Ford–Fulkerson&lt;/strong&gt; felt like one of those DSA algorithms that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shows up in textbooks&lt;/li&gt;
&lt;li&gt;appears in interviews&lt;/li&gt;
&lt;li&gt;but never really shows up in real products&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I could recite:&lt;br&gt;
&lt;em&gt;“It finds the maximum flow in a graph”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But I never understood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what problem it truly solves&lt;/li&gt;
&lt;li&gt;why reverse edges exist&lt;/li&gt;
&lt;li&gt;why source and sink are even needed&lt;/li&gt;
&lt;li&gt;how this has anything to do with real apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That changed when I tried to deeply understand &lt;strong&gt;Splitwise’s “Simplify Debts” feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article is written for developers who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;struggled with augmenting paths&lt;/li&gt;
&lt;li&gt;were confused by reverse edges&lt;/li&gt;
&lt;li&gt;didn’t get why “infinity” is used&lt;/li&gt;
&lt;li&gt;wondered why max-flow works for money problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll explain everything &lt;strong&gt;intuitively&lt;/strong&gt;, step by step.&lt;/p&gt;
&lt;h2&gt;
  
  
  The real problem: what does “simplify debt” actually mean?
&lt;/h2&gt;

&lt;p&gt;As far as I can tell, Splitwise is not trying to preserve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who paid whom originally&lt;/li&gt;
&lt;li&gt;the sequence of transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its only goal is:&lt;br&gt;
&lt;em&gt;Find the cleanest way to settle money so everyone ends up correct.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That’s it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Reduce everything to net balances (pure accounting)
&lt;/h2&gt;

&lt;p&gt;Before algorithms, we do accounting.&lt;br&gt;
For each person:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;net_balance = money_received − money_paid
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example after reduction:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Person&lt;/th&gt;
&lt;th&gt;NetBalance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;-50 (owes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;-30 (owes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;+40 (receives)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;+40 (receives)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Interpretation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A must pay ₹50&lt;/li&gt;
&lt;li&gt;B must pay ₹30&lt;/li&gt;
&lt;li&gt;C must receive ₹40&lt;/li&gt;
&lt;li&gt;D must receive ₹40&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total owed = total received = &lt;strong&gt;₹80&lt;/strong&gt;&lt;/p&gt;
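&lt;p&gt;As a sketch (the transaction format here is my own assumption, not Splitwise's API), the reduction to net balances is a few lines of Python:&lt;/p&gt;

```python
from collections import defaultdict

def net_balances(transactions):
    """Reduce raw (payer, payee, amount) records to one net balance per person."""
    balance = defaultdict(int)
    for payer, payee, amount in transactions:
        balance[payer] -= amount   # money paid out
        balance[payee] += amount   # money received
    return dict(balance)

# Hypothetical raw debts that reduce to the table above.
txns = [("A", "C", 40), ("A", "D", 10), ("B", "D", 30)]
balances = net_balances(txns)
```

&lt;p&gt;Negative means "owes", positive means "receives", and the balances always sum to zero, because every rupee paid is a rupee received.&lt;/p&gt;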

&lt;p&gt;At this point, I was stuck:&lt;br&gt;
&lt;em&gt;&lt;strong&gt;Who should pay whom?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The key insight (this is the click)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Debt simplification is routing money from people who owe to people who should receive, without creating or destroying money.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is &lt;em&gt;exactly&lt;/em&gt; what a &lt;strong&gt;flow network&lt;/strong&gt; models.&lt;/p&gt;

&lt;p&gt;Money is &lt;strong&gt;flow&lt;/strong&gt;.&lt;br&gt;
People are &lt;strong&gt;nodes&lt;/strong&gt;.&lt;br&gt;
Payments are &lt;strong&gt;edges&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2: Why we need a Source and a Sink
&lt;/h2&gt;

&lt;p&gt;This confused me the most at first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why a Source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We want a &lt;strong&gt;single&lt;/strong&gt; place that represents:&lt;br&gt;
&lt;em&gt;“All money that must be paid”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So we introduce a &lt;strong&gt;Source (S).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgib2bh899azqbi8801u2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgib2bh899azqbi8801u2.png" alt="1" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqmy71lc1o46h8s4xovd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqmy71lc1o46h8s4xovd.png" alt="2" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A can send at most ₹50 into the system&lt;/li&gt;
&lt;li&gt;B can send at most ₹30&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why a Sink?
&lt;/h2&gt;

&lt;p&gt;We also want a &lt;strong&gt;single place&lt;/strong&gt; that represents:&lt;br&gt;
&lt;em&gt;“All money that must be received”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So we introduce a &lt;strong&gt;Sink (T)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If someone should receive money:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x8khqkb0u02ut62l6xv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x8khqkb0u02ut62l6xv.png" alt="3" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmupwhh9wdd8iyj9gdstt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmupwhh9wdd8iyj9gdstt.png" alt="4" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;C can absorb at most ₹40&lt;/li&gt;
&lt;li&gt;D can absorb at most ₹40&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Critical mental model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Source injects money.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Sink absorbs money.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Everyone else just routes it.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a sink:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there is no definition of “maximum”&lt;/li&gt;
&lt;li&gt;flow could circulate forever&lt;/li&gt;
&lt;li&gt;the problem has no goal&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 3: Connecting people (the “infinity” confusion)
&lt;/h2&gt;

&lt;p&gt;Now we connect &lt;strong&gt;debtors to creditors:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5xwb3rwn36aiuhzfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5xwb3rwn36aiuhzfw.png" alt="5" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is this “infinite”?&lt;/strong&gt;&lt;br&gt;
This does &lt;strong&gt;NOT&lt;/strong&gt; mean infinite money.&lt;/p&gt;

&lt;p&gt;It means:&lt;br&gt;
&lt;em&gt;Do not restrict who can pay whom.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The real limits already exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cannot pay more than ₹50 (from S → A)&lt;/li&gt;
&lt;li&gt;C cannot receive more than ₹40 (from C → T)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So “∞” just means:&lt;br&gt;
&lt;em&gt;Large enough to never be the bottleneck&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;∞ = total owed (or any sufficiently large number)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
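&lt;p&gt;Concretely, the capacity graph for the running example can be built like this (node names &lt;em&gt;S&lt;/em&gt; and &lt;em&gt;T&lt;/em&gt; are the artificial source and sink; this is a sketch, not production code):&lt;/p&gt;

```python
# Build the capacity graph for the running example.
owes = {"A": 50, "B": 30}        # net debtors: how much each must pay
receives = {"C": 40, "D": 40}    # net creditors: how much each must receive

INF = sum(owes.values())         # 80 here: large enough to never be the bottleneck

capacity = {}
for debtor, amount in owes.items():
    capacity[("S", debtor)] = amount         # S -> debtor: how much they inject
for creditor, amount in receives.items():
    capacity[(creditor, "T")] = amount       # creditor -> T: how much they absorb
for debtor in owes:
    for creditor in receives:
        capacity[(debtor, creditor)] = INF   # unrestricted person-to-person routing
```

&lt;p&gt;The "infinite" middle edges never bind: the S-side and T-side edges are always the real bottleneck.&lt;/p&gt;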



&lt;h2&gt;
  
  
  Step 4: What Ford–Fulkerson actually solves
&lt;/h2&gt;

&lt;p&gt;Ford–Fulkerson answers &lt;strong&gt;one question only&lt;/strong&gt;:&lt;br&gt;
&lt;em&gt;Is there any path from S to T along which money can still flow?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;S → A → C → T&lt;/p&gt;

&lt;p&gt;Translated to English:&lt;br&gt;
&lt;em&gt;“Take money from the debt pool → A pays C → C receives it”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each augmenting path is a &lt;strong&gt;valid payment.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step-by-step execution (no fear, no magic)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Augmenting Path 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S → A → C → T&lt;/p&gt;

&lt;p&gt;Bottleneck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;min(50, ∞, 40) = 40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pays C ₹40&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remaining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A owes ₹10&lt;/li&gt;
&lt;li&gt;C is fully settled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Augmenting Path 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S → A → D → T&lt;/p&gt;

&lt;p&gt;Bottleneck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;min(10, ∞, 40) = 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pays D ₹10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remaining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A is settled&lt;/li&gt;
&lt;li&gt;D still needs ₹30&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Augmenting Path 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S → B → D → T&lt;/p&gt;

&lt;p&gt;Bottleneck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;min(30, ∞, 30) = 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;B pays D ₹30&lt;/li&gt;
&lt;/ul&gt;
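&lt;p&gt;The three augmenting paths above fall out of a small BFS-based implementation (Edmonds–Karp, the Ford–Fulkerson variant that always augments along a shortest path). This is illustrative code, not Splitwise's implementation:&lt;/p&gt;

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest S->T path found by BFS."""
    flow = defaultdict(int)                  # flow[(u, v)] = units currently routed u -> v
    adj = defaultdict(list)
    for u, v in capacity:
        adj[u].append(v)
        adj[v].append(u)                     # residual (reverse) direction

    def residual(u, v):
        # leftover forward capacity, plus any opposite flow we are allowed to undo
        return capacity.get((u, v), 0) - flow[(u, v)] + flow[(v, u)]

    total = 0
    while True:
        parent = {source: None}              # BFS for an augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:               # no augmenting path left: done
            return total, dict(flow)
        path, v = [], sink                   # walk back to recover the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual(u, v) for u, v in path)
        for u, v in path:                    # push flow, cancelling opposite flow first
            undo = min(flow[(v, u)], bottleneck)
            flow[(v, u)] -= undo
            flow[(u, v)] += bottleneck - undo
        total += bottleneck

INF = 80                                     # total owed, per the "infinity" section
capacity = {("S", "A"): 50, ("S", "B"): 30, ("C", "T"): 40, ("D", "T"): 40,
            ("A", "C"): INF, ("A", "D"): INF, ("B", "C"): INF, ("B", "D"): INF}
total, flow = max_flow(capacity, "S", "T")
payments = {(u, v): f for (u, v), f in flow.items()
            if f and u not in ("S", "T") and v not in ("S", "T")}
```

&lt;p&gt;Running it pushes a total flow of ₹80 and leaves exactly the person-to-person payments worked out above: A → C ₹40, A → D ₹10, B → D ₹30.&lt;/p&gt;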

&lt;h2&gt;
  
  
  Step 5: Reading the final answer (this is crucial)
&lt;/h2&gt;

&lt;p&gt;Ignore Source and Sink.&lt;br&gt;
Look only at &lt;strong&gt;person → person flows:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payment&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A → C&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A → D&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B → D&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is the simplified debt list.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why are reverse edges inevitable?
&lt;/h2&gt;

&lt;p&gt;This was my biggest confusion.&lt;br&gt;
Reverse edges mean:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You are allowed to undo a previous payment if a better routing exists.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They &lt;strong&gt;do not&lt;/strong&gt; represent real payments.&lt;/p&gt;

&lt;p&gt;They represent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;refunds&lt;/li&gt;
&lt;li&gt;cancellations&lt;/li&gt;
&lt;li&gt;rerouting permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without reverse edges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;early decisions become permanent&lt;/li&gt;
&lt;li&gt;greedy choices can block better solutions&lt;/li&gt;
&lt;li&gt;max-flow can fail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real life, debt simplification &lt;strong&gt;must&lt;/strong&gt; allow:&lt;br&gt;
&lt;em&gt;“Let me undo that payment and send the money elsewhere instead.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Reverse edges make that possible.&lt;/p&gt;

&lt;p&gt;They are &lt;strong&gt;not an implementation trick&lt;/strong&gt; — they are &lt;strong&gt;mathematically required&lt;/strong&gt;.&lt;/p&gt;
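&lt;p&gt;A contrived variant makes this concrete (an assumption for illustration: suppose B can pay only C). A naive augmenting-path search that never undoes flow can get stuck at ₹50, while the same search with reverse edges reaches the full ₹80:&lt;/p&gt;

```python
def augment_all(capacity, source, sink, use_reverse):
    """Naive augmenting-path DFS; reverse (residual) edges only if use_reverse."""
    flow = {}
    adj = {}
    for u, v in capacity:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def residual(u, v):
        r = capacity.get((u, v), 0) - flow.get((u, v), 0)
        if use_reverse:
            r += flow.get((v, u), 0)         # permission to undo earlier routing
        return r

    def dfs(u, limit, seen):
        if u == sink:
            return limit
        seen.add(u)
        for v in adj[u]:
            r = residual(u, v)
            if v not in seen and r > 0:
                pushed = dfs(v, min(limit, r), seen)
                if pushed:
                    flow[(u, v)] = flow.get((u, v), 0) + pushed
                    cancel = min(flow.get((v, u), 0), flow[(u, v)])
                    flow[(v, u)] = flow.get((v, u), 0) - cancel
                    flow[(u, v)] -= cancel
                    return pushed
        return 0

    total = 0
    while True:
        pushed = dfs(source, float("inf"), set())
        if not pushed:
            return total
        total += pushed

# Variant where B can pay only C. A owes 50, B owes 30; C and D each receive 40.
cap = {("S", "A"): 50, ("S", "B"): 30, ("A", "C"): 80, ("A", "D"): 80,
       ("B", "C"): 80, ("C", "T"): 40, ("D", "T"): 40}
```

&lt;p&gt;Without reverse edges, the first search greedily sends A → C ₹40, saturating C, and B is left with no one to pay. The reverse edge C → A lets a later path "refund" ₹30 of that payment and reroute it through D.&lt;/p&gt;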

&lt;h2&gt;
  
  
  What does “optimal” mean here?
&lt;/h2&gt;

&lt;p&gt;This is important.&lt;/p&gt;

&lt;p&gt;Optimal does &lt;strong&gt;NOT&lt;/strong&gt; mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fair to everyone&lt;/li&gt;
&lt;li&gt;equal distribution&lt;/li&gt;
&lt;li&gt;preserving original transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optimal means:&lt;br&gt;
&lt;em&gt;The maximum possible amount of money reaches the sink without violating constraints.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;system-level optimality&lt;/strong&gt;, not individual optimality.&lt;/p&gt;

&lt;p&gt;Some people may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pay less than they “could”&lt;/li&gt;
&lt;li&gt;receive later&lt;/li&gt;
&lt;li&gt;be bypassed entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that’s perfectly correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why max-flow fits “Simplify Debts” perfectly
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Max-Flow Concept&lt;/th&gt;
&lt;th&gt;Real Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Flow&lt;/td&gt;
&lt;td&gt;Money&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node&lt;/td&gt;
&lt;td&gt;Person&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capacity&lt;/td&gt;
&lt;td&gt;How much they owe / receive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source&lt;/td&gt;
&lt;td&gt;All debts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;All credits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conservation&lt;/td&gt;
&lt;td&gt;No money lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reverse edge&lt;/td&gt;
&lt;td&gt;Undo bad routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is not a trick or coincidence.&lt;/p&gt;

&lt;p&gt;Debt simplification is &lt;strong&gt;inherently a routing problem.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;Ford–Fulkerson is not about pipes and water.&lt;/p&gt;

&lt;p&gt;It is about:&lt;br&gt;
&lt;em&gt;Routing a conserved quantity optimally under constraints.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Money is conserved.&lt;br&gt;
Debt simplification is routing.&lt;br&gt;
Max-flow fits naturally.&lt;/p&gt;

&lt;p&gt;Once I saw that, the algorithm finally made sense.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; Technically, minimizing the number of transactions is an NP-hard problem. Max-Flow ensures all money is settled correctly, but getting the absolute minimum number of edges usually requires additional greedy heuristics on top of this. For the purpose of this article, we are focusing on the flow conservation!&lt;/em&gt;&lt;/p&gt;
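&lt;p&gt;One common greedy heuristic (a sketch, not Splitwise's actual code) is to repeatedly match the largest remaining debtor with the largest remaining creditor:&lt;/p&gt;

```python
def settle_greedy(balances):
    """Pair the biggest debtor with the biggest creditor until everyone is settled."""
    debtors, creditors = {}, {}
    for person, b in balances.items():
        if b > 0:
            creditors[person] = b
        elif b != 0:
            debtors[person] = -b   # negative balance: this person owes money
    payments = []
    while debtors and creditors:
        d = max(debtors, key=debtors.get)
        c = max(creditors, key=creditors.get)
        amount = min(debtors[d], creditors[c])
        payments.append((d, c, amount))
        debtors[d] -= amount
        creditors[c] -= amount
        if debtors[d] == 0:
            del debtors[d]
        if creditors[c] == 0:
            del creditors[c]
    return payments
```

&lt;p&gt;For n people this settles everyone in at most n − 1 transactions, because each payment fully clears at least one participant, though it does not guarantee the true minimum. That is the NP-hard part.&lt;/p&gt;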

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Engine Under the Hood: Go’s GMP, Java’s Locks, and Erlang’s Heaps</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 11 Jan 2026 10:09:04 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/the-engine-under-the-hood-gos-gmp-javas-locks-and-erlangs-heaps-1lhi</link>
      <guid>https://forem.com/ayush-k-anand/the-engine-under-the-hood-gos-gmp-javas-locks-and-erlangs-heaps-1lhi</guid>
      <description>&lt;p&gt;As backend engineers, we often treat "concurrency" as a black box. We type go &lt;em&gt;func()&lt;/em&gt; or &lt;em&gt;spawn()&lt;/em&gt; and expect magic. But understanding how the runtime schedules these tasks is what separates a Senior Engineer from an Architect.&lt;br&gt;
This article dives into Go's &lt;strong&gt;GMP Scheduler&lt;/strong&gt;, explains the "secret agent" sysmon thread, and contrasts this model with &lt;strong&gt;Java’s Shared Memory&lt;/strong&gt; headaches and &lt;strong&gt;Erlang’s Gold Standard&lt;/strong&gt; isolation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Go’s GMP Architecture: The "Muxing" Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa32i53ur35k2i5wg1m9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa32i53ur35k2i5wg1m9e.png" alt="Golang Concurrency Example" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In traditional threading (like Java Pre-Loom), 1 Thread = 1 OS Thread. This is heavy (~1MB RAM per thread). You cannot spawn 100,000 of them without crashing.&lt;/p&gt;

&lt;p&gt;Go solves this with the &lt;strong&gt;GMP Model&lt;/strong&gt;, which multiplexes millions of Goroutines (user-space threads) onto a small number of Kernel Threads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Components&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;G (Goroutine):&lt;/strong&gt; This is your code. It is lightweight (starts at &lt;strong&gt;2KB&lt;/strong&gt; stack). It contains the instruction pointer (PC) and stack. It wants to run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M (Machine):&lt;/strong&gt; This is the &lt;strong&gt;OS Thread&lt;/strong&gt;. It is expensive. The OS kernel manages this. It is the actual "worker" that executes CPU instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P (Processor):&lt;/strong&gt; This is a purely &lt;strong&gt;logical resource&lt;/strong&gt; (a "token"). It represents the context required to run Go code (local run queue, memory cache).

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Rule:&lt;/strong&gt; An &lt;strong&gt;M&lt;/strong&gt; must hold a &lt;strong&gt;P&lt;/strong&gt; to execute a &lt;strong&gt;G&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P = Logical Cores:&lt;/strong&gt; By default, the number of Ps is set to &lt;em&gt;GOMAXPROCS&lt;/em&gt; (usually your CPU core count). This limits parallelism (simultaneous execution) while allowing practically unbounded concurrency (managing overlapping tasks).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Lifecycle: When is a G vs. M Created?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;When is a G created?&lt;/strong&gt; Whenever you call &lt;em&gt;go func()&lt;/em&gt;. It is created in &lt;strong&gt;User Space&lt;/strong&gt; inside the Go runtime. It is cheap (~2KB) and goes into the &lt;strong&gt;Local Run Queue&lt;/strong&gt; of the current P.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When is an M created?&lt;/strong&gt; The runtime tries to keep M count low. However, it spawns a new M (OS Thread) when:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;A Goroutine makes a &lt;strong&gt;Blocking System Call&lt;/strong&gt; (like CGO or complex File I/O) that cannot be handled asynchronously.&lt;/li&gt;
&lt;li&gt;The current M gets "stuck" inside the OS kernel.&lt;/li&gt;
&lt;li&gt;The runtime sees Ps with runnable Gs but no spare M to serve them. Creating an M is expensive (~1-2 MB of stack plus a kernel call).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Watcher: &lt;em&gt;sysmon&lt;/em&gt; and the &lt;em&gt;SIGURG&lt;/em&gt; Signal&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most misunderstood part of Go. How does the scheduler stop a Goroutine that has been running for too long (e.g., a &lt;em&gt;for {}&lt;/em&gt; loop)?&lt;/p&gt;

&lt;p&gt;Enter &lt;em&gt;sysmon&lt;/em&gt; (System Monitor).&lt;/p&gt;

&lt;p&gt;What is &lt;em&gt;sysmon&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;It is a special runtime thread that breaks the GMP rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It runs &lt;strong&gt;without a P&lt;/strong&gt; (no Processor token needed).&lt;/li&gt;
&lt;li&gt;It runs on a dedicated M.&lt;/li&gt;
&lt;li&gt;It wakes up periodically (20µs – 10ms).&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The Mechanism: Asynchronous Preemption via &lt;em&gt;SIGURG&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;Since Go 1.14, the runtime has used &lt;strong&gt;signals&lt;/strong&gt; to force preemption and preserve scheduling fairness.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Trigger:&lt;/strong&gt; &lt;strong&gt;sysmon&lt;/strong&gt; scans all Ps. It sees that &lt;em&gt;Goroutine A&lt;/em&gt; has been running on Processor 1 for more than 10ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Signal:&lt;/strong&gt; &lt;em&gt;sysmon&lt;/em&gt; sends a &lt;em&gt;SIGURG&lt;/em&gt; (Urgent Signal) to the Thread (M) running that Goroutine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why &lt;em&gt;SIGURG&lt;/em&gt;?&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Out-of-Band:&lt;/strong&gt; It is designed for "Urgent Socket Data," which modern apps rarely use, so it doesn't conflict with user signals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Destructive:&lt;/strong&gt; Unlike &lt;em&gt;SIGINT&lt;/em&gt; (Ctrl+C), it doesn't kill the process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Libc Safe:&lt;/strong&gt; It doesn't interfere with C libraries mixed into Go (CGO).&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Interruption:&lt;/strong&gt; The OS interrupts the M. Go's signal handler injects a call to &lt;em&gt;asyncPreempt&lt;/em&gt; into the Goroutine's stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Yield:&lt;/strong&gt; The Goroutine pauses, is moved to the &lt;strong&gt;Global Run Queue&lt;/strong&gt;, and the P picks a new G to run.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Java: The Failure of "Shared Memory"
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa26kw2okfwia0w2fy45m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa26kw2okfwia0w2fy45m.png" alt="java concurrency" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Model:&lt;/strong&gt; "Communicate by Sharing Memory." All threads share the Same Heap. To pass data, they modify the same object.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Failure Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at the code below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Java: Explicit Locking (The Bottleneck)
class Counter {
    private int count = 0;

    // "synchronized" forces the OS to pause other threads (Context Switch)
    public synchronized void increment() {
        count++; 
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Race Conditions:&lt;/strong&gt; If you forget &lt;em&gt;synchronized&lt;/em&gt;, two threads write at once, and data is corrupted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance:&lt;/strong&gt; Locks require OS intervention. This is slow (thousands of cycles).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deadlocks:&lt;/strong&gt; Thread A holds Lock 1 waiting for Lock 2. Thread B holds Lock 2 waiting for Lock 1. The app freezes.&lt;/li&gt;
&lt;/ol&gt;
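&lt;p&gt;The same shared-memory hazard exists in any language with threads. As a rough Python analogue of the Java snippet above (&lt;em&gt;threading.Lock&lt;/em&gt; playing the role of &lt;em&gt;synchronized&lt;/em&gt;):&lt;/p&gt;

```python
import threading

class Counter:
    """Shared-memory counter guarded by an explicit lock."""
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads can interleave the read-modify-write.
        with self._lock:
            self.count += 1

counter = Counter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(10_000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, counter.count is exactly 40_000; drop it and updates may be lost.
```

&lt;p&gt;The burden is the same as in Java: correctness depends on every writer remembering to take the lock.&lt;/p&gt;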

&lt;h2&gt;
  
  
  Erlang: The Gold Standard (Per-Process Heaps)
&lt;/h2&gt;

&lt;p&gt;I have worked with Erlang for over 3 years, and I am convinced it is one of the best languages for concurrency out of the box. It suffers, however, where there is heavy number crunching and tight loops. Many library functions are written in &lt;strong&gt;C&lt;/strong&gt; and exposed to Erlang through &lt;strong&gt;NIFs&lt;/strong&gt;, but with a caveat: &lt;strong&gt;unsafe NIFs&lt;/strong&gt; can block schedulers and break Erlang’s isolation guarantees.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Model:&lt;/strong&gt; "Share Memory by Communicating." Every process has its &lt;strong&gt;Own Private Heap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Erlang is "Better" (A Bank Example)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b64ygv4okpkvofwufx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b64ygv4okpkvofwufx4.png" alt="erlang example" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Go, stacks are isolated, but heap pressure and GC are global, so runaway allocations can still impact tail latency. In Erlang, it is isolated.&lt;/p&gt;

&lt;p&gt;Look at the code snippet below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-module(bank_server).
-behaviour(gen_server).
%% ... exports ...

%% 1. The Safe Bank Process
init([]) -&amp;gt; {ok, 100}. %% Balance is $100

%% 2. The Dangerous Crash Process
trigger_crash() -&amp;gt;
    spawn(fun() -&amp;gt; 
        %% A. This allocates 1GB on a PRIVATE heap
        CrashList = lists:seq(1, 100000000), 
        %% B. Crashes immediately
        1 / 0 
    end).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Sequence:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Allocation:&lt;/strong&gt; The spawned process allocates 1GB. In Java or Go, that allocation would land on the shared global heap and raise garbage-collection (GC) pressure for everyone, in the worst case a "Stop-The-World" pause.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Crash:&lt;/strong&gt; The process dies (divide by zero).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Cleanup:&lt;/strong&gt; The Erlang VM simply deletes the pointer to that private heap.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero GC Cost:&lt;/strong&gt; No need to scan memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Impact:&lt;/strong&gt; The bank_server (holding the $100) continues running with microsecond latency. It didn't even feel the crash.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Final takeaway:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Java's&lt;/strong&gt; shared-memory model places a heavier correctness burden on engineers, making large-scale concurrency harder to reason about.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Erlang&lt;/strong&gt; is the reliability king because Private Heaps prevent "noisy neighbors" from killing the system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go&lt;/strong&gt; is the pragmatic middle ground: It uses Shared Heaps for raw speed (no copying data) but uses CSP (Channels) to avoid the complexity of locks.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>backend</category>
      <category>erlang</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Why Goroutines Scale: Stack Growth, Compiler Tricks, and Context Switching</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sat, 03 Jan 2026 20:03:41 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/why-goroutines-scale-stack-growth-compiler-tricks-and-context-switching-k54</link>
      <guid>https://forem.com/ayush-k-anand/why-goroutines-scale-stack-growth-compiler-tricks-and-context-switching-k54</guid>
      <description>&lt;h2&gt;
  
  
  Threads in Languages like C++ and Java
&lt;/h2&gt;

&lt;p&gt;In these languages, threads are a means of concurrency that costs a lot of CPU time in context switching and a relatively large amount of memory at creation. A single thread takes ~1 MB, so if you were to spawn 100,000 threads, you would need about 100 GB of RAM, which is not economically feasible for most software projects. To maintain concurrency, the CPU generally uses timeslicing, giving each thread a roughly equal share of CPU cycles. While doing this, it has to perform context switches. This is expensive in time: the current thread's state must be saved in its TCB (Thread Control Block), the new thread's TCB must be loaded, and the switch destroys cache locality, causing frequent L1/L2 cache misses.&lt;/p&gt;

&lt;p&gt;As a result, with thousands of threads the CPU can end up spending more time switching contexts than actually executing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do goroutines optimize this?
&lt;/h2&gt;

&lt;p&gt;Goroutines are "lightweight threads" managed entirely in &lt;strong&gt;User Space&lt;/strong&gt; by the Go Runtime, rather than the OS Kernel.&lt;/p&gt;

&lt;p&gt;The first massive optimization is memory. While a standard OS thread reserves a fixed 1 MB stack, a Goroutine initializes with a stack of just 2 KB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46onprwak88ce0yep82f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46onprwak88ce0yep82f.png" alt="Comparison showing huge 1MB OS Thread stack vs tiny 2KB Goroutine stack" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Math:&lt;/strong&gt; 2 KB is roughly 0.2% of 1 MB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Impact:&lt;/strong&gt; Instead of capping out at thousands of threads, you can easily spawn millions of Goroutines on a standard laptop without running out of RAM.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The "Infinite" Stack
&lt;/h2&gt;

&lt;p&gt;Unlike OS threads, which typically have a fixed stack size (e.g., 1 MB) determined at creation, Goroutines are dynamic. They start at 2 KB and grow automatically as needed.&lt;/p&gt;

&lt;p&gt;If a Goroutine runs out of space, the Go runtime allocates a larger segment of memory (usually double) and moves the stack there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8mtux9a0gyfmmd94mxw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8mtux9a0gyfmmd94mxw.png" alt="Flowchart showing how Go runtime allocates a larger stack and copies data when limit is hit" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OS Thread Limit:&lt;/strong&gt; Fixed (~1-8 MB). Hitting this causes a crash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goroutine Limit:&lt;/strong&gt; Dynamic (up to 1 GB on 64-bit systems).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means for all practical purposes, Goroutine recursion depth is limited only by available memory, while OS threads are limited by their initial reservation.&lt;/p&gt;
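&lt;p&gt;A quick way to see the growing stack in action is deep recursion. The sketch below recurses a million frames, a depth that would overflow a fixed 1 MB thread stack but completes fine on a goroutine:&lt;/p&gt;

```go
package main

// deep recurses n frames. Each frame is tiny, but a depth of one million
// would overflow a fixed 1 MB OS-thread stack; a goroutine's stack simply
// grows (via runtime.morestack) until the call completes.
func deep(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + deep(n-1)
}
```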

&lt;h2&gt;
  
  
  Faster Context Switches
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpgkmat9mp5ih74ksl8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpgkmat9mp5ih74ksl8a.png" alt="Diagram illustrating Goroutines running in User Space vs OS Threads in Kernel Space" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like OS threads, Goroutines need to save their state when paused so they can resume later.&lt;/p&gt;

&lt;p&gt;However, while an OS thread switch requires saving all CPU registers (including heavy floating-point registers) and trapping into Kernel Mode, a Goroutine switch is much cheaper.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OS Thread Switch:&lt;/strong&gt; ~1-2 microseconds. Saves huge state (AVX/SSE registers) to the &lt;strong&gt;TCB&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goroutine Switch:&lt;/strong&gt; ~200 nanoseconds (~10x faster). Saves only &lt;strong&gt;3 registers&lt;/strong&gt; (PC, SP, DX) to a simple Go struct called &lt;em&gt;g&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because this happens entirely in &lt;strong&gt;User Space&lt;/strong&gt;, the CPU stays hot, caches stay valid, and the overhead is negligible.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how does a goroutine grow its stack dynamically?
&lt;/h2&gt;

&lt;p&gt;To achieve this, the Go compiler uses a technique called the Function Prologue.&lt;/p&gt;

&lt;p&gt;During compilation, the compiler inserts a few assembly instructions at the very start of every function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Check:&lt;/strong&gt; These instructions compare the current Stack Pointer &lt;em&gt;(SP)&lt;/em&gt; against a limit called the &lt;strong&gt;Stack Guard&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Trigger:&lt;/strong&gt; If there isn't enough space for the function to run, it triggers a runtime function called &lt;em&gt;runtime.morestack&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Growth:&lt;/strong&gt; The runtime allocates a new, larger stack segment (usually 2x the size).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Copy &amp;amp; Fix:&lt;/strong&gt; It copies the user's data to the new stack. Crucially, it also &lt;strong&gt;adjusts all pointers&lt;/strong&gt; to ensure they point to the new addresses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once this "surgery" is complete, the function resumes execution on the new, spacious stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func main() {
    fmt.Println("Hello Ayush")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above is a sample Go program.&lt;br&gt;
Now, when we dump the compiler-generated assembly with the command&lt;br&gt;
&lt;code&gt;&lt;br&gt;
go build -gcflags=-S main.go&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;You will see, among the output, a section like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main.main STEXT size=83 args=0x0 locals=0x40 funcid=0x0 align=0x0
    0x0000 00000 (/Users/ayushanand/concurrency/main.go:7)  TEXT    main.main(SB), ABIInternal, $64-0
    0x0000 00000 (/Users/ayushanand/concurrency/main.go:7)  CMPQ    SP, 16(R14)
    0x0004 00004 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-2
    0x0004 00004 (/Users/ayushanand/concurrency/main.go:7)  JLS 76
    0x0006 00006 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-1
    0x0006 00006 (/Users/ayushanand/concurrency/main.go:7)  PUSHQ   BP
    0x0007 00007 (/Users/ayushanand/concurrency/main.go:7)  MOVQ    SP, BP
    0x000a 00010 (/Users/ayushanand/concurrency/main.go:7)  SUBQ    $56, SP
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:7)  FUNCDATA    $0, gclocals·g5+hNtRBP6YXNjfog7aZjQ==(SB)
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:7)  FUNCDATA    $1, gclocals·EVwPOTmEGNnKe4zqm0ZbFQ==(SB)
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:7)  FUNCDATA    $2, main.main.stkobj(SB)
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:8)  LEAQ    type:string(SB), DX
    0x0015 00021 (/Users/ayushanand/concurrency/main.go:8)  MOVQ    DX, main..autotmp_8+40(SP)
    0x001a 00026 (/Users/ayushanand/concurrency/main.go:8)  LEAQ    main..stmp_0(SB), DX
    0x0021 00033 (/Users/ayushanand/concurrency/main.go:8)  MOVQ    DX, main..autotmp_8+48(SP)
    0x0026 00038 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVQ    os.Stdout(SB), BX
    0x002d 00045 (&amp;lt;unknown line number&amp;gt;)    NOP
    0x002d 00045 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) LEAQ    go:itab.*os.File,io.Writer(SB), AX
    0x0034 00052 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) LEAQ    main..autotmp_8+40(SP), CX
    0x0039 00057 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVL    $1, DI
    0x003e 00062 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVQ    DI, SI
    0x0041 00065 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) PCDATA  $1, $0
    0x0041 00065 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) CALL    fmt.Fprintln(SB)
    0x0046 00070 (/Users/ayushanand/concurrency/main.go:9)  ADDQ    $56, SP
    0x004a 00074 (/Users/ayushanand/concurrency/main.go:9)  POPQ    BP
    0x004b 00075 (/Users/ayushanand/concurrency/main.go:9)  RET
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:9)  NOP
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $1, $-1
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-2
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:7)  CALL    runtime.morestack_noctxt(SB)
    0x0051 00081 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-1
    0x0051 00081 (/Users/ayushanand/concurrency/main.go:7)  JMP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stack check is right at the top: &lt;code&gt;CMPQ SP, 16(R14)&lt;/code&gt; compares the stack pointer against the stack guard stored in the current goroutine's &lt;em&gt;g&lt;/em&gt; struct (held in R14), and &lt;code&gt;JLS 76&lt;/code&gt; branches to the &lt;code&gt;CALL runtime.morestack_noctxt(SB)&lt;/code&gt; at the bottom when there isn't enough room.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ending Notes:
&lt;/h2&gt;

&lt;p&gt;Goroutines aren't just "threads but smaller." They are a fundamental rethink of how we manage concurrency. By moving the stack management from the OS Kernel to the Go Runtime, we gain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Massive Scalability:&lt;/strong&gt; From 100k limit to millions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Memory:&lt;/strong&gt; Pay for what you use (2KB), not what you might use (1MB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Latency:&lt;/strong&gt; Context switches that are 10x faster.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next time you type &lt;code&gt;go func()&lt;/code&gt;, remember: there is a tiny 2 KB stack and a smart compiler working in the background to make it "infinite."&lt;/p&gt;

</description>
      <category>go</category>
      <category>computerscience</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>You Can't Resize a Bloom Filter. Here's What To Do Instead.</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 28 Dec 2025 16:34:43 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/you-cant-resize-a-bloom-filter-heres-what-to-do-instead-4p34</link>
      <guid>https://forem.com/ayush-k-anand/you-cant-resize-a-bloom-filter-heres-what-to-do-instead-4p34</guid>
      <description>&lt;h2&gt;
  
  
  What are bloom filters?
&lt;/h2&gt;

&lt;p&gt;Bloom filters are a probabilistic data structure that uses a bit array and hash functions to reduce the load on our main databases. You might ask: why not just use a caching layer like Redis? The answer comes down to which tradeoff you are willing to make: accuracy or memory. A plain Redis cache gives exact answers but is not memory efficient, whereas a bloom filter sits at the opposite end of the spectrum: a tiny memory footprint, but probabilistic answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do they work?
&lt;/h2&gt;

&lt;p&gt;Suppose you have settled on the accuracy tradeoff and decided to use a bloom filter, perhaps because you cannot cover the memory cost of an exact cache. Before writing any data, you need to decide two things in advance: the size of the bit array and the number of hash functions. This matters because once the size is fixed and data is written, the filter cannot be resized, and an item, once added, cannot be deleted.&lt;/p&gt;

&lt;p&gt;First, let us ponder the number of hash functions. It cannot be too low, because too few probes make accidental collisions likely. But it cannot be too large either, because every insert then flips more bits, the array fills up faster, and a saturated array also produces collisions.&lt;/p&gt;

&lt;p&gt;But luckily, we have a formula:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt; k = (m / n) × ln(2)&lt;/p&gt;

&lt;p&gt;where &lt;em&gt;k&lt;/em&gt; = optimal number of hash functions,&lt;br&gt;
      &lt;em&gt;m&lt;/em&gt; = size of our bit array (e.g., 100 bits),&lt;br&gt;
      &lt;em&gt;n&lt;/em&gt; = number of items we expect to add (e.g., 10 users).&lt;/p&gt;

&lt;p&gt;Below is a golang implementation for the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "math"
)

// CalculateOptimalParams calculates 'm' (size of array) and 'k' (num hashes)
// based on how many items (n) you have and what error rate (p) you accept.
func CalculateOptimalParams(n int, p float64) (uint, uint) {
    // 1. Calculate ideal Bit Array Size (m)
    // Formula: m = - (n * ln(p)) / (ln(2)^2)
    m := - (float64(n) * math.Log(p)) / math.Pow(math.Log(2), 2)

    // 2. Calculate ideal Number of Hash Functions (k)
    // Formula: k = (m / n) * ln(2)
    k := (m / float64(n)) * math.Log(2)

    return uint(math.Ceil(m)), uint(math.Ceil(k))
}

func main() {
    n := 1000        // We expect 1,000 users
    p := 0.01        // We accept a 1% False Positive rate

    m, k := CalculateOptimalParams(n, p)

    println("Recommended Bit Array Size (m):", m)
    println("Recommended Hash Functions (k):", k)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we come to the core logic. A bloom filter works by answering one question: has this item been recorded in the underlying bit array?&lt;/p&gt;

&lt;p&gt;Let us take an example.&lt;br&gt;
Suppose we are making an app like Splitwise. We have asked a user to choose a userName, but we have a constraint - the userName should be globally unique.&lt;/p&gt;

&lt;p&gt;Now, when a user chooses a name, we can run an SQL query like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 1 FROM users WHERE username = 'ayush' LIMIT 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if a row comes back, we will ask the user to choose another name. But even in the best case, assuming an index on the username column, this query costs O(log n), where n is the number of rows in the table, and it hits the database on every attempt.&lt;/p&gt;

&lt;p&gt;Conversely, we can use a bloom filter. We just check whether the name is in the filter. If the answer is a definite &lt;strong&gt;NO&lt;/strong&gt;, we can let the user keep the username; otherwise we query the database to confirm whether the name really exists and proceed accordingly. You can see the huge efficiency we just achieved: the database is only consulted on a "maybe".&lt;/p&gt;

&lt;h2&gt;
  
  
  The Maths Behind
&lt;/h2&gt;

&lt;p&gt;Bloom filters use two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Bit Array:&lt;/strong&gt; An array of zeros and ones. (e.g., [0, 0, 0, 0, 0]).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hash Functions:&lt;/strong&gt; Math formulas that turn a word into a number.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Let's walk through it:&lt;/strong&gt; Imagine an empty array of 10 bits: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Adding "Ayush"&lt;/strong&gt; We run "Ayush" through 2 hash functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash 1("Ayush") = &lt;strong&gt;2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hash 2("Ayush") = &lt;strong&gt;7&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We flip the bits at index 2 and 7 to 1. Array: [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Checking "Kumar"&lt;/strong&gt; We want to know if "Kumar" exists.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash 1("Kumar") = &lt;strong&gt;3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hash 2("Kumar") = &lt;strong&gt;4&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We check index 3 and 4. Are they &lt;strong&gt;1&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Index 3 is &lt;strong&gt;0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Verdict: Kumar is &lt;strong&gt;DEFINITELY NOT&lt;/strong&gt; in the set. (We didn't need to check the database!).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: The "False Positive" (The "Maybe")&lt;/strong&gt; Let's say we check "Anand".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash 1("Anand") = &lt;strong&gt;2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hash 2("Anand") = &lt;strong&gt;7&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We check the array. Index 2 is 1. Index 7 is 1. (Because "Ayush" set them earlier!).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag8hqxajxy223t2jm8zb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag8hqxajxy223t2jm8zb.png" alt="Bloom Filter Architecture, Collision, and Scalable Stacking Diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Verdict:&lt;/strong&gt; The filter says "Anand &lt;strong&gt;MAYBE&lt;/strong&gt; exists."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reality:&lt;/strong&gt; Anand doesn't exist. This is a collision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; Now (and only now), we check the real database to confirm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is a golang implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "fmt"
    "hash/fnv"
)

// Defining the Bloom Filter
type BloomFilter struct {
    bitset []bool // The array of 0s and 1s
    size   int    // Size of the array
}

func NewBloomFilter(size int) *BloomFilter {
    return &amp;amp;BloomFilter{
        bitset: make([]bool, size),
        size:   size,
    }
}

// Hash Function 1
func (bf *BloomFilter) hash1(s string) int {
    h := fnv.New32a()
    h.Write([]byte(s))
    return int(h.Sum32()) % bf.size
}

// Hash Function 2 (Simulated by adding salt)
func (bf *BloomFilter) hash2(s string) int {
    h := fnv.New32a()
    h.Write([]byte(s + "salt")) // Add a simple salt to change the hash
    return int(h.Sum32()) % bf.size
}

// Add an item
func (bf *BloomFilter) Add(item string) {
    idx1 := bf.hash1(item)
    idx2 := bf.hash2(item)
    bf.bitset[idx1] = true
    bf.bitset[idx2] = true
    fmt.Printf("Added: %s (Indices: %d, %d)\n", item, idx1, idx2)
}

// Check if item exists
func (bf *BloomFilter) Exists(item string) bool {
    idx1 := bf.hash1(item)
    idx2 := bf.hash2(item)

    // If BOTH bits are true, it *might* exist
    if bf.bitset[idx1] &amp;amp;&amp;amp; bf.bitset[idx2] {
        return true
    }
    // If ANY bit is false, it definitely does NOT exist
    return false
}

func main() {
    filter := NewBloomFilter(100)

    // 1. Add Ayush
    filter.Add("Ayush")

    // 2. Check Ayush
    fmt.Println("Does Ayush exist?", filter.Exists("Ayush")) // True (Maybe)

    // 3. Check Kumar
    fmt.Println("Does Kumar exist?", filter.Exists("Kumar")) // False (Definitely Not)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Some real world examples:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Medium.com:&lt;/strong&gt; Uses Bloom Filters to avoid recommending articles you have already read.
&lt;strong&gt;Logic:&lt;/strong&gt; "Has Ayush read this ID?" If Filter says "No", show it. If "Maybe", check DB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Chrome:&lt;/strong&gt; Malicious URL checker. Instead of storing every bad URL in the browser (too big), they store a Bloom Filter.
If you visit a site, Chrome checks the filter. If it says "Maybe dangerous," it calls Google servers to confirm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cassandra/Postgres (Databases):&lt;/strong&gt; Before searching a hard disk for a row, they check a Bloom Filter. If the row isn't there, they save a slow disk operation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to scale?
&lt;/h2&gt;

&lt;p&gt;So, sure, we can decide the size of the array and the expected number of items up front. But that is just an estimate, and it will change in the future. So how do we scale? The standard technique is &lt;strong&gt;stacking bloom filters of increasing size, also known as Scalable Bloom Filters (SBF)&lt;/strong&gt;. This is the technique used by the bloom filter provided by the &lt;strong&gt;Redis Stack&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, how does the stacking algorithm work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A. The "Write" Rule (Adding Data)&lt;/strong&gt;&lt;br&gt;
We only write to the &lt;strong&gt;Newest (Active)&lt;/strong&gt; filter.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to add "Ayush", you don't touch Filter 1 or Filter 2. You add him to Filter 3.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Why?&lt;/em&gt; Because Filters 1 and 2 are frozen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;B. The "Read" Rule (Checking Data)&lt;/strong&gt;&lt;br&gt;
We must check &lt;strong&gt;all&lt;/strong&gt; filters, usually starting from the newest to the oldest.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Question:&lt;/strong&gt; "Does Ayush exist?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Filter 3:&lt;/strong&gt; "No." (He isn't in the new hotel).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Filter 2:&lt;/strong&gt; "No."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Filter 1:&lt;/strong&gt; "Yes." (He checked in 2 months ago).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; True.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Logical OR:&lt;/strong&gt; Result = F3.Exists() || F2.Exists() || F1.Exists()&lt;br&gt;
&lt;strong&gt;Problems with the scaling approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time we add a &lt;strong&gt;new filter layer&lt;/strong&gt;, we introduce a &lt;strong&gt;New Source of Lies (False Positives)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If Filter 1 lies 1% of the time, and Filter 2 lies &lt;strong&gt;1%&lt;/strong&gt; of the time...&lt;/li&gt;
&lt;li&gt;Your total chance of being lied to is now roughly &lt;strong&gt;2%&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you have 10 filters, your error rate becomes &lt;strong&gt;10%&lt;/strong&gt;. That is garbage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How Redis fixed it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They fixed it by making every new filter &lt;em&gt;stricter than the last one&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filter 1:&lt;/strong&gt; We allow &lt;strong&gt;0.1%&lt;/strong&gt; error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter 2:&lt;/strong&gt; We allow &lt;strong&gt;0.01%&lt;/strong&gt; error. (10x stricter).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter 3:&lt;/strong&gt; We allow &lt;strong&gt;0.001%&lt;/strong&gt; error. (100x stricter).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hence even if our stack uses 20 filters, the total error rate stays tiny: the per-layer error rates form a geometric series (0.1% + 0.01% + 0.001% + ...), which sums to roughly 0.11%, comfortably under 1% no matter how many layers we add.&lt;/p&gt;
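&lt;p&gt;Putting the write rule, the read rule, and the growth rule together, a toy SBF might look like this (the sizes, capacities, and the "one extra hash per layer" tightening are illustrative choices, not Redis's exact parameters):&lt;/p&gt;

```go
package main

import "hash/fnv"

// layer is one filter in the stack: frozen once full, probed on reads.
type layer struct {
	bits []bool
	k    int // hash probes per item; more probes = stricter layer
	n    int // items written so far
	max  int // capacity before this layer is frozen
}

// idx derives the i-th probe position for s by salting the hash with i.
func (l *layer) idx(s string, i int) int {
	h := fnv.New32a()
	h.Write([]byte(s))
	h.Write([]byte{byte(i)})
	return int(h.Sum32() % uint32(len(l.bits)))
}

func (l *layer) add(s string) {
	for i := 0; i < l.k; i++ {
		l.bits[l.idx(s, i)] = true
	}
	l.n++
}

func (l *layer) has(s string) bool {
	for i := 0; i < l.k; i++ {
		if !l.bits[l.idx(s, i)] {
			return false
		}
	}
	return true
}

// ScalableBloom applies the two rules: writes go only to the newest
// layer; when it fills, a bigger and stricter layer is stacked on top.
type ScalableBloom struct {
	layers []*layer
}

func NewScalableBloom() *ScalableBloom {
	sb := &ScalableBloom{}
	sb.push(1024, 3, 256)
	return sb
}

func (sb *ScalableBloom) push(bits, k, max int) {
	sb.layers = append(sb.layers, &layer{bits: make([]bool, bits), k: k, max: max})
}

func (sb *ScalableBloom) Add(s string) {
	active := sb.layers[len(sb.layers)-1]
	if active.n >= active.max {
		// freeze the old layer; the new one doubles in size and gains
		// an extra hash probe, approximating the "stricter" error rate
		sb.push(len(active.bits)*2, active.k+1, active.max*2)
		active = sb.layers[len(sb.layers)-1]
	}
	active.add(s)
}

// Exists is the logical OR across every layer, newest to oldest.
func (sb *ScalableBloom) Exists(s string) bool {
	for i := len(sb.layers) - 1; i >= 0; i-- {
		if sb.layers[i].has(s) {
			return true
		}
	}
	return false
}
```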

</description>
      <category>redis</category>
      <category>systemdesign</category>
      <category>go</category>
      <category>performance</category>
    </item>
    <item>
      <title>Breaking the "Pattern": How We Built (and tried to scale) Strategic AI for Sweep</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sat, 20 Dec 2025 16:56:00 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/breaking-the-pattern-how-we-built-and-scaled-strategic-ai-for-sweep-1an3</link>
      <guid>https://forem.com/ayush-k-anand/breaking-the-pattern-how-we-built-and-scaled-strategic-ai-for-sweep-1an3</guid>
      <description>&lt;p&gt;In the world of online card games, players aren't just looking for a challenge; they’re looking for a soul.&lt;br&gt;
At my company, our game &lt;strong&gt;Sweep&lt;/strong&gt; had a long-standing "bot problem." Our computer opponents were built on a classic &lt;strong&gt;rule-based system&lt;/strong&gt;. While they followed the mechanics perfectly, they were fundamentally predictable. Experienced players quickly identified their repetitive move patterns, effectively turning a game of high-stakes strategy into a solved puzzle.&lt;br&gt;
We knew we had to evolve. We needed bots that could reason, bluff, and adapt like humans. This is the story of how we integrated locally hosted LLMs to build a "human-level" experience, and the hard infrastructure lessons we learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Proof of Concept: Beyond "If/Then" Logic
&lt;/h2&gt;

&lt;p&gt;Our goal was to move from static rules to a &lt;strong&gt;Deterministic AI system&lt;/strong&gt; that leveraged the pattern-matching power of Large Language Models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hybrid "Move-Index" Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of letting the LLM generate raw text (which is slow and hard to parse), we used it as a &lt;strong&gt;Pattern-Matching Engine&lt;/strong&gt;. Here was our workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generation:&lt;/strong&gt; Our existing rule-based system would generate every possible valid move for the current game state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rating:&lt;/strong&gt; These moves were rated based on basic metrics (points, defense, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selection:&lt;/strong&gt; We fed the board state and the list of rated moves into the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexing:&lt;/strong&gt; We asked the LLM to return only the index number of its chosen move.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This "index-only" response drastically reduced processing complexity and token overhead, allowing us to focus the AI's power solely on strategic choice rather than language generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Realities: Parameters and Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When we moved from theory to locally hosted models (using &lt;strong&gt;Ollama&lt;/strong&gt;), we hit our first major wall: the "Parameters vs. Latency" trade-off.&lt;/p&gt;

&lt;p&gt;At one point, we naively tried to route every bot move through the LLM. The GPU queue exploded, latency spiked, and we had to roll back the feature within hours.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small Models (Under 3B parameters):&lt;/strong&gt; These models were blazing fast, but they were "playing dumb." They missed obvious partner synergies and failed to recognize long-term threats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large Models (Over 13B parameters):&lt;/strong&gt; These were strategic masters, but they were too slow. By the time they finished "reading" the prompt and processing the board state, the player experience had already stalled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To standardize our AI's "personality," we used &lt;strong&gt;Ollama Modelfiles&lt;/strong&gt; to save specific system prompts. This ensured every bot instance had the same strategic baseline without us having to re-send huge prompt blocks for every turn.&lt;/p&gt;
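&lt;p&gt;A Modelfile along these lines does the trick (the base model name and prompt text here are illustrative, not our production prompt):&lt;/p&gt;

```
# Modelfile: bakes the strategic baseline into a named local model
FROM llama3
PARAMETER temperature 0.3
SYSTEM """
You are a card-game strategist for Sweep. You will receive a board state
and a numbered list of rated candidate moves. Reply with ONLY the index
of the move you choose.
"""
```

&lt;p&gt;Building it once with &lt;code&gt;ollama create sweep-bot -f Modelfile&lt;/code&gt; gives every bot instance the same baseline without re-sending the system prompt on every turn.&lt;/p&gt;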

&lt;h2&gt;
  
  
  Scaling with "Strategic Triage"
&lt;/h2&gt;

&lt;p&gt;We technically succeeded in creating human-level bots, but we couldn't scale. Our infrastructure simply couldn't handle thousands of bots trying to hit a local GPU at the same time.&lt;/p&gt;

&lt;p&gt;The solution was &lt;strong&gt;Strategic Triage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of giving every bot an LLM "brain" for every turn, we categorized moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The "Obvious" Tier:&lt;/strong&gt; Simple captures were still handled by the fast rule-based system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Critical" Tier:&lt;/strong&gt; In 4-player games where Partner Synergy was vital, we activated the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probabilistic Shedding:&lt;/strong&gt; We only used the LLM brain for a percentage of the total bot pool. This "triage" helped us scale while still making the overall bot population feel more intelligent and unpredictable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Humanizing the Bot: The Temperature Lever&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To prevent the bots from becoming too perfect (and thus, boring), we used the &lt;strong&gt;Temperature parameter&lt;/strong&gt; to simulate human mindsets.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low Temperature (0.1–0.3):&lt;/strong&gt; Created "Conservative" bots that played strictly by the book—perfect for professional-level rooms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Temperature (0.7–0.9):&lt;/strong&gt; Created "Risk-Takers." These bots made aggressive, sometimes "erroneous" moves that felt exactly like a human player trying a bold bluff.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: The Path Forward
&lt;/h2&gt;

&lt;p&gt;The PoC proved that LLMs can indeed break the repetitive patterns of rule-based bots. However, the infrastructure cost of locally hosting high-parameter models is the current frontier. By using &lt;strong&gt;Deterministic Triage&lt;/strong&gt; and &lt;strong&gt;Index-based responses&lt;/strong&gt;, we've found a middle ground: bots that play like people, without the server-melting overhead.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>architecture</category>
      <category>elixir</category>
    </item>
    <item>
      <title>I Wrote a WebSocket Client From Scratch and It Ate My RAM</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 14 Dec 2025 14:45:57 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/i-wrote-a-websocket-client-from-scratch-and-it-ate-my-ram-5lb</link>
      <guid>https://forem.com/ayush-k-anand/i-wrote-a-websocket-client-from-scratch-and-it-ate-my-ram-5lb</guid>
      <description>&lt;p&gt;My company's production code runs on an old version of ejabberd 18, I needed to write a websocket client from scratch.&lt;br&gt;
I wrote the code, tested it and then it was deployed to production.&lt;br&gt;
But randomly, my whole ejabberd vm started crashing with &lt;em&gt;heap allocation error&lt;/em&gt;.&lt;br&gt;
This is the story of how I found a silent memory leak, how &lt;strong&gt;etop&lt;/strong&gt; became my best friend, and why unbounded buffers are the real villains in network programming.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧩 The Architecture (Why It Wasn’t Just a Loop)
&lt;/h2&gt;

&lt;p&gt;Our gaming backend needs hundreds–thousands of persistent WebSocket connections. So the structure was a classic Erlang supervision tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;owebsocket_sup&lt;/code&gt;&lt;/strong&gt; → root supervisor
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;connection_manager&lt;/code&gt;&lt;/strong&gt; → maintains the required number of connections and handles periodic reconnects, because infra layers (e.g., AWS load balancers) may terminate WebSocket sessions after a fixed lifetime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;owebsocket_client_sup&lt;/code&gt;&lt;/strong&gt; → dynamic supervisor
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;owebsocket_client&lt;/code&gt;&lt;/strong&gt; → one process per connection
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each worker is isolated. One bad connection shouldn't take down the system.&lt;/p&gt;

&lt;p&gt;In theory.&lt;/p&gt;


&lt;h2&gt;
  
  
  🕳️ The Bug That Hid in Plain Sight
&lt;/h2&gt;

&lt;p&gt;Here’s the heart of the problem — my frame parsing logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="nf"&gt;retrieve_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="nv"&gt;Buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state.buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;Buffer&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt;&lt;span class="p"&gt;}.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks harmless, right?&lt;br&gt;
Append new data → check for complete frame → wait for more.&lt;br&gt;
But here’s the catch: TCP is a stream, not a sequence of message boundaries.&lt;br&gt;
If the server sent partial frames, or a huge frame, or junk my parser didn’t recognize, my code never hit the “complete frame” condition.&lt;br&gt;
It just kept &lt;strong&gt;appending&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And appending.&lt;br&gt;
And appending.&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 KB → fine&lt;/li&gt;
&lt;li&gt;50 KB → fine&lt;/li&gt;
&lt;li&gt;10 MB → suspicious&lt;/li&gt;
&lt;li&gt;300 MB → oh no&lt;/li&gt;
&lt;li&gt;16 GB → "OOM-killer enters the chat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Yes, it really hit ~16 GB. I checked etop twice because I thought the numbers were lying.)&lt;/p&gt;

&lt;p&gt;Large binaries (over 64 bytes) in Erlang live off-heap and are reference-counted.&lt;br&gt;
Once a process holds a reference to a giant binary, the VM cannot free it until that reference is dropped and the process is garbage-collected.&lt;br&gt;
So one unlucky WebSocket client was quietly hoarding RAM like a dragon.&lt;/p&gt;
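
&lt;p&gt;To see this in practice from an attached shell, two standard BIFs are enough. A minimal sketch (here using &lt;code&gt;self()&lt;/code&gt; so it runs anywhere; in a real hunt you would pass the suspect Pid picked out of etop):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;%% Total off-heap binary memory held by the whole VM, in bytes
TotalBin = erlang:memory(binary),

%% Per-process view: a list of {Address, ByteSize, RefCount} tuples,
%% one for each off-heap binary this process keeps alive
{binary, Refs} = erlang:process_info(self(), binary),
HeldBytes = lists:sum([Size || {_Addr, Size, _RefCnt} &amp;lt;- Refs]).
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If one process’s total keeps climbing in step with &lt;code&gt;erlang:memory(binary)&lt;/code&gt;, you have found your dragon.&lt;/p&gt;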
&lt;h2&gt;
  
  
  🔍 The Worst Part: It Was Random
&lt;/h2&gt;

&lt;p&gt;When debugging memory issues, consistency is your best friend.&lt;/p&gt;

&lt;p&gt;This bug had none.&lt;/p&gt;

&lt;p&gt;Some days it ran perfectly for 6 hours.&lt;br&gt;
Some days it crashed in 20 minutes.&lt;br&gt;
Some days it didn’t crash at all.&lt;/p&gt;

&lt;p&gt;I started suspecting TCP fragmentation patterns, upstream throttling, maybe even ghost data. It was the kind of randomness that makes you question your life choices.&lt;/p&gt;

&lt;p&gt;So I opened etop, watched memory usage per process, and added logs to record the size of the buffer right before it crashed.&lt;/p&gt;

&lt;p&gt;At first: stable.&lt;br&gt;
Later: one client process growing linearly.&lt;br&gt;
Eventually: one worker at 700 MB alone.&lt;br&gt;
And in the log file, the buffer size was ~10 GB.&lt;br&gt;
That’s when the light bulb went off.&lt;/p&gt;

&lt;p&gt;The buffer never reset.&lt;/p&gt;
&lt;h2&gt;
  
  
  🛠️ The Fix: Never Trust the Network
&lt;/h2&gt;

&lt;p&gt;I introduced hard limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="ni"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MAX_BUFFER_SIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt; &lt;span class="c"&gt;%% 5 MB
&lt;/span&gt;&lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="ni"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ERR_BUFFER_SIZE_LIMIT_EXCEEDED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffer_limit_exceeded&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And updated the logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="nf"&gt;retrieve_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="nv"&gt;Buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state.buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;Buffer&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nv"&gt;MAX_BUFFER_SIZE&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nv"&gt;ERR_BUFFER_SIZE_LIMIT_EXCEEDED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;true&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then handled it safely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="nf"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nv"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Socket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Bs&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;handle_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;Bs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt;
        &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nv"&gt;ERR_BUFFER_SIZE_LIMIT_EXCEEDED&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="nn"&gt;error_logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Buffer limit exceeded. Dropping connection."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="c"&gt;%% Stop receiving data
&lt;/span&gt;            &lt;span class="nv"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;Socket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="c"&gt;%% Reconnect with fresh state
&lt;/span&gt;            &lt;span class="nv"&gt;NewState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                &lt;span class="nf"&gt;try_reconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                              &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                                  &lt;span class="n"&gt;socket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;NewState&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="nv"&gt;NewState&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;NewState&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a client misbehaves → drop it.&lt;br&gt;
If a frame is too large → drop it.&lt;br&gt;
If random partial data confuses the parser → drop it.&lt;/p&gt;

&lt;p&gt;Better one connection dies than the whole VM.&lt;/p&gt;

&lt;h2&gt;
  
  
  📘 What I Learned
&lt;/h2&gt;

&lt;p&gt;Building protocols from scratch teaches you things libraries hide from you:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Always enforce buffer size limits
&lt;/h3&gt;

&lt;p&gt;If you don’t, RAM will do it for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Never assume input is reasonable
&lt;/h3&gt;

&lt;p&gt;Even if the spec says so.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ etop and logging are your friends
&lt;/h3&gt;

&lt;p&gt;etop shows you exactly which process is misbehaving, and your logs tell you what it was doing when things went wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Restart one process, not the VM
&lt;/h3&gt;

&lt;p&gt;That’s the whole point of Erlang.&lt;/p&gt;
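
&lt;p&gt;That idea can be sketched as a plain OTP supervisor. This is an illustration, not the actual code from this project — the module and worker names (&lt;code&gt;owebsocket_client_sup&lt;/code&gt;, &lt;code&gt;owebsocket_client&lt;/code&gt;) are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;-module(owebsocket_client_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() -&amp;gt;
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: when a client worker dies, only that worker is
%% restarted, with a fresh (empty) buffer; the rest of the node
%% never notices.
init([]) -&amp;gt;
    SupFlags = #{strategy =&amp;gt; one_for_one, intensity =&amp;gt; 5, period =&amp;gt; 10},
    Child = #{id =&amp;gt; owebsocket_client,
              start =&amp;gt; {owebsocket_client, start_link, []},
              restart =&amp;gt; transient},
    {ok, {SupFlags, [Child]}}.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Dropping a bad connection is cheap precisely because the supervisor will bring up a clean replacement.&lt;/p&gt;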

&lt;h3&gt;
  
  
  ✔ The best code is sometimes the one that says “nope.”
&lt;/h3&gt;

&lt;p&gt;Dropping a bad connection saved the entire system.&lt;/p&gt;




&lt;p&gt;After adding a 5 MB limit and a reconnection strategy, the system has been stable — no more OOM kills, no more ghost crashes, and no more 2 AM staring contests with &lt;code&gt;erl_crash.dump&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Sometimes reliability is not about writing more code.&lt;br&gt;
It’s about knowing when to stop accepting input.&lt;/p&gt;

</description>
      <category>erlang</category>
      <category>systemdesign</category>
      <category>backend</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Illusion of isolation in Docker</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 07 Dec 2025 14:26:59 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/illusion-of-isolation-in-docker-4gd8</link>
      <guid>https://forem.com/ayush-k-anand/illusion-of-isolation-in-docker-4gd8</guid>
      <description>&lt;p&gt;In my company, I was recently given access to the &lt;code&gt;docker&lt;/code&gt; group to run containers. This sounded standard, but it led me to discover how I could gain &lt;strong&gt;functional root privileges&lt;/strong&gt; on the host machine just by being a member of this group.&lt;/p&gt;

&lt;p&gt;To understand how, let’s agree on a common Docker principle: &lt;strong&gt;The Docker Daemon runs as a root process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Docker CLI interacts with the daemon through a Unix socket (&lt;code&gt;/var/run/docker.sock&lt;/code&gt;). If you have access to the CLI (which the &lt;code&gt;docker&lt;/code&gt; group gives you), you have read and write access to this socket.&lt;/p&gt;
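
&lt;p&gt;You don’t even need the Docker CLI for this — anything that can write to the socket can drive the daemon. For example, with plain &lt;code&gt;curl&lt;/code&gt; (assuming the daemon is running and your user is in the &lt;code&gt;docker&lt;/code&gt; group):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Ask the daemon for its version, straight over the Unix socket
curl --unix-socket /var/run/docker.sock http://localhost/version

# List running containers -- the same information as `docker ps`
curl --unix-socket /var/run/docker.sock http://localhost/containers/json
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Every &lt;code&gt;docker&lt;/code&gt; command is sugar over HTTP endpoints like these, which is why socket access means daemon access, and daemon access means root.&lt;/p&gt;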

&lt;h2&gt;
  
  
  The Exploit: Breaking the Isolation
&lt;/h2&gt;

&lt;p&gt;In general, Docker containers run as isolated processes. But something "magical" happens when we run a container by mounting the host’s root file system.&lt;/p&gt;

&lt;p&gt;Consider this command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker run -v /:/host_root -it centos bash&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, the container now has read/write access to the host system's root directory (/). By mounting it as a volume, we bypass the container's Union Filesystem (and lose the Copy-On-Write isolation) for that specific path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;strong&gt;chroot&lt;/strong&gt; trick
&lt;/h2&gt;

&lt;p&gt;At this stage, we have only mounted the file system. If I install a package &lt;strong&gt;(e.g., &lt;code&gt;yum install tree&lt;/code&gt;)&lt;/strong&gt; inside the container normally, it installs into the container's &lt;strong&gt;/usr/bin&lt;/strong&gt;, not the host's.&lt;/p&gt;

&lt;p&gt;But, suppose I run this command inside the container:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chroot /host_root&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The chroot (Change Root) command changes the apparent root directory for the current running process.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I am telling my process: "Treat the mounted /host_root as your actual / directory."&lt;/li&gt;
&lt;li&gt;The root user in a container usually has a UID of 0. &lt;/li&gt;
&lt;li&gt;The Linux Kernel allows UID 0 to do whatever it wants.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, even though I’m still inside a container, I’m operating directly on the host’s filesystem with root-level permissions — which is just as dangerous in practice.&lt;/p&gt;

&lt;p&gt;If I install a package now, it installs on the host. If I delete a file, it is deleted from the host.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to manage this?
&lt;/h2&gt;

&lt;p&gt;Since granting &lt;strong&gt;docker&lt;/strong&gt; group access is essentially granting &lt;strong&gt;root&lt;/strong&gt; access, here is how we can secure it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rootless Docker:&lt;/strong&gt; Run the Docker daemon as a non-root service. However, rootless Docker comes with trade-offs: reduced performance and limitations around direct access to storage and the networking stack, so it relies on workarounds that add extra overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-Only Mounts:&lt;/strong&gt; If you must mount the host filesystem, enforce read-only access: &lt;code&gt;docker run -v /:/host_root:ro ...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Principle of Least Privilege:&lt;/strong&gt; Add as few users as possible to the &lt;code&gt;docker&lt;/code&gt; group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure Dockerfiles:&lt;/strong&gt; When creating images, explicitly create a non-root user. 
&lt;em&gt;Example of a secure user setup:&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dockerfile"&gt;&lt;code&gt;RUN groupadd -r container_user &amp;amp;&amp;amp; useradd -r -g container_user container_user
RUN chown -R container_user:container_user /host_root
USER container_user
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>linux</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
