<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ayush Kumar Anand</title>
    <description>The latest articles on Forem by Ayush Kumar Anand (@ayush-k-anand).</description>
    <link>https://forem.com/ayush-k-anand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3649778%2Fb4e0be58-16dc-4dee-bbfc-6e699432a743.jpeg</url>
      <title>Forem: Ayush Kumar Anand</title>
      <link>https://forem.com/ayush-k-anand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ayush-k-anand"/>
    <language>en</language>
    <item>
      <title>I Finally Understood Ford–Fulkerson by Solving Splitwise’s “Simplify Debts”</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 18 Jan 2026 09:50:04 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/i-finally-understood-ford-fulkerson-by-solving-splitwises-simplify-debts-2dnp</link>
      <guid>https://forem.com/ayush-k-anand/i-finally-understood-ford-fulkerson-by-solving-splitwises-simplify-debts-2dnp</guid>
      <description>&lt;p&gt;For a long time, &lt;strong&gt;Ford–Fulkerson&lt;/strong&gt; felt like one of those DSA algorithms that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shows up in textbooks&lt;/li&gt;
&lt;li&gt;appears in interviews&lt;/li&gt;
&lt;li&gt;but never really shows up in real products&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I could recite:&lt;br&gt;
&lt;em&gt;“It finds the maximum flow in a graph”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But I never understood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what problem it truly solves&lt;/li&gt;
&lt;li&gt;why reverse edges exist&lt;/li&gt;
&lt;li&gt;why source and sink are even needed&lt;/li&gt;
&lt;li&gt;how this has anything to do with real apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That changed when I tried to deeply understand &lt;strong&gt;Splitwise’s “Simplify Debts” feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article is written for developers who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;struggled with augmenting paths&lt;/li&gt;
&lt;li&gt;were confused by reverse edges&lt;/li&gt;
&lt;li&gt;didn’t get why “infinity” is used&lt;/li&gt;
&lt;li&gt;wondered why max-flow works for money problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll explain everything &lt;strong&gt;intuitively&lt;/strong&gt;, step by step.&lt;/p&gt;
&lt;h2&gt;
  
  
  The real problem: what does “simplify debt” actually mean?
&lt;/h2&gt;

&lt;p&gt;As far as I can tell, Splitwise is not trying to preserve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who paid whom originally&lt;/li&gt;
&lt;li&gt;the sequence of transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its only goal is:&lt;br&gt;
&lt;em&gt;Find the cleanest way to settle money so everyone ends up correct.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That’s it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Reduce everything to net balances (pure accounting)
&lt;/h2&gt;

&lt;p&gt;Before algorithms, we do accounting.&lt;br&gt;
For each person:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;net_balance = money_received − money_paid
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example after reduction:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Person&lt;/th&gt;
&lt;th&gt;NetBalance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;-50 (owes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;-30 (owes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;+40 (receives)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;+40 (receives)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Interpretation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A must pay ₹50&lt;/li&gt;
&lt;li&gt;B must pay ₹30&lt;/li&gt;
&lt;li&gt;C must receive ₹40&lt;/li&gt;
&lt;li&gt;D must receive ₹40&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total owed = total received = &lt;strong&gt;₹80&lt;/strong&gt;&lt;/p&gt;
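&lt;p&gt;As a sketch (the transaction format here is my own assumption, not Splitwise's API), the reduction to net balances is a few lines of Python:&lt;/p&gt;

```python
from collections import defaultdict

def net_balances(transactions):
    """Reduce raw (payer, payee, amount) records to one net balance per person."""
    balance = defaultdict(int)
    for payer, payee, amount in transactions:
        balance[payer] -= amount   # money paid out
        balance[payee] += amount   # money received
    return dict(balance)

# Hypothetical raw debts that reduce to the table above.
txns = [("A", "C", 40), ("A", "D", 10), ("B", "D", 30)]
balances = net_balances(txns)
```

&lt;p&gt;Negative means "owes", positive means "receives", and the balances always sum to zero, because every rupee paid is a rupee received.&lt;/p&gt;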

&lt;p&gt;At this point, I was stuck:&lt;br&gt;
&lt;em&gt;&lt;strong&gt;Who should pay whom?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The key insight (this is the click)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Debt simplification is routing money from people who owe to people who should receive, without creating or destroying money.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is &lt;em&gt;exactly&lt;/em&gt; what a &lt;strong&gt;flow network&lt;/strong&gt; models.&lt;/p&gt;

&lt;p&gt;Money is &lt;strong&gt;flow&lt;/strong&gt;.&lt;br&gt;
People are &lt;strong&gt;nodes&lt;/strong&gt;.&lt;br&gt;
Payments are &lt;strong&gt;edges&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2: Why we need a Source and a Sink
&lt;/h2&gt;

&lt;p&gt;This confused me the most at first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why a Source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We want a &lt;strong&gt;single&lt;/strong&gt; place that represents:&lt;br&gt;
&lt;em&gt;“All money that must be paid”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So we introduce a &lt;strong&gt;Source (S).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgib2bh899azqbi8801u2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgib2bh899azqbi8801u2.png" alt="1" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqmy71lc1o46h8s4xovd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqmy71lc1o46h8s4xovd.png" alt="2" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A can send at most ₹50 into the system&lt;/li&gt;
&lt;li&gt;B can send at most ₹30&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why a Sink?
&lt;/h2&gt;

&lt;p&gt;We also want a &lt;strong&gt;single place&lt;/strong&gt; that represents:&lt;br&gt;
&lt;em&gt;“All money that must be received”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So we introduce a &lt;strong&gt;Sink (T)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If someone should receive money:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x8khqkb0u02ut62l6xv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x8khqkb0u02ut62l6xv.png" alt="3" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmupwhh9wdd8iyj9gdstt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmupwhh9wdd8iyj9gdstt.png" alt="4" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;C can absorb at most ₹40&lt;/li&gt;
&lt;li&gt;D can absorb at most ₹40&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Critical mental model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Source injects money.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Sink absorbs money.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Everyone else just routes it.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a sink:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there is no definition of “maximum”&lt;/li&gt;
&lt;li&gt;flow could circulate forever&lt;/li&gt;
&lt;li&gt;the problem has no goal&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 3: Connecting people (the “infinity” confusion)
&lt;/h2&gt;

&lt;p&gt;Now we connect &lt;strong&gt;debtors to creditors:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5xwb3rwn36aiuhzfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5xwb3rwn36aiuhzfw.png" alt="5" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is this “infinite”?&lt;/strong&gt;&lt;br&gt;
This does &lt;strong&gt;NOT&lt;/strong&gt; mean infinite money.&lt;/p&gt;

&lt;p&gt;It means:&lt;br&gt;
&lt;em&gt;Do not restrict who can pay whom.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The real limits already exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cannot pay more than ₹50 (from S → A)&lt;/li&gt;
&lt;li&gt;C cannot receive more than ₹40 (from C → T)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So “∞” just means:&lt;br&gt;
&lt;em&gt;Large enough to never be the bottleneck&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;∞ = total owed (or any sufficiently large number)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
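&lt;p&gt;Concretely, the capacity graph for the running example can be built like this (node names &lt;em&gt;S&lt;/em&gt; and &lt;em&gt;T&lt;/em&gt; are the artificial source and sink; this is a sketch, not production code):&lt;/p&gt;

```python
# Build the capacity graph for the running example.
owes = {"A": 50, "B": 30}        # net debtors: how much each must pay
receives = {"C": 40, "D": 40}    # net creditors: how much each must receive

INF = sum(owes.values())         # 80 here: large enough to never be the bottleneck

capacity = {}
for debtor, amount in owes.items():
    capacity[("S", debtor)] = amount         # S -> debtor: how much they inject
for creditor, amount in receives.items():
    capacity[(creditor, "T")] = amount       # creditor -> T: how much they absorb
for debtor in owes:
    for creditor in receives:
        capacity[(debtor, creditor)] = INF   # unrestricted person-to-person routing
```

&lt;p&gt;The "infinite" middle edges never bind: the S-side and T-side edges are always the real bottleneck.&lt;/p&gt;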



&lt;h2&gt;
  
  
  Step 4: What Ford–Fulkerson actually solves
&lt;/h2&gt;

&lt;p&gt;Ford–Fulkerson answers &lt;strong&gt;one question only&lt;/strong&gt;:&lt;br&gt;
&lt;em&gt;Is there any path from S to T along which money can still flow?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;S → A → C → T&lt;/p&gt;

&lt;p&gt;Translated to English:&lt;br&gt;
&lt;em&gt;“Take money from the debt pool → A pays C → C receives it”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each augmenting path is a &lt;strong&gt;valid payment.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step-by-step execution (no fear, no magic)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Augmenting Path 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S → A → C → T&lt;/p&gt;

&lt;p&gt;Bottleneck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;min(50, ∞, 40) = 40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pays C ₹40&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remaining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A owes ₹10&lt;/li&gt;
&lt;li&gt;C is fully settled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Augmenting Path 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S → A → D → T&lt;/p&gt;

&lt;p&gt;Bottleneck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;min(10, ∞, 40) = 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pays D ₹10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remaining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A is settled&lt;/li&gt;
&lt;li&gt;D still needs ₹30&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Augmenting Path 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;S → B → D → T&lt;/p&gt;

&lt;p&gt;Bottleneck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;min(30, ∞, 30) = 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;B pays D ₹30&lt;/li&gt;
&lt;/ul&gt;
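&lt;p&gt;The three augmenting paths above fall out of a small BFS-based implementation (Edmonds–Karp, the Ford–Fulkerson variant that always augments along a shortest path). This is illustrative code, not Splitwise's implementation:&lt;/p&gt;

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest S->T path found by BFS."""
    flow = defaultdict(int)                  # flow[(u, v)] = units currently routed u -> v
    adj = defaultdict(list)
    for u, v in capacity:
        adj[u].append(v)
        adj[v].append(u)                     # residual (reverse) direction

    def residual(u, v):
        # leftover forward capacity, plus any opposite flow we are allowed to undo
        return capacity.get((u, v), 0) - flow[(u, v)] + flow[(v, u)]

    total = 0
    while True:
        parent = {source: None}              # BFS for an augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:               # no augmenting path left: done
            return total, dict(flow)
        path, v = [], sink                   # walk back to recover the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual(u, v) for u, v in path)
        for u, v in path:                    # push flow, cancelling opposite flow first
            undo = min(flow[(v, u)], bottleneck)
            flow[(v, u)] -= undo
            flow[(u, v)] += bottleneck - undo
        total += bottleneck

INF = 80                                     # total owed, per the "infinity" section
capacity = {("S", "A"): 50, ("S", "B"): 30, ("C", "T"): 40, ("D", "T"): 40,
            ("A", "C"): INF, ("A", "D"): INF, ("B", "C"): INF, ("B", "D"): INF}
total, flow = max_flow(capacity, "S", "T")
payments = {(u, v): f for (u, v), f in flow.items()
            if f and u not in ("S", "T") and v not in ("S", "T")}
```

&lt;p&gt;Running it pushes a total flow of ₹80 and leaves exactly the person-to-person payments worked out above: A → C ₹40, A → D ₹10, B → D ₹30.&lt;/p&gt;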

&lt;h2&gt;
  
  
  Step 5: Reading the final answer (this is crucial)
&lt;/h2&gt;

&lt;p&gt;Ignore Source and Sink.&lt;br&gt;
Look only at &lt;strong&gt;person → person flows:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payment&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A → C&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A → D&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B → D&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is the simplified debt list.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why are reverse edges inevitable?
&lt;/h2&gt;

&lt;p&gt;This was my biggest confusion.&lt;br&gt;
Reverse edges mean:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You are allowed to undo a previous payment if a better routing exists.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They &lt;strong&gt;do not&lt;/strong&gt; represent real payments.&lt;/p&gt;

&lt;p&gt;They represent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;refunds&lt;/li&gt;
&lt;li&gt;cancellations&lt;/li&gt;
&lt;li&gt;rerouting permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without reverse edges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;early decisions become permanent&lt;/li&gt;
&lt;li&gt;greedy choices can block better solutions&lt;/li&gt;
&lt;li&gt;max-flow can fail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real life, debt simplification &lt;strong&gt;must&lt;/strong&gt; allow:&lt;br&gt;
&lt;em&gt;“Let me undo that payment and send the money elsewhere instead.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Reverse edges make that possible.&lt;/p&gt;

&lt;p&gt;They are &lt;strong&gt;not an implementation trick&lt;/strong&gt; — they are &lt;strong&gt;mathematically required&lt;/strong&gt;.&lt;/p&gt;
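&lt;p&gt;A contrived variant makes this concrete (an assumption for illustration: suppose B can pay only C). A naive augmenting-path search that never undoes flow can get stuck at ₹50, while the same search with reverse edges reaches the full ₹80:&lt;/p&gt;

```python
def augment_all(capacity, source, sink, use_reverse):
    """Naive augmenting-path DFS; reverse (residual) edges only if use_reverse."""
    flow = {}
    adj = {}
    for u, v in capacity:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def residual(u, v):
        r = capacity.get((u, v), 0) - flow.get((u, v), 0)
        if use_reverse:
            r += flow.get((v, u), 0)         # permission to undo earlier routing
        return r

    def dfs(u, limit, seen):
        if u == sink:
            return limit
        seen.add(u)
        for v in adj[u]:
            r = residual(u, v)
            if v not in seen and r > 0:
                pushed = dfs(v, min(limit, r), seen)
                if pushed:
                    flow[(u, v)] = flow.get((u, v), 0) + pushed
                    cancel = min(flow.get((v, u), 0), flow[(u, v)])
                    flow[(v, u)] = flow.get((v, u), 0) - cancel
                    flow[(u, v)] -= cancel
                    return pushed
        return 0

    total = 0
    while True:
        pushed = dfs(source, float("inf"), set())
        if not pushed:
            return total
        total += pushed

# Variant where B can pay only C. A owes 50, B owes 30; C and D each receive 40.
cap = {("S", "A"): 50, ("S", "B"): 30, ("A", "C"): 80, ("A", "D"): 80,
       ("B", "C"): 80, ("C", "T"): 40, ("D", "T"): 40}
```

&lt;p&gt;Without reverse edges, the first search greedily sends A → C ₹40, saturating C, and B is left with no one to pay. The reverse edge C → A lets a later path "refund" ₹30 of that payment and reroute it through D.&lt;/p&gt;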

&lt;h2&gt;
  
  
  What does “optimal” mean here?
&lt;/h2&gt;

&lt;p&gt;This is important.&lt;/p&gt;

&lt;p&gt;Optimal does &lt;strong&gt;NOT&lt;/strong&gt; mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fair to everyone&lt;/li&gt;
&lt;li&gt;equal distribution&lt;/li&gt;
&lt;li&gt;preserving original transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optimal means:&lt;br&gt;
&lt;em&gt;The maximum possible amount of money reaches the sink without violating constraints.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;system-level optimality&lt;/strong&gt;, not individual optimality.&lt;/p&gt;

&lt;p&gt;Some people may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pay less than they “could”&lt;/li&gt;
&lt;li&gt;receive later&lt;/li&gt;
&lt;li&gt;be bypassed entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that’s perfectly correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why max-flow fits “Simplify Debts” perfectly
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Max-Flow Concept&lt;/th&gt;
&lt;th&gt;Real Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Flow&lt;/td&gt;
&lt;td&gt;Money&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node&lt;/td&gt;
&lt;td&gt;Person&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capacity&lt;/td&gt;
&lt;td&gt;How much they owe / receive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source&lt;/td&gt;
&lt;td&gt;All debts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;All credits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conservation&lt;/td&gt;
&lt;td&gt;No money lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reverse edge&lt;/td&gt;
&lt;td&gt;Undo bad routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is not a trick or coincidence.&lt;/p&gt;

&lt;p&gt;Debt simplification is &lt;strong&gt;inherently a routing problem.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;Ford–Fulkerson is not about pipes and water.&lt;/p&gt;

&lt;p&gt;It is about:&lt;br&gt;
&lt;em&gt;Routing a conserved quantity optimally under constraints.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Money is conserved.&lt;br&gt;
Debt simplification is routing.&lt;br&gt;
Max-flow fits naturally.&lt;/p&gt;

&lt;p&gt;Once I saw that, the algorithm finally made sense.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; Technically, minimizing the number of transactions is an NP-hard problem. Max-Flow ensures all money is settled correctly, but getting the absolute minimum number of edges usually requires additional greedy heuristics on top of this. For the purpose of this article, we are focusing on the flow conservation!&lt;/em&gt;&lt;/p&gt;
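&lt;p&gt;One common greedy heuristic (a sketch, not Splitwise's actual code) is to repeatedly match the largest remaining debtor with the largest remaining creditor:&lt;/p&gt;

```python
def settle_greedy(balances):
    """Pair the biggest debtor with the biggest creditor until everyone is settled."""
    debtors, creditors = {}, {}
    for person, b in balances.items():
        if b > 0:
            creditors[person] = b
        elif b != 0:
            debtors[person] = -b   # negative balance: this person owes money
    payments = []
    while debtors and creditors:
        d = max(debtors, key=debtors.get)
        c = max(creditors, key=creditors.get)
        amount = min(debtors[d], creditors[c])
        payments.append((d, c, amount))
        debtors[d] -= amount
        creditors[c] -= amount
        if debtors[d] == 0:
            del debtors[d]
        if creditors[c] == 0:
            del creditors[c]
    return payments
```

&lt;p&gt;For n people this settles everyone in at most n − 1 transactions, because each payment fully clears at least one participant, though it does not guarantee the true minimum. That is the NP-hard part.&lt;/p&gt;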

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Engine Under the Hood: Go’s GMP, Java’s Locks, and Erlang’s Heaps</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 11 Jan 2026 10:09:04 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/the-engine-under-the-hood-gos-gmp-javas-locks-and-erlangs-heaps-1lhi</link>
      <guid>https://forem.com/ayush-k-anand/the-engine-under-the-hood-gos-gmp-javas-locks-and-erlangs-heaps-1lhi</guid>
      <description>&lt;p&gt;As backend engineers, we often treat "concurrency" as a black box. We type go &lt;em&gt;func()&lt;/em&gt; or &lt;em&gt;spawn()&lt;/em&gt; and expect magic. But understanding how the runtime schedules these tasks is what separates a Senior Engineer from an Architect.&lt;br&gt;
This article dives into Go's &lt;strong&gt;GMP Scheduler&lt;/strong&gt;, explains the "secret agent" sysmon thread, and contrasts this model with &lt;strong&gt;Java’s Shared Memory&lt;/strong&gt; headaches and &lt;strong&gt;Erlang’s Gold Standard&lt;/strong&gt; isolation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Go’s GMP Architecture: The "Muxing" Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa32i53ur35k2i5wg1m9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa32i53ur35k2i5wg1m9e.png" alt="Golang Concurrency Example" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In traditional threading (like Java Pre-Loom), 1 Thread = 1 OS Thread. This is heavy (~1MB RAM per thread). You cannot spawn 100,000 of them without crashing.&lt;/p&gt;

&lt;p&gt;Go solves this with the &lt;strong&gt;GMP Model&lt;/strong&gt;, which multiplexes millions of Goroutines (user-space threads) onto a small number of Kernel Threads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Components&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;G (Goroutine):&lt;/strong&gt; This is your code. It is lightweight (starts at &lt;strong&gt;2KB&lt;/strong&gt; stack). It contains the instruction pointer (PC) and stack. It wants to run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;M (Machine):&lt;/strong&gt; This is the &lt;strong&gt;OS Thread&lt;/strong&gt;. It is expensive. The OS kernel manages this. It is the actual "worker" that executes CPU instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P (Processor):&lt;/strong&gt; This is a purely &lt;strong&gt;logical resource&lt;/strong&gt; (a "token"). It represents the context required to run Go code (local run queue, memory cache).

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Rule:&lt;/strong&gt; An &lt;strong&gt;M&lt;/strong&gt; must hold a &lt;strong&gt;P&lt;/strong&gt; to execute a &lt;strong&gt;G&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P = Logical Cores:&lt;/strong&gt; By default, the number of Ps is set to &lt;em&gt;GOMAXPROCS&lt;/em&gt; (usually your CPU core count). This limits parallelism (simultaneous execution) while allowing practically unbounded concurrency (managing overlapping tasks).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Lifecycle: When is a G vs. M Created?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;When is a G created?&lt;/strong&gt; Whenever you call &lt;em&gt;go func()&lt;/em&gt;. It is created in &lt;strong&gt;User Space&lt;/strong&gt; inside the Go runtime. It is cheap (~2KB) and goes into the &lt;strong&gt;Local Run Queue&lt;/strong&gt; of the current P.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When is an M created?&lt;/strong&gt; The runtime tries to keep M count low. However, it spawns a new M (OS Thread) when:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;A Goroutine makes a &lt;strong&gt;Blocking System Call&lt;/strong&gt; (like CGO or complex File I/O) that cannot be handled asynchronously.&lt;/li&gt;
&lt;li&gt;The current M gets "stuck" inside the OS kernel.&lt;/li&gt;
&lt;li&gt;The runtime sees Ps with runnable Gs but no spare M to serve them. Creating an M is expensive (~1-2 MB of stack plus a kernel call).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Watcher: &lt;em&gt;sysmon&lt;/em&gt; and the &lt;em&gt;SIGURG&lt;/em&gt; Signal&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most misunderstood part of Go. How does the scheduler stop a Goroutine that has been running for too long (e.g., a &lt;em&gt;for {}&lt;/em&gt; loop)?&lt;/p&gt;

&lt;p&gt;Enter &lt;em&gt;sysmon&lt;/em&gt; (System Monitor).&lt;/p&gt;

&lt;p&gt;What is &lt;em&gt;sysmon&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;It is a special runtime thread that breaks the GMP rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It runs &lt;strong&gt;without a P&lt;/strong&gt; (no Processor token needed).&lt;/li&gt;
&lt;li&gt;It runs on a dedicated M.&lt;/li&gt;
&lt;li&gt;It wakes up periodically (20µs – 10ms).&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The Mechanism: Asynchronous Preemption via &lt;em&gt;SIGURG&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;Since Go 1.14, the runtime has used &lt;strong&gt;signals&lt;/strong&gt; to force preemption and preserve scheduling fairness.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Trigger:&lt;/strong&gt; &lt;strong&gt;sysmon&lt;/strong&gt; scans all Ps. It sees that &lt;em&gt;Goroutine A&lt;/em&gt; has been running on Processor 1 for more than 10ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Signal:&lt;/strong&gt; &lt;em&gt;sysmon&lt;/em&gt; sends a &lt;em&gt;SIGURG&lt;/em&gt; (Urgent Signal) to the Thread (M) running that Goroutine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why &lt;em&gt;SIGURG&lt;/em&gt;?&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Out-of-Band:&lt;/strong&gt; It is designed for "Urgent Socket Data," which modern apps rarely use, so it doesn't conflict with user signals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Destructive:&lt;/strong&gt; Unlike &lt;em&gt;SIGINT&lt;/em&gt; (Ctrl+C), it doesn't kill the process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Libc Safe:&lt;/strong&gt; It doesn't interfere with C libraries mixed into Go (CGO).&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Interruption:&lt;/strong&gt; The OS interrupts the M. Go's signal handler injects a call to &lt;em&gt;asyncPreempt&lt;/em&gt; into the Goroutine's stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Yield:&lt;/strong&gt; The Goroutine pauses, is moved to the &lt;strong&gt;Global Run Queue&lt;/strong&gt;, and the P picks a new G to run.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Java: The Failure of "Shared Memory"
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa26kw2okfwia0w2fy45m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa26kw2okfwia0w2fy45m.png" alt="java concurrency" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Model:&lt;/strong&gt; "Communicate by Sharing Memory." All threads share the Same Heap. To pass data, they modify the same object.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Failure Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at the code below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Java: Explicit Locking (The Bottleneck)
class Counter {
    private int count = 0;

    // "synchronized" forces the OS to pause other threads (Context Switch)
    public synchronized void increment() {
        count++; 
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Race Conditions:&lt;/strong&gt; If you forget &lt;em&gt;synchronized&lt;/em&gt;, two threads write at once, and data is corrupted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance:&lt;/strong&gt; Locks require OS intervention. This is slow (thousands of cycles).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deadlocks:&lt;/strong&gt; Thread A holds Lock 1 waiting for Lock 2. Thread B holds Lock 2 waiting for Lock 1. The app freezes.&lt;/li&gt;
&lt;/ol&gt;
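&lt;p&gt;The same shared-memory hazard exists in any language with threads. As a rough Python analogue of the Java snippet above (&lt;em&gt;threading.Lock&lt;/em&gt; playing the role of &lt;em&gt;synchronized&lt;/em&gt;):&lt;/p&gt;

```python
import threading

class Counter:
    """Shared-memory counter guarded by an explicit lock."""
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads can interleave the read-modify-write.
        with self._lock:
            self.count += 1

counter = Counter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(10_000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, counter.count is exactly 40_000; drop it and updates may be lost.
```

&lt;p&gt;The burden is the same as in Java: correctness depends on every writer remembering to take the lock.&lt;/p&gt;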

&lt;h2&gt;
  
  
  Erlang: The Gold Standard (Per-Process Heaps)
&lt;/h2&gt;

&lt;p&gt;I have worked with Erlang for over 3 years, and I am convinced it is one of the best languages for concurrency out of the box. It suffers, however, where there is heavy number crunching and tight loops. Many library functions are written in &lt;strong&gt;C&lt;/strong&gt; and exposed to Erlang through &lt;strong&gt;NIFs&lt;/strong&gt;, but with a caveat: &lt;strong&gt;unsafe NIFs&lt;/strong&gt; can block schedulers and break Erlang’s isolation guarantees.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Model:&lt;/strong&gt; "Share Memory by Communicating." Every process has its &lt;strong&gt;Own Private Heap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Erlang is "Better" (A Bank Example)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b64ygv4okpkvofwufx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b64ygv4okpkvofwufx4.png" alt="erlang example" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Go, stacks are isolated, but heap pressure and GC are global, so runaway allocations can still impact tail latency. In Erlang, it is isolated.&lt;/p&gt;

&lt;p&gt;Look at the code snippet below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-module(bank_server).
-behaviour(gen_server).
%% ... exports ...

%% 1. The Safe Bank Process
init([]) -&amp;gt; {ok, 100}. %% Balance is $100

%% 2. The Dangerous Crash Process
trigger_crash() -&amp;gt;
    spawn(fun() -&amp;gt; 
        %% A. This allocates 1GB on a PRIVATE heap
        CrashList = lists:seq(1, 100000000), 
        %% B. Crashes immediately
        1 / 0 
    end).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Sequence:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Allocation:&lt;/strong&gt; The spawned process allocates 1GB. In Java or Go, that allocation would land on the shared global heap and raise garbage-collection (GC) pressure for everyone, in the worst case a "Stop-The-World" pause.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Crash:&lt;/strong&gt; The process dies (divide by zero).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Cleanup:&lt;/strong&gt; The Erlang VM simply deletes the pointer to that private heap.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero GC Cost:&lt;/strong&gt; No need to scan memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Impact:&lt;/strong&gt; The bank_server (holding the $100) continues running with microsecond latency. It didn't even feel the crash.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Final takeaway:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Java's&lt;/strong&gt; shared-memory model places a heavier correctness burden on engineers, making large-scale concurrency harder to reason about.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Erlang&lt;/strong&gt; is the reliability king because Private Heaps prevent "noisy neighbors" from killing the system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go&lt;/strong&gt; is the pragmatic middle ground: It uses Shared Heaps for raw speed (no copying data) but uses CSP (Channels) to avoid the complexity of locks.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>backend</category>
      <category>erlang</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Why Goroutines Scale: Stack Growth, Compiler Tricks, and Context Switching</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sat, 03 Jan 2026 20:03:41 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/why-goroutines-scale-stack-growth-compiler-tricks-and-context-switching-k54</link>
      <guid>https://forem.com/ayush-k-anand/why-goroutines-scale-stack-growth-compiler-tricks-and-context-switching-k54</guid>
      <description>&lt;h2&gt;
  
  
  Threads in Languages like C++ and Java
&lt;/h2&gt;

&lt;p&gt;In these languages, threads are a means of concurrency that costs a lot of CPU time in context switching and a relatively large amount of memory at creation. A single thread takes ~1 MB, so if you were to spawn 100,000 threads, you would need about 100 GB of RAM, which is not economically feasible for most software projects. To maintain concurrency, the CPU generally uses timeslicing, giving each thread a roughly equal share of CPU cycles. While doing this, it has to perform context switches. This is expensive in time: the current thread's state must be saved in its TCB (Thread Control Block), the new thread's TCB must be loaded, and the switch destroys cache locality, causing frequent L1/L2 cache misses.&lt;/p&gt;

&lt;p&gt;As a result, with thousands of threads the CPU can end up spending more time switching contexts than actually executing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do goroutines optimize this?
&lt;/h2&gt;

&lt;p&gt;Goroutines are "lightweight threads" managed entirely in &lt;strong&gt;User Space&lt;/strong&gt; by the Go Runtime, rather than the OS Kernel.&lt;/p&gt;

&lt;p&gt;The first massive optimization is memory. While a standard OS thread reserves a fixed 1 MB stack, a Goroutine initializes with a stack of just 2 KB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46onprwak88ce0yep82f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46onprwak88ce0yep82f.png" alt="Comparison showing huge 1MB OS Thread stack vs tiny 2KB Goroutine stack" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Math:&lt;/strong&gt; 2 KB is roughly 0.2% of 1 MB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Impact:&lt;/strong&gt; Instead of capping out at thousands of threads, you can easily spawn millions of Goroutines on a standard laptop without running out of RAM.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The "Infinite" Stack
&lt;/h2&gt;

&lt;p&gt;Unlike OS threads, which typically have a fixed stack size (e.g., 1 MB) determined at creation, Goroutines are dynamic. They start at 2 KB and grow automatically as needed.&lt;/p&gt;

&lt;p&gt;If a Goroutine runs out of space, the Go runtime allocates a larger segment of memory (usually double) and moves the stack there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8mtux9a0gyfmmd94mxw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8mtux9a0gyfmmd94mxw.png" alt="Flowchart showing how Go runtime allocates a larger stack and copies data when limit is hit" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OS Thread Limit:&lt;/strong&gt; Fixed (~1-8 MB). Hitting this causes a crash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goroutine Limit:&lt;/strong&gt; Dynamic (up to 1 GB on 64-bit systems).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means for all practical purposes, Goroutine recursion depth is limited only by available memory, while OS threads are limited by their initial reservation.&lt;/p&gt;
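&lt;p&gt;A quick way to see the growing stack in action is deep recursion. The sketch below recurses a million frames, a depth that would overflow a fixed 1 MB thread stack but completes fine on a goroutine:&lt;/p&gt;

```go
package main

// deep recurses n frames. Each frame is tiny, but a depth of one million
// would overflow a fixed 1 MB OS-thread stack; a goroutine's stack simply
// grows (via runtime.morestack) until the call completes.
func deep(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + deep(n-1)
}
```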

&lt;h2&gt;
  
  
  Faster Context Switches
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpgkmat9mp5ih74ksl8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpgkmat9mp5ih74ksl8a.png" alt="Diagram illustrating Goroutines running in User Space vs OS Threads in Kernel Space" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like OS threads, Goroutines need to save their state when paused so they can resume later.&lt;/p&gt;

&lt;p&gt;However, while an OS thread switch requires saving all CPU registers (including heavy floating-point registers) and trapping into Kernel Mode, a Goroutine switch is much cheaper.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OS Thread Switch:&lt;/strong&gt; ~1-2 microseconds. Saves huge state (AVX/SSE registers) to the &lt;strong&gt;TCB&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goroutine Switch:&lt;/strong&gt; ~200 nanoseconds (~10x faster). Saves only &lt;strong&gt;3 registers&lt;/strong&gt; (PC, SP, DX) to a simple Go struct called &lt;em&gt;g&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because this happens entirely in &lt;strong&gt;User Space&lt;/strong&gt;, the CPU stays hot, caches stay valid, and the overhead is negligible.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how does a goroutine grow its stack dynamically?
&lt;/h2&gt;

&lt;p&gt;To achieve this, the Go compiler uses a technique called the Function Prologue.&lt;/p&gt;

&lt;p&gt;During compilation, the compiler inserts a few assembly instructions at the very start of every function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Check:&lt;/strong&gt; These instructions compare the current Stack Pointer &lt;em&gt;(SP)&lt;/em&gt; against a limit called the &lt;strong&gt;Stack Guard&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Trigger:&lt;/strong&gt; If there isn't enough space for the function to run, it triggers a runtime function called &lt;em&gt;runtime.morestack&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Growth:&lt;/strong&gt; The runtime allocates a new, larger stack segment (usually 2x the size).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Copy &amp;amp; Fix:&lt;/strong&gt; It copies the user's data to the new stack. Crucially, it also &lt;strong&gt;adjusts all pointers&lt;/strong&gt; to ensure they point to the new addresses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once this "surgery" is complete, the function resumes execution on the new, spacious stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func main() {
    fmt.Println("Hello Ayush")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above is a sample Go program.&lt;br&gt;
Now, when we dump the compiler-generated assembly with the command&lt;br&gt;
&lt;code&gt;&lt;br&gt;
go build -gcflags=-S main.go&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;You will see, among the output, a section like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main.main STEXT size=83 args=0x0 locals=0x40 funcid=0x0 align=0x0
    0x0000 00000 (/Users/ayushanand/concurrency/main.go:7)  TEXT    main.main(SB), ABIInternal, $64-0
    0x0000 00000 (/Users/ayushanand/concurrency/main.go:7)  CMPQ    SP, 16(R14)
    0x0004 00004 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-2
    0x0004 00004 (/Users/ayushanand/concurrency/main.go:7)  JLS 76
    0x0006 00006 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-1
    0x0006 00006 (/Users/ayushanand/concurrency/main.go:7)  PUSHQ   BP
    0x0007 00007 (/Users/ayushanand/concurrency/main.go:7)  MOVQ    SP, BP
    0x000a 00010 (/Users/ayushanand/concurrency/main.go:7)  SUBQ    $56, SP
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:7)  FUNCDATA    $0, gclocals·g5+hNtRBP6YXNjfog7aZjQ==(SB)
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:7)  FUNCDATA    $1, gclocals·EVwPOTmEGNnKe4zqm0ZbFQ==(SB)
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:7)  FUNCDATA    $2, main.main.stkobj(SB)
    0x000e 00014 (/Users/ayushanand/concurrency/main.go:8)  LEAQ    type:string(SB), DX
    0x0015 00021 (/Users/ayushanand/concurrency/main.go:8)  MOVQ    DX, main..autotmp_8+40(SP)
    0x001a 00026 (/Users/ayushanand/concurrency/main.go:8)  LEAQ    main..stmp_0(SB), DX
    0x0021 00033 (/Users/ayushanand/concurrency/main.go:8)  MOVQ    DX, main..autotmp_8+48(SP)
    0x0026 00038 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVQ    os.Stdout(SB), BX
    0x002d 00045 (&amp;lt;unknown line number&amp;gt;)    NOP
    0x002d 00045 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) LEAQ    go:itab.*os.File,io.Writer(SB), AX
    0x0034 00052 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) LEAQ    main..autotmp_8+40(SP), CX
    0x0039 00057 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVL    $1, DI
    0x003e 00062 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVQ    DI, SI
    0x0041 00065 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) PCDATA  $1, $0
    0x0041 00065 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) CALL    fmt.Fprintln(SB)
    0x0046 00070 (/Users/ayushanand/concurrency/main.go:9)  ADDQ    $56, SP
    0x004a 00074 (/Users/ayushanand/concurrency/main.go:9)  POPQ    BP
    0x004b 00075 (/Users/ayushanand/concurrency/main.go:9)  RET
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:9)  NOP
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $1, $-1
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-2
    0x004c 00076 (/Users/ayushanand/concurrency/main.go:7)  CALL    runtime.morestack_noctxt(SB)
    0x0051 00081 (/Users/ayushanand/concurrency/main.go:7)  PCDATA  $0, $-1
    0x0051 00081 (/Users/ayushanand/concurrency/main.go:7)  JMP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stack check is right at the top: &lt;code&gt;CMPQ SP, 16(R14)&lt;/code&gt; compares the stack pointer against the stack guard stored in the current goroutine's &lt;em&gt;g&lt;/em&gt; struct (held in R14), and &lt;code&gt;JLS 76&lt;/code&gt; branches to the &lt;code&gt;CALL runtime.morestack_noctxt(SB)&lt;/code&gt; at the bottom when there isn't enough room.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ending Notes:
&lt;/h2&gt;

&lt;p&gt;Goroutines aren't just "threads but smaller." They are a fundamental rethink of how we manage concurrency. By moving the stack management from the OS Kernel to the Go Runtime, we gain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Massive Scalability:&lt;/strong&gt; From 100k limit to millions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Memory:&lt;/strong&gt; Pay for what you use (2KB), not what you might use (1MB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Latency:&lt;/strong&gt; Context switches that are 10x faster.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next time you type &lt;code&gt;go func()&lt;/code&gt;, remember: there is a tiny 2 KB stack and a smart compiler working in the background to make it "infinite."&lt;/p&gt;

</description>
      <category>go</category>
      <category>computerscience</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>You Can't Resize a Bloom Filter. Here's What To Do Instead.</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 28 Dec 2025 16:34:43 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/you-cant-resize-a-bloom-filter-heres-what-to-do-instead-4p34</link>
      <guid>https://forem.com/ayush-k-anand/you-cant-resize-a-bloom-filter-heres-what-to-do-instead-4p34</guid>
      <description>&lt;h2&gt;
  
  
  What are bloom filters?
&lt;/h2&gt;

&lt;p&gt;Bloom filters are a probabilistic data structure that uses a bit array and hash functions to reduce the load on our main databases. You might ask: why not just use a caching layer like Redis? The answer comes down to which tradeoff you are willing to make: accuracy or memory. A plain Redis cache gives exact answers but is not memory efficient, whereas a bloom filter sits at the opposite end of the spectrum: a tiny memory footprint, but probabilistic answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do they work?
&lt;/h2&gt;

&lt;p&gt;Suppose you have settled on the accuracy tradeoff and decided to use a bloom filter, perhaps because you cannot cover the memory cost of an exact cache. Before writing any data, you need to decide two things in advance: the size of the bit array and the number of hash functions. This matters because once the size is fixed and data is written, the filter cannot be resized, and an item, once added, cannot be deleted.&lt;/p&gt;

&lt;p&gt;First, let us ponder the number of hash functions. It cannot be too low, because too few probes make accidental collisions likely. But it cannot be too large either, because every insert then flips more bits, the array fills up faster, and a saturated array also produces collisions.&lt;/p&gt;

&lt;p&gt;But luckily, we have a formula:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt; k = (m / n) × ln(2)&lt;/p&gt;

&lt;p&gt;where &lt;em&gt;k&lt;/em&gt; = optimal number of hash functions,&lt;br&gt;
      &lt;em&gt;m&lt;/em&gt; = size of our bit array (e.g., 100 bits),&lt;br&gt;
      &lt;em&gt;n&lt;/em&gt; = number of items we expect to add (e.g., 10 users).&lt;/p&gt;

&lt;p&gt;Below is a golang implementation for the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "math"
)

// CalculateOptimalParams calculates 'm' (size of array) and 'k' (num hashes)
// based on how many items (n) you have and what error rate (p) you accept.
func CalculateOptimalParams(n int, p float64) (uint, uint) {
    // 1. Calculate ideal Bit Array Size (m)
    // Formula: m = - (n * ln(p)) / (ln(2)^2)
    m := - (float64(n) * math.Log(p)) / math.Pow(math.Log(2), 2)

    // 2. Calculate ideal Number of Hash Functions (k)
    // Formula: k = (m / n) * ln(2)
    k := (m / float64(n)) * math.Log(2)

    return uint(math.Ceil(m)), uint(math.Ceil(k))
}

func main() {
    n := 1000        // We expect 1,000 users
    p := 0.01        // We accept a 1% False Positive rate

    m, k := CalculateOptimalParams(n, p)

    println("Recommended Bit Array Size (m):", m)
    println("Recommended Hash Functions (k):", k)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we come to the core logic. A bloom filter works by answering one question: has this item been recorded in the underlying bit array?&lt;/p&gt;

&lt;p&gt;Let us take an example.&lt;br&gt;
Suppose we are making an app like Splitwise. We have asked a user to choose a userName, but we have a constraint - the userName should be globally unique.&lt;/p&gt;

&lt;p&gt;Now, when a user chooses a name, we can run an SQL query like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 1 FROM users WHERE username = 'ayush' LIMIT 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if a row comes back, we will ask the user to choose another name. But even in the best case, assuming an index on the username column, this query costs O(log n), where n is the number of rows in the table, and it hits the database on every attempt.&lt;/p&gt;

&lt;p&gt;Conversely, we can use a bloom filter. We just check whether the name is in the filter. If the answer is a definite &lt;strong&gt;NO&lt;/strong&gt;, we can let the user keep the username; otherwise we query the database to confirm whether the name really exists and proceed accordingly. You can see the huge efficiency we just achieved: the database is only consulted on a "maybe".&lt;/p&gt;

&lt;h2&gt;
  
  
  The Maths Behind
&lt;/h2&gt;

&lt;p&gt;Bloom filters use two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Bit Array:&lt;/strong&gt; An array of zeros and ones. (e.g., [0, 0, 0, 0, 0]).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hash Functions:&lt;/strong&gt; Math formulas that turn a word into a number.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Let's walk through it:&lt;/strong&gt; Imagine an empty array of 10 bits: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Adding "Ayush"&lt;/strong&gt; We run "Ayush" through 2 hash functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash 1("Ayush") = &lt;strong&gt;2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hash 2("Ayush") = &lt;strong&gt;7&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We flip the bits at index 2 and 7 to 1. Array: [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Checking "Kumar"&lt;/strong&gt; We want to know if "Kumar" exists.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash 1("Kumar") = &lt;strong&gt;3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hash 2("Kumar") = &lt;strong&gt;4&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We check index 3 and 4. Are they &lt;strong&gt;1&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Index 3 is &lt;strong&gt;0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Verdict: Kumar is &lt;strong&gt;DEFINITELY NOT&lt;/strong&gt; in the set. (We didn't need to check the database!).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: The "False Positive" (The "Maybe")&lt;/strong&gt; Let's say we check "Anand".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hash 1("Anand") = &lt;strong&gt;2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hash 2("Anand") = &lt;strong&gt;7&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We check the array. Index 2 is 1. Index 7 is 1. (Because "Ayush" set them earlier!).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag8hqxajxy223t2jm8zb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag8hqxajxy223t2jm8zb.png" alt="Bloom Filter Architecture, Collision, and Scalable Stacking Diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Verdict:&lt;/strong&gt; The filter says "Anand &lt;strong&gt;MAYBE&lt;/strong&gt; exists."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reality:&lt;/strong&gt; Anand doesn't exist. This is a collision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; Now (and only now), we check the real database to confirm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is a golang implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "fmt"
    "hash/fnv"
)

// Defining the Bloom Filter
type BloomFilter struct {
    bitset []bool // The array of 0s and 1s
    size   int    // Size of the array
}

func NewBloomFilter(size int) *BloomFilter {
    return &amp;amp;BloomFilter{
        bitset: make([]bool, size),
        size:   size,
    }
}

// Hash Function 1
func (bf *BloomFilter) hash1(s string) int {
    h := fnv.New32a()
    h.Write([]byte(s))
    return int(h.Sum32()) % bf.size
}

// Hash Function 2 (Simulated by adding salt)
func (bf *BloomFilter) hash2(s string) int {
    h := fnv.New32a()
    h.Write([]byte(s + "salt")) // Add a simple salt to change the hash
    return int(h.Sum32()) % bf.size
}

// Add an item
func (bf *BloomFilter) Add(item string) {
    idx1 := bf.hash1(item)
    idx2 := bf.hash2(item)
    bf.bitset[idx1] = true
    bf.bitset[idx2] = true
    fmt.Printf("Added: %s (Indices: %d, %d)\n", item, idx1, idx2)
}

// Check if item exists
func (bf *BloomFilter) Exists(item string) bool {
    idx1 := bf.hash1(item)
    idx2 := bf.hash2(item)

    // If BOTH bits are true, it *might* exist
    if bf.bitset[idx1] &amp;amp;&amp;amp; bf.bitset[idx2] {
        return true
    }
    // If ANY bit is false, it definitely does NOT exist
    return false
}

func main() {
    filter := NewBloomFilter(100)

    // 1. Add Ayush
    filter.Add("Ayush")

    // 2. Check Ayush
    fmt.Println("Does Ayush exist?", filter.Exists("Ayush")) // True (Maybe)

    // 3. Check Kumar
    fmt.Println("Does Kumar exist?", filter.Exists("Kumar")) // False (Definitely Not)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Some real world examples:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Medium.com:&lt;/strong&gt; Uses Bloom Filters to avoid recommending articles you have already read.
&lt;strong&gt;Logic:&lt;/strong&gt; "Has Ayush read this ID?" If Filter says "No", show it. If "Maybe", check DB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Chrome:&lt;/strong&gt; Malicious URL checker. Instead of storing every bad URL in the browser (too big), they store a Bloom Filter.
If you visit a site, Chrome checks the filter. If it says "Maybe dangerous," it calls Google servers to confirm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cassandra/Postgres (Databases):&lt;/strong&gt; Before searching a hard disk for a row, they check a Bloom Filter. If the row isn't there, they save a slow disk operation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to scale?
&lt;/h2&gt;

&lt;p&gt;So, sure, we can decide the size of the array and the expected number of items up front. But that is just an estimate, and it will change in the future. So how do we scale? The standard technique is &lt;strong&gt;stacking bloom filters of increasing size, also known as Scalable Bloom Filters (SBF)&lt;/strong&gt;. This is the technique used by the bloom filter provided by the &lt;strong&gt;Redis Stack&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, how does the stacking algorithm work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A. The "Write" Rule (Adding Data)&lt;/strong&gt;&lt;br&gt;
We only write to the &lt;strong&gt;Newest (Active)&lt;/strong&gt; filter.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to add "Ayush", you don't touch Filter 1 or Filter 2. You add him to Filter 3.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Why?&lt;/em&gt; Because Filters 1 and 2 are frozen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;B. The "Read" Rule (Checking Data)&lt;/strong&gt;&lt;br&gt;
We must check &lt;strong&gt;all&lt;/strong&gt; filters, usually starting from the newest to the oldest.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Question:&lt;/strong&gt; "Does Ayush exist?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Filter 3:&lt;/strong&gt; "No." (He isn't in the new hotel).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Filter 2:&lt;/strong&gt; "No."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Filter 1:&lt;/strong&gt; "Yes." (He checked in 2 months ago).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; True.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Logical OR:&lt;/strong&gt; Result = F3.Exists() || F2.Exists() || F1.Exists()&lt;br&gt;
&lt;strong&gt;Problems with the scaling approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time we add a &lt;strong&gt;new filter layer&lt;/strong&gt;, we introduce a &lt;strong&gt;New Source of Lies (False Positives)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If Filter 1 lies 1% of the time, and Filter 2 lies &lt;strong&gt;1%&lt;/strong&gt; of the time...&lt;/li&gt;
&lt;li&gt;Your total chance of being lied to is now roughly &lt;strong&gt;2%&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you have 10 filters, your error rate becomes &lt;strong&gt;10%&lt;/strong&gt;. That is garbage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How Redis fixed it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They fixed it by making every new filter &lt;em&gt;stricter than the last one&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filter 1:&lt;/strong&gt; We allow &lt;strong&gt;0.1%&lt;/strong&gt; error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter 2:&lt;/strong&gt; We allow &lt;strong&gt;0.01%&lt;/strong&gt; error. (10x stricter).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter 3:&lt;/strong&gt; We allow &lt;strong&gt;0.001%&lt;/strong&gt; error. (100x stricter).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hence even if our stack uses 20 filters, the total error rate stays tiny: the per-layer error rates form a geometric series (0.1% + 0.01% + 0.001% + ...), which sums to roughly 0.11%, comfortably under 1% no matter how many layers we add.&lt;/p&gt;
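&lt;p&gt;Putting the write rule, the read rule, and the growth rule together, a toy SBF might look like this (the sizes, capacities, and the "one extra hash per layer" tightening are illustrative choices, not Redis's exact parameters):&lt;/p&gt;

```go
package main

import "hash/fnv"

// layer is one filter in the stack: frozen once full, probed on reads.
type layer struct {
	bits []bool
	k    int // hash probes per item; more probes = stricter layer
	n    int // items written so far
	max  int // capacity before this layer is frozen
}

// idx derives the i-th probe position for s by salting the hash with i.
func (l *layer) idx(s string, i int) int {
	h := fnv.New32a()
	h.Write([]byte(s))
	h.Write([]byte{byte(i)})
	return int(h.Sum32() % uint32(len(l.bits)))
}

func (l *layer) add(s string) {
	for i := 0; i < l.k; i++ {
		l.bits[l.idx(s, i)] = true
	}
	l.n++
}

func (l *layer) has(s string) bool {
	for i := 0; i < l.k; i++ {
		if !l.bits[l.idx(s, i)] {
			return false
		}
	}
	return true
}

// ScalableBloom applies the two rules: writes go only to the newest
// layer; when it fills, a bigger and stricter layer is stacked on top.
type ScalableBloom struct {
	layers []*layer
}

func NewScalableBloom() *ScalableBloom {
	sb := &ScalableBloom{}
	sb.push(1024, 3, 256)
	return sb
}

func (sb *ScalableBloom) push(bits, k, max int) {
	sb.layers = append(sb.layers, &layer{bits: make([]bool, bits), k: k, max: max})
}

func (sb *ScalableBloom) Add(s string) {
	active := sb.layers[len(sb.layers)-1]
	if active.n >= active.max {
		// freeze the old layer; the new one doubles in size and gains
		// an extra hash probe, approximating the "stricter" error rate
		sb.push(len(active.bits)*2, active.k+1, active.max*2)
		active = sb.layers[len(sb.layers)-1]
	}
	active.add(s)
}

// Exists is the logical OR across every layer, newest to oldest.
func (sb *ScalableBloom) Exists(s string) bool {
	for i := len(sb.layers) - 1; i >= 0; i-- {
		if sb.layers[i].has(s) {
			return true
		}
	}
	return false
}
```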

</description>
      <category>redis</category>
      <category>systemdesign</category>
      <category>go</category>
      <category>performance</category>
    </item>
    <item>
      <title>Breaking the "Pattern": How We Built (and tried to scale) Strategic AI for Sweep</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sat, 20 Dec 2025 16:56:00 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/breaking-the-pattern-how-we-built-and-scaled-strategic-ai-for-sweep-1an3</link>
      <guid>https://forem.com/ayush-k-anand/breaking-the-pattern-how-we-built-and-scaled-strategic-ai-for-sweep-1an3</guid>
      <description>&lt;p&gt;In the world of online card games, players aren't just looking for a challenge; they’re looking for a soul.&lt;br&gt;
At my company, our game &lt;strong&gt;Sweep&lt;/strong&gt; had a long-standing "bot problem." Our computer opponents were built on a classic &lt;strong&gt;rule-based system&lt;/strong&gt;. While they followed the mechanics perfectly, they were fundamentally predictable. Experienced players quickly identified their repetitive move patterns, effectively turning a game of high-stakes strategy into a solved puzzle.&lt;br&gt;
We knew we had to evolve. We needed bots that could reason, bluff, and adapt like humans. This is the story of how we integrated locally hosted LLMs to build a "human-level" experience, and the hard infrastructure lessons we learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Proof of Concept: Beyond "If/Then" Logic
&lt;/h2&gt;

&lt;p&gt;Our goal was to move from static rules to a &lt;strong&gt;Deterministic AI system&lt;/strong&gt; that leveraged the pattern-matching power of Large Language Models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hybrid "Move-Index" Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of letting the LLM generate raw text (which is slow and hard to parse), we used it as a &lt;strong&gt;Pattern-Matching Engine&lt;/strong&gt;. Here was our workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generation:&lt;/strong&gt; Our existing rule-based system would generate every possible valid move for the current game state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rating:&lt;/strong&gt; These moves were rated based on basic metrics (points, defense, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selection:&lt;/strong&gt; We fed the board state and the list of rated moves into the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexing:&lt;/strong&gt; We asked the LLM to return only the index number of its chosen move.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This "index-only" response drastically reduced processing complexity and token overhead, allowing us to focus the AI's power solely on strategic choice rather than language generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Realities: Parameters and Performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When we moved from theory to locally hosted models (using &lt;strong&gt;Ollama&lt;/strong&gt;), we hit our first major wall: the "Parameters vs. Latency" trade-off.&lt;/p&gt;

&lt;p&gt;At one point, we naively tried to route every bot move through the LLM. The GPU queue exploded, latency spiked, and we had to roll back the feature within hours.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small Models (Under 3B parameters):&lt;/strong&gt; These models were blazing fast, but they were "playing dumb." They missed obvious partner synergies and failed to recognize long-term threats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large Models (Over 13B parameters):&lt;/strong&gt; These were strategic masters, but they were too slow. By the time they finished "reading" the prompt and processing the board state, the player experience had already stalled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To standardize our AI's "personality," we used &lt;strong&gt;Ollama Modelfiles&lt;/strong&gt; to save specific system prompts. This ensured every bot instance had the same strategic baseline without us having to re-send huge prompt blocks for every turn.&lt;/p&gt;
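&lt;p&gt;A Modelfile along these lines does the trick (the base model name and prompt text here are illustrative, not our production prompt):&lt;/p&gt;

```
# Modelfile: bakes the strategic baseline into a named local model
FROM llama3
PARAMETER temperature 0.3
SYSTEM """
You are a card-game strategist for Sweep. You will receive a board state
and a numbered list of rated candidate moves. Reply with ONLY the index
of the move you choose.
"""
```

&lt;p&gt;Building it once with &lt;code&gt;ollama create sweep-bot -f Modelfile&lt;/code&gt; gives every bot instance the same baseline without re-sending the system prompt on every turn.&lt;/p&gt;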

&lt;h2&gt;
  
  
  Scaling with "Strategic Triage"
&lt;/h2&gt;

&lt;p&gt;We technically succeeded in creating human-level bots, but we couldn't scale. Our infrastructure simply couldn't handle thousands of bots trying to hit a local GPU at the same time.&lt;/p&gt;

&lt;p&gt;The solution was &lt;strong&gt;Strategic Triage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of giving every bot an LLM "brain" for every turn, we categorized moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The "Obvious" Tier:&lt;/strong&gt; Simple captures were still handled by the fast rule-based system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Critical" Tier:&lt;/strong&gt; In 4-player games where Partner Synergy was vital, we activated the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probabilistic Shedding:&lt;/strong&gt; We only used the LLM brain for a percentage of the total bot pool. This "triage" helped us scale while still making the overall bot population feel more intelligent and unpredictable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Humanizing the Bot: The Temperature Lever&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To prevent the bots from becoming too perfect (and thus, boring), we used the &lt;strong&gt;Temperature parameter&lt;/strong&gt; to simulate human mindsets.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low Temperature (0.1–0.3):&lt;/strong&gt; Created "Conservative" bots that played strictly by the book—perfect for professional-level rooms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Temperature (0.7–0.9):&lt;/strong&gt; Created "Risk-Takers." These bots made aggressive, sometimes "erroneous" moves that felt exactly like a human player trying a bold bluff.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: The Path Forward
&lt;/h2&gt;

&lt;p&gt;The PoC proved that LLMs can indeed break the repetitive patterns of rule-based bots. However, the infrastructure cost of locally hosting high-parameter models is the current frontier. By using &lt;strong&gt;Deterministic Triage&lt;/strong&gt; and &lt;strong&gt;Index-based responses&lt;/strong&gt;, we've found a middle ground: bots that play like people, without the server-melting overhead.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>architecture</category>
      <category>elixir</category>
    </item>
    <item>
      <title>I Wrote a WebSocket Client From Scratch and It Ate My RAM</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 14 Dec 2025 14:45:57 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/i-wrote-a-websocket-client-from-scratch-and-it-ate-my-ram-5lb</link>
      <guid>https://forem.com/ayush-k-anand/i-wrote-a-websocket-client-from-scratch-and-it-ate-my-ram-5lb</guid>
      <description>&lt;p&gt;My company's production code runs on an old version of ejabberd 18, I needed to write a websocket client from scratch.&lt;br&gt;
I wrote the code, tested it and then it was deployed to production.&lt;br&gt;
But randomly, my whole ejabberd vm started crashing with &lt;em&gt;heap allocation error&lt;/em&gt;.&lt;br&gt;
This is the story of how I found a silent memory leak, how &lt;strong&gt;etop&lt;/strong&gt; became my best friend, and why unbounded buffers are the real villains in network programming.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧩 The Architecture (Why It Wasn’t Just a Loop)
&lt;/h2&gt;

&lt;p&gt;Our gaming backend needs hundreds–thousands of persistent WebSocket connections. So the structure was a classic Erlang supervision tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;owebsocket_sup&lt;/code&gt;&lt;/strong&gt; → root supervisor
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;connection_manager&lt;/code&gt;&lt;/strong&gt; → maintains the required number of connections and handles periodic reconnects, because infra layers (e.g., AWS load balancers) may terminate WebSocket sessions after a fixed lifetime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;owebsocket_client_sup&lt;/code&gt;&lt;/strong&gt; → dynamic supervisor
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;owebsocket_client&lt;/code&gt;&lt;/strong&gt; → one process per connection
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each worker is isolated. One bad connection shouldn't take down the system.&lt;/p&gt;

&lt;p&gt;In theory.&lt;/p&gt;


&lt;h2&gt;
  
  
  🕳️ The Bug That Hid in Plain Sight
&lt;/h2&gt;

&lt;p&gt;Here’s the heart of the problem — my frame parsing logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="nf"&gt;retrieve_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="nv"&gt;Buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state.buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;Buffer&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt;&lt;span class="p"&gt;}.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks harmless, right?&lt;br&gt;
Append new data → check for complete frame → wait for more.&lt;br&gt;
But here’s the catch: TCP is a stream, not a sequence of message boundaries.&lt;br&gt;
If the server sent partial frames, or a huge frame, or junk my parser didn’t recognize, my code never hit the “complete frame” condition.&lt;br&gt;
It just kept &lt;strong&gt;appending&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And appending.&lt;br&gt;
And appending.&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 KB → fine&lt;/li&gt;
&lt;li&gt;50 KB → fine&lt;/li&gt;
&lt;li&gt;10 MB → suspicious&lt;/li&gt;
&lt;li&gt;300 MB → oh no&lt;/li&gt;
&lt;li&gt;16 GB → "OOM-killer enters the chat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Yes, it really hit ~16 GB. I checked etop twice because I thought the numbers were lying.)&lt;/p&gt;

&lt;p&gt;Large binaries (over 64 bytes) in Erlang live off-heap and are reference-counted.&lt;br&gt;
Once a process holds a reference to a giant binary, the VM cannot free it until that reference is dropped and the process is garbage-collected.&lt;br&gt;
So one unlucky WebSocket client was quietly hoarding RAM like a dragon.&lt;/p&gt;
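
&lt;p&gt;To see this in practice from an attached shell, two standard BIFs are enough. A minimal sketch (here using &lt;code&gt;self()&lt;/code&gt; so it runs anywhere; in a real hunt you would pass the suspect Pid picked out of etop):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;%% Total off-heap binary memory held by the whole VM, in bytes
TotalBin = erlang:memory(binary),

%% Per-process view: a list of {Address, ByteSize, RefCount} tuples,
%% one for each off-heap binary this process keeps alive
{binary, Refs} = erlang:process_info(self(), binary),
HeldBytes = lists:sum([Size || {_Addr, Size, _RefCnt} &amp;lt;- Refs]).
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If one process’s total keeps climbing in step with &lt;code&gt;erlang:memory(binary)&lt;/code&gt;, you have found your dragon.&lt;/p&gt;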
&lt;h2&gt;
  
  
  🔍 The Worst Part: It Was Random
&lt;/h2&gt;

&lt;p&gt;When debugging memory issues, consistency is your best friend.&lt;/p&gt;

&lt;p&gt;This bug had none.&lt;/p&gt;

&lt;p&gt;Some days it ran perfectly for 6 hours.&lt;br&gt;
Some days it crashed in 20 minutes.&lt;br&gt;
Some days it didn’t crash at all.&lt;/p&gt;

&lt;p&gt;I started suspecting TCP fragmentation patterns, upstream throttling, maybe even ghost data. It was the kind of randomness that makes you question your life choices.&lt;/p&gt;

&lt;p&gt;So I opened etop, watched memory usage per process, and added logs to record the size of the buffer right before it crashed.&lt;/p&gt;

&lt;p&gt;At first: stable.&lt;br&gt;
Later: one client process growing linearly.&lt;br&gt;
Eventually: one worker at 700 MB alone.&lt;br&gt;
And in the log file, the buffer size was ~10 GB.&lt;br&gt;
That’s when the light bulb went off.&lt;/p&gt;

&lt;p&gt;The buffer never reset.&lt;/p&gt;
&lt;h2&gt;
  
  
  🛠️ The Fix: Never Trust the Network
&lt;/h2&gt;

&lt;p&gt;I introduced hard limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="ni"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MAX_BUFFER_SIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt; &lt;span class="c"&gt;%% 5 MB
&lt;/span&gt;&lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="ni"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ERR_BUFFER_SIZE_LIMIT_EXCEEDED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffer_limit_exceeded&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And updated the logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="nf"&gt;retrieve_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="nv"&gt;Buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state.buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;Buffer&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nv"&gt;MAX_BUFFER_SIZE&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nv"&gt;ERR_BUFFER_SIZE_LIMIT_EXCEEDED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;true&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;UpdatedBuffer&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then handled it safely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;&lt;span class="nf"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nv"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Socket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Bs&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;handle_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;Bs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt;
        &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nv"&gt;ERR_BUFFER_SIZE_LIMIT_EXCEEDED&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="nn"&gt;error_logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Buffer limit exceeded. Dropping connection."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="c"&gt;%% Stop receiving data
&lt;/span&gt;            &lt;span class="nv"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;Socket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="c"&gt;%% Reconnect with fresh state
&lt;/span&gt;            &lt;span class="nv"&gt;NewState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                &lt;span class="nf"&gt;try_reconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                              &lt;span class="nv"&gt;State&lt;/span&gt;&lt;span class="nl"&gt;#owebsocket_state&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                                  &lt;span class="n"&gt;socket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;NewState&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="nv"&gt;NewState&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;NewState&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a client misbehaves → drop it.&lt;br&gt;
If a frame is too large → drop it.&lt;br&gt;
If random partial data confuses the parser → drop it.&lt;/p&gt;

&lt;p&gt;Better one connection dies than the whole VM.&lt;/p&gt;

&lt;h2&gt;
  
  
  📘 What I Learned
&lt;/h2&gt;

&lt;p&gt;Building protocols from scratch teaches you things libraries hide from you:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Always enforce buffer size limits
&lt;/h3&gt;

&lt;p&gt;If you don’t, RAM will do it for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Never assume input is reasonable
&lt;/h3&gt;

&lt;p&gt;Even if the spec says so.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ etop and logging are your friends
&lt;/h3&gt;

&lt;p&gt;etop shows you exactly which process is misbehaving, and your logs tell you what it was doing when things went wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Restart one process, not the VM
&lt;/h3&gt;

&lt;p&gt;That’s the whole point of Erlang.&lt;/p&gt;
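
&lt;p&gt;That idea can be sketched as a plain OTP supervisor. This is an illustration, not the actual code from this project — the module and worker names (&lt;code&gt;owebsocket_client_sup&lt;/code&gt;, &lt;code&gt;owebsocket_client&lt;/code&gt;) are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erlang"&gt;&lt;code&gt;-module(owebsocket_client_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() -&amp;gt;
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: when a client worker dies, only that worker is
%% restarted, with a fresh (empty) buffer; the rest of the node
%% never notices.
init([]) -&amp;gt;
    SupFlags = #{strategy =&amp;gt; one_for_one, intensity =&amp;gt; 5, period =&amp;gt; 10},
    Child = #{id =&amp;gt; owebsocket_client,
              start =&amp;gt; {owebsocket_client, start_link, []},
              restart =&amp;gt; transient},
    {ok, {SupFlags, [Child]}}.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Dropping a bad connection is cheap precisely because the supervisor will bring up a clean replacement.&lt;/p&gt;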

&lt;h3&gt;
  
  
  ✔ The best code is sometimes the one that says “nope.”
&lt;/h3&gt;

&lt;p&gt;Dropping a bad connection saved the entire system.&lt;/p&gt;




&lt;p&gt;After adding a 5 MB limit and a reconnection strategy, the system has been stable — no more OOM kills, no more ghost crashes, and no more 2 AM staring contests with &lt;code&gt;erl_crash.dump&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Sometimes reliability is not about writing more code.&lt;br&gt;
It’s about knowing when to stop accepting input.&lt;/p&gt;

</description>
      <category>erlang</category>
      <category>systemdesign</category>
      <category>backend</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Illusion of isolation in Docker</title>
      <dc:creator>Ayush Kumar Anand</dc:creator>
      <pubDate>Sun, 07 Dec 2025 14:26:59 +0000</pubDate>
      <link>https://forem.com/ayush-k-anand/illusion-of-isolation-in-docker-4gd8</link>
      <guid>https://forem.com/ayush-k-anand/illusion-of-isolation-in-docker-4gd8</guid>
      <description>&lt;p&gt;In my company, I was recently given access to the &lt;code&gt;docker&lt;/code&gt; group to run containers. This sounded standard, but it led me to discover how I could gain &lt;strong&gt;functional root privileges&lt;/strong&gt; on the host machine just by being a member of this group.&lt;/p&gt;

&lt;p&gt;To understand how, let’s agree on a common Docker principle: &lt;strong&gt;The Docker Daemon runs as a root process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Docker CLI interacts with the daemon through a Unix socket (&lt;code&gt;/var/run/docker.sock&lt;/code&gt;). If you have access to the CLI (which the &lt;code&gt;docker&lt;/code&gt; group gives you), you have read and write access to this socket.&lt;/p&gt;
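
&lt;p&gt;You don’t even need the Docker CLI for this — anything that can write to the socket can drive the daemon. For example, with plain &lt;code&gt;curl&lt;/code&gt; (assuming the daemon is running and your user is in the &lt;code&gt;docker&lt;/code&gt; group):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Ask the daemon for its version, straight over the Unix socket
curl --unix-socket /var/run/docker.sock http://localhost/version

# List running containers -- the same information as `docker ps`
curl --unix-socket /var/run/docker.sock http://localhost/containers/json
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Every &lt;code&gt;docker&lt;/code&gt; command is sugar over HTTP endpoints like these, which is why socket access means daemon access, and daemon access means root.&lt;/p&gt;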

&lt;h2&gt;
  
  
  The Exploit: Breaking the Isolation
&lt;/h2&gt;

&lt;p&gt;In general, Docker containers run as isolated processes. But something "magical" happens when we run a container by mounting the host’s root file system.&lt;/p&gt;

&lt;p&gt;Consider this command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker run -v /:/host_root -it centos bash&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, the container now has read/write access to the host system's root directory (/). By mounting it as a volume, we bypass the container's Union Filesystem (and lose the Copy-On-Write isolation) for that specific path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;strong&gt;chroot&lt;/strong&gt; trick
&lt;/h2&gt;

&lt;p&gt;At this stage, we have only mounted the file system. If I install a package &lt;strong&gt;(e.g., &lt;code&gt;yum install tree&lt;/code&gt;)&lt;/strong&gt; inside the container normally, it installs into the container's &lt;strong&gt;/usr/bin&lt;/strong&gt;, not the host's.&lt;/p&gt;

&lt;p&gt;But, suppose I run this command inside the container:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;chroot /host_root&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The chroot (Change Root) command changes the apparent root directory for the current running process.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I am telling my process: "Treat the mounted /host_root as your actual / directory."&lt;/li&gt;
&lt;li&gt;The root user in a container usually has a UID of 0. &lt;/li&gt;
&lt;li&gt;The Linux Kernel allows UID 0 to do whatever it wants.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, even though I’m still inside a container, I’m operating directly on the host’s filesystem with root-level permissions — which is just as dangerous in practice.&lt;/p&gt;

&lt;p&gt;If I install a package now, it installs on the host. If I delete a file, it is deleted from the host.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to manage this?
&lt;/h2&gt;

&lt;p&gt;Since granting &lt;strong&gt;docker&lt;/strong&gt; group access is essentially granting &lt;strong&gt;root&lt;/strong&gt; access, here is how we can secure it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rootless Docker:&lt;/strong&gt; Run the Docker daemon as a non-root service. However, rootless Docker comes with trade-offs: reduced performance and limitations around direct access to storage and the networking stack, so it relies on workarounds that add extra overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-Only Mounts:&lt;/strong&gt; If you must mount the host filesystem, enforce read-only access: &lt;code&gt;docker run -v /:/host_root:ro ...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Principle of Least Privilege:&lt;/strong&gt; Add as few users as possible to the &lt;code&gt;docker&lt;/code&gt; group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure Dockerfiles:&lt;/strong&gt; When creating images, explicitly create a non-root user. 
&lt;em&gt;Example of a secure user setup:&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dockerfile"&gt;&lt;code&gt;RUN groupadd -r container_user &amp;amp;&amp;amp; useradd -r -g container_user container_user
RUN chown -R container_user:container_user /host_root
USER container_user
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>linux</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
