<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Aaron Yong</title>
    <description>The latest articles on Forem by Aaron Yong (@aaronyong_14b256a).</description>
    <link>https://forem.com/aaronyong_14b256a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1621209%2Fadd0905f-c144-46f9-9731-422455dd292a.png</url>
      <title>Forem: Aaron Yong</title>
      <link>https://forem.com/aaronyong_14b256a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aaronyong_14b256a"/>
    <language>en</language>
    <item>
      <title>Why Postman Works But Your Browser Doesn't</title>
      <dc:creator>Aaron Yong</dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:43:22 +0000</pubDate>
      <link>https://forem.com/aaronyong_14b256a/why-postman-works-but-your-browser-doesnt-49ne</link>
      <guid>https://forem.com/aaronyong_14b256a/why-postman-works-but-your-browser-doesnt-49ne</guid>
      <description>&lt;p&gt;It's 11pm. Your React frontend is running on &lt;code&gt;localhost:5173&lt;/code&gt;. Your API is running on &lt;code&gt;localhost:8080&lt;/code&gt;. You hit the endpoint from Postman — &lt;code&gt;200 OK&lt;/code&gt;, beautiful JSON, everything works. You hit the same endpoint from your frontend — red text in the console, something about "Access-Control-Allow-Origin," and an undefined response body. You paste the error into Google, add &lt;code&gt;Access-Control-Allow-Origin: *&lt;/code&gt; to your server, and move on.&lt;/p&gt;

&lt;p&gt;I've done this. You've done this. We've all done this. But that quick fix papers over a mechanism that's genuinely worth understanding — because when CORS breaks in production (and it will), the fix isn't always a wildcard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Netscape Started This (In 1995)
&lt;/h2&gt;

&lt;p&gt;To understand CORS, you have to understand the thing it relaxes: the &lt;strong&gt;Same-Origin Policy&lt;/strong&gt;. SOP shipped in Netscape Navigator 2.0 in 1995, at the exact same time as JavaScript. The timing isn't a coincidence — the moment browsers could run scripts that make network requests and read the DOM, they needed a boundary to prevent one site from reading another site's data.&lt;/p&gt;

&lt;p&gt;An "origin" is three things: &lt;strong&gt;scheme + host + port&lt;/strong&gt;. &lt;code&gt;https://example.com:443&lt;/code&gt; is one origin. Change any of the three — swap &lt;code&gt;https&lt;/code&gt; for &lt;code&gt;http&lt;/code&gt;, change &lt;code&gt;example.com&lt;/code&gt; to &lt;code&gt;api.example.com&lt;/code&gt;, use port &lt;code&gt;8080&lt;/code&gt; instead of &lt;code&gt;443&lt;/code&gt; — and you've got a different origin.&lt;/p&gt;

&lt;p&gt;That subdomain thing catches people: &lt;code&gt;app.example.com&lt;/code&gt; and &lt;code&gt;api.example.com&lt;/code&gt; are &lt;strong&gt;cross-origin&lt;/strong&gt;, even though they share a root domain. This is the exact scenario that produces most CORS errors in the wild.&lt;/p&gt;
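
&lt;p&gt;As a quick sanity check, the comparison can be sketched with the standard &lt;code&gt;URL&lt;/code&gt; class, whose &lt;code&gt;origin&lt;/code&gt; property serializes scheme, host, and port. The helper name &lt;code&gt;isSameOrigin&lt;/code&gt; is my own invention for illustration, not a browser API:&lt;/p&gt;

```typescript
// Hypothetical helper: two URLs are same-origin only when scheme, host, and port all match.
function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Default ports are dropped during serialization, so these two match.
console.log(isSameOrigin("https://example.com:443/a", "https://example.com/b")); // true
// A different subdomain is a different host.
console.log(isSameOrigin("https://app.example.com", "https://api.example.com")); // false
// A different port is a different origin: the classic local-dev case.
console.log(isSameOrigin("http://localhost:5173", "http://localhost:8080")); // false
```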

&lt;p&gt;Without SOP, a malicious ad on a news site could silently fetch your Gmail inbox using your authenticated cookies. A phishing page could read your bank balance from another tab. SOP prevents that by blocking JavaScript from reading cross-origin responses. The web as a platform for banking, email, and private data depends on this one rule.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Most Misunderstood Error on the Internet
&lt;/h2&gt;

&lt;p&gt;Here's the thing that confuses everyone: &lt;strong&gt;CORS errors don't mean the server blocked your request&lt;/strong&gt;. For a plain &lt;code&gt;GET&lt;/code&gt;, the request was sent. The server received it, processed it, and returned a response. The &lt;em&gt;browser&lt;/em&gt; just won't let your JavaScript read that response because the server didn't include the right headers. (A failed preflight, covered below, is the one case where the real request never leaves the browser.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What actually happens:

1. Your JS calls fetch('https://api.example.com/data')
2. Browser sends the request (yes, it leaves your machine)
3. Server processes it, returns 200 OK with data
4. Browser checks: does the response have Access-Control-Allow-Origin?
5. Nope → browser hides the response from your JavaScript
6. Console: "has been blocked by CORS policy"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why Postman works. Postman isn't a browser — it doesn't enforce SOP. Neither does curl, nor server-to-server HTTP calls, nor mobile apps making direct API requests. CORS is exclusively a browser-enforced protocol, because browsers are the only environment that runs untrusted JavaScript from arbitrary origins.&lt;/p&gt;

&lt;p&gt;The server's CORS headers are essentially a permission slip: "yes, I expect requests from &lt;code&gt;https://app.example.com&lt;/code&gt;, and it's OK for their JavaScript to read my responses."&lt;/p&gt;

&lt;h2&gt;
  
  
  The OPTIONS Interrogation
&lt;/h2&gt;

&lt;p&gt;Now for the part that really baffles people: sometimes the browser sends &lt;em&gt;two&lt;/em&gt; requests instead of one. Before your actual &lt;code&gt;PUT /api/users&lt;/code&gt; with &lt;code&gt;Content-Type: application/json&lt;/code&gt;, the browser sends an &lt;code&gt;OPTIONS /api/users&lt;/code&gt; — a "preflight" request asking the server for permission.&lt;/p&gt;

&lt;p&gt;Not every request gets a preflight. The browser divides cross-origin requests into "simple" and "not simple":&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple&lt;/strong&gt; (no preflight): GET, HEAD, or POST, but only if the headers are limited to the CORS-safelisted ones (&lt;code&gt;Accept&lt;/code&gt;, &lt;code&gt;Accept-Language&lt;/code&gt;, &lt;code&gt;Content-Language&lt;/code&gt;, &lt;code&gt;Content-Type&lt;/code&gt;) and the content type is one of &lt;code&gt;application/x-www-form-urlencoded&lt;/code&gt;, &lt;code&gt;multipart/form-data&lt;/code&gt;, or &lt;code&gt;text/plain&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not simple&lt;/strong&gt; (preflight required): anything else. &lt;code&gt;PUT&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;, &lt;code&gt;PATCH&lt;/code&gt;, any header outside that list (such as &lt;code&gt;Authorization&lt;/code&gt;), or &lt;code&gt;Content-Type: application/json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Why this specific split? It's actually brilliant: the "simple" criteria roughly match what an HTML &lt;code&gt;&amp;lt;form&amp;gt;&lt;/code&gt; could already do without any scripting. A form could always POST &lt;code&gt;application/x-www-form-urlencoded&lt;/code&gt; data to any URL. SOP never blocked that, so CORS doesn't add a preflight for it either. The preflight only kicks in for requests that scripting made &lt;em&gt;newly possible&lt;/em&gt;: the ones a page couldn't trigger before cross-origin &lt;code&gt;XMLHttpRequest&lt;/code&gt; and, later, &lt;code&gt;fetch&lt;/code&gt;.&lt;/p&gt;
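
&lt;p&gt;The split can be sketched as a small predicate. This is a simplified model, not the browser's real algorithm: it ignores the &lt;code&gt;Range&lt;/code&gt; header, header value length limits, and upload listeners, and the name &lt;code&gt;needsPreflight&lt;/code&gt; is made up for illustration:&lt;/p&gt;

```typescript
// Simplified model of the "simple request" rules (assumption: Range and value-length limits are ignored).
const SAFE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFE_HEADERS = new Set(["accept", "accept-language", "content-language", "content-type"]);
const SAFE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

function needsPreflight(method: string, headers: { [name: string]: string }): boolean {
  // Any method outside GET/HEAD/POST always triggers a preflight.
  if (!SAFE_METHODS.has(method.toUpperCase())) return true;

  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase();
    // Any header outside the safelist (Authorization, X-Anything) triggers a preflight.
    if (!SAFE_HEADERS.has(lower)) return true;

    if (lower === "content-type") {
      // Compare only the MIME type, ignoring parameters such as "; charset=utf-8".
      const mime = headers[name].split(";")[0].trim().toLowerCase();
      if (!SAFE_CONTENT_TYPES.has(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight("GET", {})); // false
console.log(needsPreflight("POST", { "Content-Type": "application/json" })); // true
console.log(needsPreflight("PUT", {})); // true
```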

&lt;p&gt;Here's the preflight dance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser                              Server
  │                                    │
  │── OPTIONS /api/users ─────────────&amp;gt;│  "Can I PUT with JSON + Auth?"
  │   Origin: https://app.com          │
  │   Access-Control-Request-Method:   │
  │     PUT                            │
  │   Access-Control-Request-Headers:  │
  │     Content-Type, Authorization    │
  │                                    │
  │&amp;lt;── 204 ────────────────────────────│  "Yes, here's what I allow"
  │   Access-Control-Allow-Origin:     │
  │     https://app.com                │
  │   Access-Control-Allow-Methods:    │
  │     GET, POST, PUT, DELETE         │
  │   Access-Control-Allow-Headers:    │
  │     Content-Type, Authorization    │
  │   Access-Control-Max-Age: 86400    │
  │                                    │
  │── PUT /api/users ─────────────────&amp;gt;│  (actual request, now permitted)
  │   { "name": "Aaron" }              │
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;Access-Control-Max-Age: 86400&lt;/code&gt; header tells the browser to cache the preflight result for 24 hours. Without it, browsers fall back to a default of just 5 seconds, so almost every non-simple request fires two HTTP calls and your API traffic doubles for no reason. (Chrome caps the cache at 2 hours regardless of what you set, because of course it does.)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Credentials Trap
&lt;/h2&gt;

&lt;p&gt;By default, cross-origin requests don't send cookies or HTTP authentication credentials. If you want them included:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Client: explicitly opt in&lt;/span&gt;
&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.example.com/data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;include&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// now cookies are sent cross-origin&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this comes with a hard rule: &lt;strong&gt;the server cannot respond with &lt;code&gt;Access-Control-Allow-Origin: *&lt;/code&gt; when credentials are involved&lt;/strong&gt;. It must be the specific origin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Access-Control-Allow-Origin: https://app.example.com  # specific, not *
Access-Control-Allow-Credentials: true
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is deliberate. If &lt;code&gt;*&lt;/code&gt; worked with credentials, any website on the internet could make authenticated requests to your API and read the responses. That's not a CORS misconfiguration — that's a data breach waiting to happen.&lt;/p&gt;
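
&lt;p&gt;Here is a rough model of the check the browser runs on the response, including the credentials rule. It is a sketch under simplifying assumptions (header values are passed in directly, and the function name &lt;code&gt;browserAllowsRead&lt;/code&gt; is hypothetical), not the browser's actual code:&lt;/p&gt;

```typescript
// Simplified model of the browser-side CORS check on a response.
function browserAllowsRead(
  pageOrigin: string,
  allowOrigin: string | undefined, // value of Access-Control-Allow-Origin, if present
  allowCredentials: string | undefined, // value of Access-Control-Allow-Credentials, if present
  withCredentials: boolean, // true when the request used credentials: "include"
): boolean {
  // No header at all: the response stays hidden from JavaScript.
  if (allowOrigin === undefined) return false;

  // Without credentials, either the wildcard or an exact origin match is enough.
  if (!withCredentials) return allowOrigin === "*" || allowOrigin === pageOrigin;

  // With credentials, the wildcard is treated as a literal string and never matches.
  if (allowOrigin !== pageOrigin) return false;
  return allowCredentials === "true";
}

const page = "https://app.example.com";
console.log(browserAllowsRead(page, "*", undefined, false)); // true
console.log(browserAllowsRead(page, "*", "true", true)); // false: wildcard plus credentials
console.log(browserAllowsRead(page, page, "true", true)); // true
```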

&lt;h2&gt;
  
  
  The Config Across Stacks
&lt;/h2&gt;

&lt;p&gt;I've set up CORS in Spring Boot, Express, and nginx at this point, and the patterns are similar but the gotchas are stack-specific.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Express&lt;/strong&gt;, the &lt;code&gt;cors&lt;/code&gt; package handles OPTIONS automatically — you barely think about it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cors&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cors&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;cors&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:5173&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Vite dev server&lt;/span&gt;
    &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



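&lt;p&gt;In &lt;strong&gt;Spring Boot&lt;/strong&gt;, a rejected preflight is a classic silent failure: it just returns 403 and the console error gives you nothing useful. Listing &lt;code&gt;OPTIONS&lt;/code&gt; in &lt;code&gt;allowedMethods&lt;/code&gt; is a common defensive habit, but as far as I can tell Spring checks the preflight against the verb named in &lt;code&gt;Access-Control-Request-Method&lt;/code&gt;, so make sure the real method (&lt;code&gt;PUT&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;) is listed too:&lt;br&gt;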
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/**"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allowedOrigins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:5173"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allowedMethods&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GET"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"POST"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"PUT"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"DELETE"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"OPTIONS"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// don't forget OPTIONS&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;allowCredentials&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in &lt;strong&gt;nginx&lt;/strong&gt;, the sneakiest bug is configuring CORS in &lt;em&gt;both&lt;/em&gt; nginx and the app server, which produces duplicate &lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt; headers. The browser rejects that too. Pick one place to handle CORS — never both.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Wait, We Don't Even Need CORS?"
&lt;/h2&gt;

&lt;p&gt;Here's the plot twist most tutorials skip: in production, you often don't need CORS at all. If your frontend and API are served from the same origin via a reverse proxy, there's no cross-origin request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Development (CORS needed):
  localhost:5173  ──fetch──&amp;gt;  localhost:8080  ← different ports = different origin

Production (no CORS):
  example.com  ──fetch──&amp;gt;  example.com/api/
       │                        │
       └── nginx serves         └── nginx proxies to backend:8080
           static files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same scheme, same host, same port. Same origin. No CORS headers needed. The browser never even checks. This is the pattern many production deployments end up using: nginx (or Cloudflare, or your platform's edge proxy) sits in front of everything, making the frontend and API appear as one origin.&lt;/p&gt;

&lt;p&gt;CORS configuration then becomes a development-only concern: your Vite dev server on port &lt;code&gt;5173&lt;/code&gt; and your API on port &lt;code&gt;8080&lt;/code&gt; are different origins. Once you deploy behind a proxy, the problem dissolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't Do This (Please)
&lt;/h2&gt;

&lt;p&gt;A few CORS configurations that will make your security team cry:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reflecting any origin&lt;/strong&gt;: Server blindly echoes back whatever &lt;code&gt;Origin&lt;/code&gt; header the browser sends, together with &lt;code&gt;Access-Control-Allow-Credentials: true&lt;/code&gt;. This is equivalent to no security at all. Any malicious site can read your authenticated API responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trusting &lt;code&gt;null&lt;/code&gt; origin&lt;/strong&gt;: The &lt;code&gt;null&lt;/code&gt; origin sounds harmless, but attackers can trigger it from sandboxed iframes. If your server trusts &lt;code&gt;Origin: null&lt;/code&gt; with credentials, you have a vulnerability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suffix matching&lt;/strong&gt;: &lt;code&gt;if (origin.endsWith('example.com'))&lt;/code&gt; also matches &lt;code&gt;evil-example.com&lt;/code&gt;. Adding the leading dot closes that particular hole but still trusts every subdomain over any scheme. Always validate the full origin string, not just a suffix.&lt;/p&gt;
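
&lt;p&gt;A safer shape for all three cases is an exact-match allowlist. The sketch below is framework-agnostic; the origins in &lt;code&gt;ALLOWED_ORIGINS&lt;/code&gt; and the function name &lt;code&gt;corsHeadersFor&lt;/code&gt; are placeholders I picked for illustration:&lt;/p&gt;

```typescript
// Hypothetical allowlist: replace with the origins you actually serve.
const ALLOWED_ORIGINS = new Set(["https://app.example.com", "http://localhost:5173"]);

// Returns the CORS headers to attach, or an empty object when the origin is not trusted.
function corsHeadersFor(origin: string | undefined): { [name: string]: string } {
  // Exact string match only: no suffix checks, no reflection, and "null" is never listed.
  if (origin === undefined || !ALLOWED_ORIGINS.has(origin)) return {};

  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Credentials": "true",
    // The response now depends on the Origin header, so caches must key on it.
    Vary: "Origin",
  };
}

console.log(corsHeadersFor("https://app.example.com")["Access-Control-Allow-Origin"]); // https://app.example.com
console.log(Object.keys(corsHeadersFor("https://evil-example.com")).length); // 0
console.log(Object.keys(corsHeadersFor("null")).length); // 0
```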

&lt;h2&gt;
  
  
  Origin Story (Pun Intended)
&lt;/h2&gt;

&lt;p&gt;CORS is one of those things that seems like a nuisance until you understand what it's protecting. The Same-Origin Policy has been the security backbone of the web since 1995 — before cookies had &lt;code&gt;SameSite&lt;/code&gt;, before CSP existed, before CSRF tokens were standard practice. CORS doesn't weaken that protection; it gives servers a structured way to poke specific, controlled holes in it.&lt;/p&gt;

&lt;p&gt;Next time you see that red console error, at least you'll know: your server got the request just fine. It's the browser that's looking out for you.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Data Structure That's Okay With Being Wrong</title>
      <dc:creator>Aaron Yong</dc:creator>
      <pubDate>Thu, 02 Apr 2026 03:04:08 +0000</pubDate>
      <link>https://forem.com/aaronyong_14b256a/the-data-structure-thats-okay-with-being-wrong-15mj</link>
      <guid>https://forem.com/aaronyong_14b256a/the-data-structure-thats-okay-with-being-wrong-15mj</guid>
      <description>&lt;h2&gt;
  
  
  The Million-Row Problem
&lt;/h2&gt;

&lt;p&gt;You're building a URL shortener. Every time someone creates a short link, you generate a random code and check if it already exists in the database. One database query per attempt. At 1,000 URLs, this is fine — the query takes a millisecond, the index is tiny, nobody notices.&lt;/p&gt;

&lt;p&gt;At 100 million URLs, you're generating codes that collide more often (birthday paradox), each collision triggers another database round trip, and those round trips add up under high throughput. You're not slow because your code is bad — you're slow because you're asking the database a question it doesn't need to answer.&lt;/p&gt;

&lt;p&gt;What if you could check "does this code already exist?" without touching the database at all?&lt;/p&gt;

&lt;h2&gt;
  
  
  A Bit Array With an Attitude
&lt;/h2&gt;

&lt;p&gt;A Bloom filter is a bit array (say, 20 bits, all starting at 0) combined with a handful of hash functions. Let me walk through the full lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Starting state — empty bit array:

  [0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0]
   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19


Check "abc123" — is it in the set?
  hash1 → position 3  → 0 ❌
  → "Definitely not" (one zero = guaranteed absent, no need to check other hashes)


Insert "abc123":
  hash1 → position 3
  hash2 → position 8
  hash3 → position 14

  [0][0][0][1][0][0][0][0][1][0][0][0][0][0][1][0][0][0][0][0]
            ^               ^                 ^
            3               8                14


Check "abc123" — is it in the set now?
  hash1 → position 3  → 1 ✅
  hash2 → position 8  → 1 ✅
  hash3 → position 14 → 1 ✅
  → "Probably exists" ✓ (correct — we just inserted it)


Check "xyz789" — is it in the set?
  hash1 → position 3  → 1 ✅ (set by abc123!)
  hash2 → position 6  → 0 ❌
  → "Definitely not" (one zero = guaranteed absent)


Insert "xyz789":
  hash1 → position 3  (already 1 — shared with abc123)
  hash2 → position 6
  hash3 → position 17

  [0][0][0][1][0][0][1][0][1][0][0][0][0][0][1][0][0][1][0][0]
            ^        ^     ^                 ^        ^
            3(shared) 6    8                14       17


Check "def456" — is it in the set? (we NEVER inserted it)
  hash1 → position 3  → 1 ✅ (set by abc123)
  hash2 → position 6  → 1 ✅ (set by xyz789)
  hash3 → position 14 → 1 ✅ (set by abc123)
  → "Probably exists" ✗ FALSE POSITIVE!
    (all bits happen to be set by OTHER elements)


Check "nope00" — is it in the set?
  hash1 → position 2  → 0 ❌
  → "Definitely not" ✓ (correct — one zero bit = guaranteed absent)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "def456" check is the key insight. It was never inserted, but all three of its hash positions were coincidentally set by other elements. That's a false positive — the Bloom filter is wrong, but it's wrong in a predictable, controllable way.&lt;/p&gt;

&lt;p&gt;Two possible answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Definitely not in the set"&lt;/strong&gt; — 100% correct, every time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Probably in the set"&lt;/strong&gt; — correct most of the time, occasionally wrong (false positive)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It never gives false negatives. If it says no, you can trust it completely.&lt;/p&gt;
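
&lt;p&gt;To make the walkthrough concrete, here is a minimal from-scratch sketch. It is not how any particular library does it: the class name &lt;code&gt;TinyBloomFilter&lt;/code&gt; is made up, it spends a whole byte per bit for readability, and it derives its positions from two seeded FNV-1a hashes (double hashing) rather than truly independent hash functions:&lt;/p&gt;

```typescript
// FNV-1a (32-bit). Different seeds give two different-looking hashes of the same text.
function fnv1a(text: string, seed: number): number {
  let hash = (2166136261 ^ seed) >>> 0;
  for (const unit of text.split("")) {
    hash = Math.imul(hash ^ unit.charCodeAt(0), 16777619) >>> 0;
  }
  return hash;
}

class TinyBloomFilter {
  private readonly bits: Uint8Array;
  private readonly hashCount: number;

  constructor(size: number, hashCount: number) {
    // One byte per bit keeps the sketch readable; real filters pack 8 bits per byte.
    this.bits = new Uint8Array(size);
    this.hashCount = hashCount;
  }

  // Derive hashCount positions from two base hashes (double hashing).
  private positions(item: string): number[] {
    const h1 = fnv1a(item, 0);
    const h2 = fnv1a(item, 1);
    return Array.from({ length: this.hashCount }, (_, i) => (h1 + i * h2) % this.bits.length);
  }

  add(item: string): void {
    for (const p of this.positions(item)) this.bits[p] = 1;
  }

  // false means "definitely not in the set"; true means "probably in the set".
  has(item: string): boolean {
    return this.positions(item).every((p) => this.bits[p] === 1);
  }
}

const seen = new TinyBloomFilter(1024, 3);
console.log(seen.has("abc123")); // false: every bit is still 0
seen.add("abc123");
console.log(seen.has("abc123")); // true: inserted items are never reported absent
```

&lt;p&gt;Notice there is no &lt;code&gt;remove&lt;/code&gt; method: clearing a bit could silently erase some other item that happens to share it.&lt;/p&gt;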

&lt;h2&gt;
  
  
  Why Would You Want an Inaccurate Data Structure?
&lt;/h2&gt;

&lt;p&gt;Because it's absurdly small and fast.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Hash Set&lt;/th&gt;
&lt;th&gt;Database Query&lt;/th&gt;
&lt;th&gt;Bloom Filter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;O(n) — stores all values&lt;/td&gt;
&lt;td&gt;0 (on disk)&lt;/td&gt;
&lt;td&gt;Fixed size bit array&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lookup speed&lt;/td&gt;
&lt;td&gt;O(1)&lt;/td&gt;
&lt;td&gt;~1ms (network + index)&lt;/td&gt;
&lt;td&gt;O(1) — a few hash computations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;~99%+ (configurable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can delete?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network call?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A Bloom filter storing 1 million items with a 0.1% false positive rate uses about 1.8 MB of memory. A HashSet storing the same 1 million strings uses 50-100 MB. The database stores them on disk but needs a network round trip for every check.&lt;/p&gt;

&lt;p&gt;The Bloom filter trades a tiny amount of accuracy for a massive reduction in memory and the elimination of network calls. For many use cases, that trade-off is more than worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The URL Shortener, Optimized
&lt;/h2&gt;

&lt;p&gt;Here's how this applies to the problem from the intro:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BloomFilter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bloom-filters&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bloom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;BloomFilter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 1M capacity, 0.1% false positive&lt;/span&gt;

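&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateUniqueCode&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;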
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;randomBase62&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bloom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// in-memory check (~1μs)&lt;/span&gt;

    &lt;span class="c1"&gt;// Bloom says "definitely not present"; verify with the DB as a safety net&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;exists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findUnique&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;shortCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;bloom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Failed to generate unique code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Bloom filter settles most attempts in memory. When it says "probably exists" (either correctly or as a false positive), we skip the database entirely and just generate another code. When it says "definitely not", the code is free to use; the single lookup that follows is only a safety net in case the in-memory filter has drifted from the database, for example after a restart or across multiple app instances. At 100 million URLs, collision retries never reach the database at all.&lt;/p&gt;

&lt;p&gt;The database is still the source of truth. The Bloom filter is a fast, cheap pre-filter that prevents unnecessary round trips. It's the same principle as &lt;a href="https://site.aaronhsyong.com/posts/the-fastest-code-never-runs/" rel="noopener noreferrer"&gt;caching&lt;/a&gt; — avoid the expensive operation when you already know the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tuning Knob
&lt;/h2&gt;

&lt;p&gt;You control the false positive rate. Lower rate = more memory:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Items&lt;/th&gt;
&lt;th&gt;False Positive Rate&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;~1.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;td&gt;~1.8 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;0.01%&lt;/td&gt;
&lt;td&gt;~2.4 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;td&gt;~18 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100M&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;td&gt;~180 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even at 100 million items with 0.1% false positives, you're using 180 MB — less than a single Chrome tab. A HashSet storing 100 million strings would need several gigabytes.&lt;/p&gt;
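
&lt;p&gt;The numbers in that table follow from the standard sizing formulas: bits = -n · ln(p) / (ln 2)² and hash functions = (bits / n) · ln 2. A small calculator, with the function name &lt;code&gt;bloomSizing&lt;/code&gt; being my own, lets you check other combinations:&lt;/p&gt;

```typescript
// Optimal Bloom filter sizing for n items at a target false positive rate p.
function bloomSizing(items: number, falsePositiveRate: number) {
  // bits = -n * ln(p) / (ln 2)^2
  const bits = Math.ceil((-items * Math.log(falsePositiveRate)) / (Math.LN2 * Math.LN2));
  // hash functions = (bits / n) * ln 2, rounded to a whole number
  const hashCount = Math.round((bits / items) * Math.LN2);
  return { bits, hashCount, megabytes: bits / 8 / 1_000_000 };
}

const sizing = bloomSizing(1_000_000, 0.001);
console.log(sizing.megabytes.toFixed(1)); // "1.8"
console.log(sizing.hashCount); // 10
```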

&lt;h2&gt;
  
  
  Where Bloom Filters Show Up in Production
&lt;/h2&gt;

&lt;p&gt;This isn't a theoretical data structure. It's used at massive scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Chrome&lt;/strong&gt; famously shipped a local Bloom filter of known malicious sites for Safe Browsing (later versions moved to a different compact structure, but the idea is the same). If the local check says "not malicious" (which is almost always), Chrome skips the network call to the Safe Browsing API entirely. Only on a "probably malicious" hit does it make the API call to confirm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cassandra and HBase&lt;/strong&gt; use Bloom filters to check if a row might exist in an on-disk file (an SSTable in Cassandra, an HFile in HBase) before reading it. Disk reads are expensive. A Bloom filter that says "definitely not in this file" saves a disk seek.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium&lt;/strong&gt; uses Bloom filters to track which articles a user has already seen, so recommendations don't show duplicates. Storing "user X has seen articles [1, 2, 3, ..., 10000]" for millions of users as full sets would be massive. Bloom filters compress this to a few KB per user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bitcoin&lt;/strong&gt; SPV (Simplified Payment Verification) nodes use Bloom filters to request only relevant transactions from full nodes without downloading the entire blockchain.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use It
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Bloom Filter?&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pre-filter before expensive lookup (DB, disk, API)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Eliminates most unnecessary lookups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web crawler ("have I visited this URL?")&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Millions of URLs, memory-efficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spell checker ("is this a real word?")&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Dictionary is static, fast lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://site.aaronhsyong.com/posts/the-fastest-code-never-runs/" rel="noopener noreferrer"&gt;Cache&lt;/a&gt; miss prevention&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Avoid DB query when the key doesn't exist at all&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate detection in streams&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Can't store all seen items in memory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When NOT to Use It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need to delete elements&lt;/strong&gt; — standard Bloom filters don't support deletion (flipping a bit to 0 might affect other elements). Counting Bloom filters exist but add complexity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False positives are unacceptable&lt;/strong&gt; — financial transactions, access control, anything where "probably yes" isn't good enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The dataset fits in a regular Set&lt;/strong&gt; — if you have 10,000 items, just use a HashSet. Bloom filters shine at millions+.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need to retrieve the actual values&lt;/strong&gt; — Bloom filters only answer "exists?" They don't store or return the elements themselves.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Bloom filters fit into the same optimization philosophy as &lt;a href="https://site.aaronhsyong.com/posts/the-fastest-code-never-runs/" rel="noopener noreferrer"&gt;caching&lt;/a&gt; and &lt;a href="https://site.aaronhsyong.com/posts/the-one-line-fix/" rel="noopener noreferrer"&gt;indexing&lt;/a&gt;: avoid the expensive operation when a cheaper check can give you the answer first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request comes in
  → Check Bloom filter (about a microsecond, in-memory)
    → "Definitely not" → skip the expensive lookup
    → "Probably yes" → do the actual lookup to confirm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's not about replacing your database or your cache. It's about putting a cheap, fast guard in front of them so they only do work when it matters. Another tool in the &lt;a href="https://site.aaronhsyong.com/posts/scaling-the-right-thing/" rel="noopener noreferrer"&gt;optimize before you scale&lt;/a&gt; toolkit.&lt;/p&gt;

</description>
      <category>datastructures</category>
      <category>bloomfilter</category>
      <category>probabilistic</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
