<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Akarshan Gandotra</title>
    <description>The latest articles on Forem by Akarshan Gandotra (@akarshan).</description>
    <link>https://forem.com/akarshan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F69510%2F5056764e-527e-442d-860b-cab2dbbf57cd.jpg</url>
      <title>Forem: Akarshan Gandotra</title>
      <link>https://forem.com/akarshan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/akarshan"/>
    <language>en</language>
    <item>
      <title>Part 10 — Lessons learned building a Kubernetes Auth Gateway</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:42:44 +0000</pubDate>
      <link>https://forem.com/akarshan/lessons-learned-building-a-kubernetes-auth-gateway-100m</link>
      <guid>https://forem.com/akarshan/lessons-learned-building-a-kubernetes-auth-gateway-100m</guid>
      <description>&lt;p&gt;We're at the end of the series. Nine chapters of mechanism. One chapter of opinion.&lt;/p&gt;

&lt;p&gt;Building the Auth Gateway took roughly two years from "what if NGINX did the auth?" to "this thing handles every authenticated request in production." A lot of what's in the previous chapters wasn't obvious to us at the start. This is the post-mortem on our own architecture: what worked, what hurt, what we'd build earlier, and what we'd warn the next team about.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worked
&lt;/h2&gt;

&lt;p&gt;A few decisions held up cleanly. We'd make all of them again.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;auth_request&lt;/code&gt; as the primitive
&lt;/h3&gt;

&lt;p&gt;NGINX's &lt;code&gt;auth_request&lt;/code&gt; directive is, with no exaggeration, the single most leveraged design choice in the platform. One directive, well-understood, supported across NGINX versions. We don't need a service mesh. We don't need a custom Envoy filter. We don't need a Lua module compiled into NGINX.&lt;/p&gt;

&lt;p&gt;If you can do your auth in HTTP-status terms (200/401/403), &lt;code&gt;auth_request&lt;/code&gt; is the right tool. If you can't, you probably want a sidecar or mesh-level enforcement and this whole architecture doesn't apply.&lt;/p&gt;
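
&lt;p&gt;To make "HTTP-status terms" concrete, here is a minimal sketch of the service side of an &lt;code&gt;auth_request&lt;/code&gt; setup. It is not our production handler, and the token and permission checks are stand-ins; the point is that the handler's only outputs are 200, 401, and 403:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "errors"
    "net/http"
    "strings"
)

type claims struct{ perms map[string]bool }

// validateToken is a stand-in: the real handler verifies a JWT signature here.
func validateToken(raw string) (claims, error) {
    if raw == "valid-token" {
        return claims{perms: map[string]bool{"user-service:read": true}}, nil
    }
    return claims{}, errors.New("invalid token")
}

// authHandler is what the auth_request subrequest hits. Its entire contract
// is the status code: 200 allow, 401 unauthenticated, 403 forbidden.
func authHandler(w http.ResponseWriter, r *http.Request) {
    raw := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
    if raw == "" {
        w.WriteHeader(http.StatusUnauthorized)
        return
    }
    c, err := validateToken(raw)
    if err != nil {
        w.WriteHeader(http.StatusUnauthorized)
        return
    }
    if !c.perms["user-service:read"] {
        w.WriteHeader(http.StatusForbidden)
        return
    }
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/auth", authHandler)
    http.ListenAndServe(":8080", nil)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;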

&lt;h3&gt;
  
  
  Endpoint metadata as data
&lt;/h3&gt;

&lt;p&gt;Storing endpoint type and required permissions in Postgres, refreshed via Pub/Sub, was the right call. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can change auth without redeploying the gateway.&lt;/li&gt;
&lt;li&gt;We can audit auth ("what protects this URL?") with a SQL query.&lt;/li&gt;
&lt;li&gt;Admin tooling and the gateway share a single contract.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost — a small DB lookup at boot, an in-memory trie, a refresh mechanism — was tiny compared to the operational flexibility we got back.&lt;/p&gt;
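
&lt;p&gt;For concreteness, here is a sketch of the shape this takes on the read side. The table and column names are illustrative, not our actual schema; the loaded rules get compiled into the in-memory trie so the hot path never touches Postgres:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package endpoints

import (
    "context"
    "database/sql"
)

// EndpointRule mirrors one row of the endpoint-metadata table.
type EndpointRule struct {
    Slug         string // service slug, e.g. "user-service"
    Method       string // HTTP method
    PathPattern  string // e.g. "/api/v1/users/:id"
    EndpointType string // OPEN, AUTHENTICATED, ACCESS_CONTROLLED, ...
    Permissions  string // simplified to a single column for this sketch
}

// LoadRules pulls every rule at boot and on each refresh.
func LoadRules(ctx context.Context, db *sql.DB) ([]EndpointRule, error) {
    rows, err := db.QueryContext(ctx,
        `SELECT slug, method, path_pattern, endpoint_type, permissions FROM endpoint_rules`)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var out []EndpointRule
    for rows.Next() {
        var r EndpointRule
        if err := rows.Scan(&amp;amp;r.Slug, &amp;amp;r.Method, &amp;amp;r.PathPattern, &amp;amp;r.EndpointType, &amp;amp;r.Permissions); err != nil {
            return nil, err
        }
        out = append(out, r)
    }
    return out, rows.Err()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;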

&lt;h3&gt;
  
  
  One structured log line per decision
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AUTH_DECISION&lt;/code&gt; is the contract between the Auth Service and oncall. Every field, every time, every request. A year of operations later, this is the artifact we reference most often. Every alert we've built points at it. Every incident postmortem references it.&lt;/p&gt;

&lt;p&gt;Resist the temptation to add &lt;code&gt;INFO&lt;/code&gt;/&lt;code&gt;DEBUG&lt;/code&gt; lines around it. Resist the temptation to omit fields when they're "not relevant." One line. Same shape. Forever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fail-closed by default at the edge
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;error_page 502 503 504 = @auth_unavailable;&lt;/code&gt; was a one-line change that defines our security posture. When the Auth Service is unhealthy, NGINX returns 503 to the client &lt;em&gt;instead of&lt;/em&gt; letting the request through. We've had a few incidents where this caused brief platform-wide outages. We have never regretted the choice.&lt;/p&gt;

&lt;p&gt;The principle: the cost of a 5-minute outage on rare occasions is much, much less than the cost of one cross-tenant data leak ever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caches in the auth process, not in NGINX
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;auth_request&lt;/code&gt; is intentionally not cacheable, and we leaned into that. Every cache lives inside the Auth Service: JWT verify, RSA keys, route lookup, policy bitmap, revocation map, SA versions. Each is invalidated through its own channel. The gateway's hot path makes zero Redis calls in steady state.&lt;/p&gt;

&lt;p&gt;This kept the architecture honest. The auth pod is the unit of correctness. Scale it, monitor it, debug it as one thing.&lt;/p&gt;
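
&lt;p&gt;Structurally, "every cache lives inside the Auth Service" ends up looking roughly like this (a sketch with illustrative names, not our real types). One process-local struct owns them all; Redis and Postgres only ever feed refreshes, never a request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package authcache

import "sync"

type jwtEntry struct {
    identityID string
    expiresAt  int64 // unix seconds; an entry never outlives the token itself
}

type routeTrie struct{} // compiled endpoint metadata, swapped wholesale on reload

// caches groups every process-local cache the hot path touches.
// Each has its own invalidation channel; none of them call Redis per request.
type caches struct {
    mu         sync.RWMutex
    jwtVerify  map[string]jwtEntry // keyed by token hash, bounded
    trie       *routeTrie          // rebuilt on Pub/Sub reload plus periodic refresh
    revoked    map[string]int64    // token id, fed by the revocation stream consumer
    saVersions map[string]int64    // service-account id, last synced version
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;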

&lt;h3&gt;
  
  
  Pub/Sub-driven trie reload
&lt;/h3&gt;

&lt;p&gt;Push-based invalidation for the endpoint trie was the right shape. Periodic-only would have given us a ~30 minute window where new admin routes were unprotected. Pub/Sub-only would have been brittle (events get lost). Both, with periodic as the safety net, gives us seconds of staleness in the common case and bounded staleness even when the message is lost.&lt;/p&gt;

&lt;p&gt;Most caches we'd default to TTL. The trie was worth the special case.&lt;/p&gt;
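
&lt;p&gt;The loop itself is small. A sketch, where the &lt;code&gt;events&lt;/code&gt; channel stands in for the actual Pub/Sub subscription and &lt;code&gt;reload&lt;/code&gt; stands in for the Postgres rebuild:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package reload

import (
    "context"
    "log"
    "time"
)

// RunTrieReloader rebuilds the endpoint trie when a Pub/Sub message arrives,
// and also on a timer as the safety net for lost messages.
func RunTrieReloader(ctx context.Context, events &amp;lt;-chan struct{}, every time.Duration,
    reload func(context.Context) error) {
    ticker := time.NewTicker(every)
    defer ticker.Stop()
    for {
        select {
        case &amp;lt;-ctx.Done():
            return
        case &amp;lt;-events: // push path: seconds of staleness in the common case
        case &amp;lt;-ticker.C: // pull path: bounded staleness even when a message is lost
        }
        if err := reload(ctx); err != nil {
            log.Printf("trie reload failed, keeping previous trie: %v", err)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;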

&lt;h3&gt;
  
  
  The bitmap fast path
&lt;/h3&gt;

&lt;p&gt;Encoding permissions as bit indexes paid off. Smaller tokens, faster checks, cleaner metrics. The legacy path we kept around for safety has earned its keep — version skew is real, and fall-through is graceful.&lt;/p&gt;

&lt;p&gt;Shadow mode for two months before flipping the switch was the right rollout pattern. Catching three real bugs in shadow with zero impact on production is the gold standard for a sensitive change like authorization logic.&lt;/p&gt;
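
&lt;p&gt;The check itself is a couple of bit operations. A sketch, assuming the registry hands out stable bit indexes and the token carries the set as packed 64-bit words:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package bitmap

// HasPermission tests one required permission against the token's permission bitmap.
func HasPermission(words []uint64, bitIndex uint32) bool {
    word, bit := bitIndex/64, bitIndex%64
    if int(word) &amp;gt;= len(words) {
        // Token predates this permission's bit: fall through to the legacy string path.
        return false
    }
    return words[word]&amp;amp;(1&amp;lt;&amp;lt;bit) != 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;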

&lt;h2&gt;
  
  
  What hurt
&lt;/h2&gt;

&lt;p&gt;Now the harder list. Things that cost us time, sleep, or trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tenant resolution living in two places
&lt;/h3&gt;

&lt;p&gt;NGINX resolves the tenant. The Auth Service &lt;em&gt;also&lt;/em&gt; checks tenant binding (token tenant matches request tenant). The two places do &lt;em&gt;different&lt;/em&gt; checks for a reason — but the reason isn't obvious, and we've watched several engineers add tenant logic to a third place because they didn't realize it was already covered.&lt;/p&gt;

&lt;p&gt;What we'd do differently: write a single tenant-resolution doc that explicitly enumerates &lt;em&gt;which layer owns what&lt;/em&gt; and &lt;em&gt;what each layer assumes about the others&lt;/em&gt;. A "tenancy contract" page. We have it now (chapter 5 is a recovered version of it); we should have had it on day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "first segment is the slug" rule
&lt;/h3&gt;

&lt;p&gt;For a long time, the Auth Service split the URI on &lt;code&gt;/&lt;/code&gt; and treated the first segment as the service slug. This worked until services started nesting each other or grouping under shared prefixes. We had to retrofit &lt;code&gt;X-Service-Slug&lt;/code&gt; and &lt;code&gt;X-Request-Path&lt;/code&gt; headers — backward-compatibly, with fallback to the old rule. The retrofit is fine; it took longer than it should have because the old rule was buried in three places.&lt;/p&gt;

&lt;p&gt;What we'd do differently: explicit slug headers from day one. Don't infer slugs from the URI structure. The path inside a service is the service's business; the slug is a separate concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Migration to the bitmap took longer than expected
&lt;/h3&gt;

&lt;p&gt;The bitmap fast path was a six-week project that took five months. The math was straightforward. What ate the time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coordinating bit-index assignments with the token issuer team (different repo, different rollout cadence).&lt;/li&gt;
&lt;li&gt;Fixture data in our test suites was hardcoded with old permission strings; updating it for the bitmap registry was a long tail of small PRs.&lt;/li&gt;
&lt;li&gt;The shadow comparison logic exposed three subtle bugs (Chapter 6) that each required investigation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we'd do differently: assume cross-team auth changes are 3x what you estimate. Build the shadow harness &lt;em&gt;first&lt;/em&gt;, then the new path. The shadow harness paid for itself five times over.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache invalidation was an afterthought
&lt;/h3&gt;

&lt;p&gt;The first version of the JWT cache was a &lt;code&gt;map&lt;/code&gt; with &lt;code&gt;time.AfterFunc&lt;/code&gt; evictors. We covered it in Chapter 8. It seemed fine. It fell over in production within a week.&lt;/p&gt;

&lt;p&gt;The lesson generalizes: &lt;strong&gt;a cache without a written-down invalidation channel is a memory leak.&lt;/strong&gt; Every cache should have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A bounded size (entries or bytes).&lt;/li&gt;
&lt;li&gt;A clear invalidation event ("token expired", "trie reloaded", "revocation event consumed").&lt;/li&gt;
&lt;li&gt;A staleness window we can articulate ("up to 30 seconds late").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can't write those three down, don't add the cache.&lt;/p&gt;
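
&lt;p&gt;For illustration, here is a cache that passes all three tests (the names are made up for this sketch): bounded entries, eviction driven by an explicit event rather than guesswork, and a staleness story you can state in one sentence ("a revocation applies as soon as its event is consumed"):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package cache

import (
    "sync"
    "time"
)

type entry struct {
    identityID string
    expiresAt  time.Time // never outlives the token itself
}

// jwtCache: bounded size, explicit invalidation via Evict, a staleness window we can state.
type jwtCache struct {
    mu         sync.Mutex
    maxEntries int
    entries    map[string]entry // keyed by token hash
}

func (c *jwtCache) Get(tokenHash string) (entry, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    e, ok := c.entries[tokenHash]
    if !ok || time.Now().After(e.expiresAt) {
        return entry{}, false
    }
    return e, true
}

func (c *jwtCache) Put(tokenHash string, e entry) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if len(c.entries) &amp;gt;= c.maxEntries {
        return // bound enforced: better to re-verify than to grow without limit
    }
    c.entries[tokenHash] = e
}

// Evict is the explicit invalidation channel, called by the revocation-stream consumer.
func (c *jwtCache) Evict(tokenHash string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    delete(c.entries, tokenHash)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;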

&lt;h3&gt;
  
  
  The default tenant we shipped on day one
&lt;/h3&gt;

&lt;p&gt;For the first quarter we had a default tenant. "If no &lt;code&gt;X-Tenant-ID&lt;/code&gt; and no host match, fall through to &lt;code&gt;default-tenant&lt;/code&gt;." It was added because it made local dev easier.&lt;/p&gt;

&lt;p&gt;It cost us in two ways. First, removing it took longer than building it — every misconfigured client started 400'ing once we removed the fallback. Second, while it was live, it produced exactly one near-miss data leak (a service-account request without a tenant header writing into the wrong tenant). We caught it before it left staging.&lt;/p&gt;

&lt;p&gt;Shipping that default tenant was the worst single decision in the whole project. We'd remove it from every future system before it ever boots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-pod alert spam, twice
&lt;/h3&gt;

&lt;p&gt;Twice we shipped alert code that fired per-request rather than per-state. The first time was during a Redis outage in our second month (lit up Slack with ~10k messages in 90 seconds). The second was during an RSA misconfig rollout (a few hundred messages per minute per pod, fleet-wide).&lt;/p&gt;

&lt;p&gt;Both were the same bug: alerting from a request handler instead of from a state-transition observer. Both were "fixed" with &lt;code&gt;atomic.Bool&lt;/code&gt; swaps. Now we apply that pattern aggressively.&lt;/p&gt;

&lt;p&gt;What we'd do differently: write the alert dedup helper &lt;em&gt;first&lt;/em&gt;, before the first alert. Have it baked into the codebase before there's anything to alert on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd build earlier
&lt;/h2&gt;

&lt;p&gt;In the order we'd add them:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The structured &lt;code&gt;AUTH_DECISION&lt;/code&gt; log
&lt;/h3&gt;

&lt;p&gt;On day one. Even before fancy auth logic. The log structure outlives every other choice — every dashboard, every alert, every postmortem reads from it. Build the contract first.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Slack alert dedup helper
&lt;/h3&gt;

&lt;p&gt;Before the first alert. Five lines of code to wrap an &lt;code&gt;atomic.Bool&lt;/code&gt; around a Slack send. Ship it before you have anything to alert on.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Fail-closed posture
&lt;/h3&gt;

&lt;p&gt;Before the gateway sees a single request in production. Don't even try permissive defaults. The "we'll tighten it later" path becomes "we shipped a permissive-by-default thing for 18 months." Just ship it tight.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Endpoint metadata in DB
&lt;/h3&gt;

&lt;p&gt;Skip the YAML-of-routes phase. Skip the in-code decorator phase. Go straight to the database table with refresh mechanism. The transitional architectures cost more to migrate off than they cost to skip.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Gap probe on revocation streams
&lt;/h3&gt;

&lt;p&gt;The probe (Chapter 7) catches data loss between the stream and consumers. It costs almost nothing to run. Without it you don't &lt;em&gt;know&lt;/em&gt; if you're losing events; you just hope.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Shadow harness for sensitive changes
&lt;/h3&gt;

&lt;p&gt;Comparing old-vs-new in production with the new path muted is a powerful pattern. Build the harness as a reusable thing. We re-implemented variants of it for three different rollouts before realizing it should be a library.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Tenancy contract document
&lt;/h3&gt;

&lt;p&gt;One page that owns: which layer resolves the tenant, which layer validates token-tenant binding, which layer scopes queries, and what the failure modes are. Required reading before anyone touches request handling. Should have existed before the gateway shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The maturity progression
&lt;/h2&gt;

&lt;p&gt;Looking back, the gateway evolved through identifiable stages. They're worth naming because if you're starting fresh, knowing the destination shape lets you skip steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg5kduswix3k07ujlris.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg5kduswix3k07ujlris.png" alt=" " width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v1 — per-service auth libs.&lt;/strong&gt; Where most teams are. Each service has its own JWT decode, its own permission check. Inconsistent, drift-prone, slow to fix CVEs. Don't stay here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v2 — auth_request + minimal &lt;code&gt;/auth&lt;/code&gt;.&lt;/strong&gt; A simple gateway that decodes a token and returns 200/401. Static list of "open" routes. Enough to centralize the decision; not enough to scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3 — trie + classification + Pub/Sub.&lt;/strong&gt; Endpoint metadata in a DB. Trie in memory. Pub/Sub-driven refresh kicks in. Now adding a route doesn't require a redeploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v4 — revocation + caching.&lt;/strong&gt; Logout works. Admin disable works. Each cache layer in place. Hot path is sub-millisecond.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v5 — bitmap + structured logs + degraded mode.&lt;/strong&gt; The mature gateway. Fast, observable, alertable, recoverable.&lt;/p&gt;

&lt;p&gt;Most of the value lives between v2 and v3. If you're at v1, that's the migration to plan for. v3 to v5 is iteration; v1 to v3 is the &lt;em&gt;project&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five pieces of advice for teams building this
&lt;/h2&gt;

&lt;p&gt;If you're starting from scratch with a similar problem, here's what I'd hand off in five bullets:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start with &lt;code&gt;auth_request&lt;/code&gt;. Don't shop architectures.
&lt;/h3&gt;

&lt;p&gt;Service mesh, custom Envoy filter, Lua plugin, sidecar — they all promise more flexibility. They all cost more in operations. &lt;code&gt;auth_request&lt;/code&gt; is enough for the 90% case, and the 10% is rarely worth the complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Make the gateway HA before anything else.
&lt;/h3&gt;

&lt;p&gt;Two replicas minimum, HPA, graceful shutdown, retries to upstream auth pod, circuit-breaker semantics in NGINX, fail-closed posture. If any one of these is missing the gateway &lt;em&gt;will&lt;/em&gt; take down your platform during a normal degraded event. This isn't optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The log is the API.
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;AUTH_DECISION&lt;/code&gt; log is a public contract with everyone who ever debugs your gateway. Treat it like a schema. Don't change field names without a migration. Don't add free-form strings to enum fields. Have one version-controlled doc that defines every field and every value of every enum.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cache invalidation has to be explicit.
&lt;/h3&gt;

&lt;p&gt;Every cache: bounded size, explicit invalidation channel, articulable staleness window. If a cache doesn't have all three, it's a bug-in-waiting. We learned this twice.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Build observability before you build features.
&lt;/h3&gt;

&lt;p&gt;Dashboards, alerts, trace context, the structured log — all of these come &lt;em&gt;before&lt;/em&gt; you ship the cool feature you're excited about. A clever new permission model that you can't observe is worse than a boring permission model you can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd build next
&lt;/h2&gt;

&lt;p&gt;A few things on our list that didn't fit this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX otel module compiled in.&lt;/strong&gt; Right now NGINX traces are limited; the Auth Service has full spans, but the NGINX hop is a black box from the trace's point of view. Worth fixing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant rate limiting.&lt;/strong&gt; Currently we rely on upstream services. The gateway is the natural place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAF integration.&lt;/strong&gt; We have an external WAF. Closer integration so WAF events show up in &lt;code&gt;AUTH_DECISION&lt;/code&gt; would help triage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token introspection cache.&lt;/strong&gt; Some integrations issue opaque tokens that we have to introspect with the issuer. Caching that lookup is its own caching problem; we haven't tackled it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A formal "tenancy contract" page.&lt;/strong&gt; Yes, the same one I told you to build on day one. We're catching up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is a future series, probably.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final architecture
&lt;/h2&gt;

&lt;p&gt;For posterity, the picture of where we ended up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcoi62l7pj0x3ypscgj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcoi62l7pj0x3ypscgj8.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything in this picture has been earned by an outage, a postmortem, or a near-miss. None of it is decoration. If you're building something similar and one of the boxes seems extra to you, it's because you haven't had the incident that justifies it yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Centralizing auth at the edge is one of those decisions that looks obviously correct in hindsight and is genuinely hard to convince a team to invest in beforehand. The wins are diffuse — slightly less drift, slightly fewer CVEs, slightly faster security responses. The pain is concentrated and visible — one new service to operate, one extra hop, one more place that has to be HA.&lt;/p&gt;

&lt;p&gt;But every six months we look back and the gateway has paid for itself again. A library upgrade we did once instead of thirty times. A revocation feature that shipped in a week instead of being negotiated across teams. A multi-tenant isolation guarantee we can actually defend in audits.&lt;/p&gt;

&lt;p&gt;If you take one thing from this series, take this: &lt;strong&gt;&lt;code&gt;auth&lt;/code&gt; is not a problem you solve once and ignore. It's a problem you solve &lt;em&gt;somewhere&lt;/em&gt;, well, and operate with care.&lt;/strong&gt; Pick that &lt;em&gt;somewhere&lt;/em&gt; to be the edge, build it small and observable, and the rest of your platform gets to focus on actual product work.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you build one of these — or are stuck somewhere mid-build — drop a comment. The hardest part of operating an Auth Gateway is realizing that other people have built the same thing and hit the same rocks. There's no reason for each team to find them independently.&lt;/p&gt;

</description>
      <category>lessonslearned</category>
      <category>architecture</category>
      <category>platformengineering</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Part 9 — Operating the gateway: logs, traces, health, and degraded mode</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:42:27 +0000</pubDate>
      <link>https://forem.com/akarshan/operating-the-gateway-logs-traces-health-and-degraded-mode-2209</link>
      <guid>https://forem.com/akarshan/operating-the-gateway-logs-traces-health-and-degraded-mode-2209</guid>
      <description>&lt;p&gt;The first eight chapters of this series have been about &lt;em&gt;building&lt;/em&gt; an Auth Gateway. This one is about &lt;em&gt;living&lt;/em&gt; with one.&lt;/p&gt;

&lt;p&gt;A gateway in front of every authenticated request is a force multiplier — for both your platform and your oncall pager. If something is broken, it's broken everywhere at once. So observability isn't a Chapter 9 thing. It's a Chapter 0 thing. We just describe it last because there's enough mechanism to talk about that you need the rest of the series first.&lt;/p&gt;

&lt;p&gt;This chapter covers the four things you need to be able to do at 3 AM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read a single log line and understand what happened.&lt;/li&gt;
&lt;li&gt;Trace a slow request from edge to upstream.&lt;/li&gt;
&lt;li&gt;Tell whether a pod is alive, ready, or in deep trouble.&lt;/li&gt;
&lt;li&gt;Get an alert &lt;em&gt;once&lt;/em&gt; — not once per pod per second — when something degrades.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The log line
&lt;/h2&gt;

&lt;p&gt;There are exactly two structured log lines per protected request: one from NGINX, one from the Auth Service. They share &lt;code&gt;request_id&lt;/code&gt;, so you can join them.&lt;/p&gt;

&lt;h3&gt;
  
  
  The NGINX line
&lt;/h3&gt;

&lt;p&gt;Every request is logged as JSON to stdout via NGINX's &lt;code&gt;log_format main&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;log_format&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt; &lt;span class="s"&gt;escape=json&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;"logType":"NGINX_LOGS",&lt;/span&gt;
  &lt;span class="s"&gt;"request_id":"&lt;/span&gt;&lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"time_local":"&lt;/span&gt;&lt;span class="nv"&gt;$time_iso8601&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"remote_addr":"&lt;/span&gt;&lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_method":"&lt;/span&gt;&lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_uri":"&lt;/span&gt;&lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_path":"&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"slug":"&lt;/span&gt;&lt;span class="nv"&gt;$location_path&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"product":"&lt;/span&gt;&lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"microservice":"&lt;/span&gt;&lt;span class="nv"&gt;$microservice&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"status":"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"status_class":"&lt;/span&gt;&lt;span class="nv"&gt;$status_class&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_time_ms":"&lt;/span&gt;&lt;span class="nv"&gt;$request_time_millis&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"service_request_time_ms":"&lt;/span&gt;&lt;span class="nv"&gt;$upstream_response_time_millis&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"service_connect_time":"&lt;/span&gt;&lt;span class="nv"&gt;$upstream_connect_time&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_request_time_ms":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_time_millis&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_connect_time":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_connect_time&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"body_bytes_sent":"&lt;/span&gt;&lt;span class="nv"&gt;$body_bytes_sent&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_referer":"&lt;/span&gt;&lt;span class="nv"&gt;$http_referer&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_user_agent":"&lt;/span&gt;&lt;span class="nv"&gt;$http_user_agent&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_x_forwarded_for":"&lt;/span&gt;&lt;span class="nv"&gt;$http_x_forwarded_for&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_host":"&lt;/span&gt;&lt;span class="nv"&gt;$http_host&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"tenant_id":"&lt;/span&gt;&lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"tenant_namespace":"&lt;/span&gt;&lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"identity_id":"&lt;/span&gt;&lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"identity_type":"&lt;/span&gt;&lt;span class="nv"&gt;$identity_type&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_error_message":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_error_code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few fields that matter more than they look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auth_request_time_ms&lt;/code&gt;&lt;/strong&gt; — how long the auth subrequest took. We graph this. We page on the p99 going above 50 ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;service_request_time_ms&lt;/code&gt;&lt;/strong&gt; — how long the upstream took, &lt;em&gt;excluding&lt;/em&gt; the auth subrequest. Sequential, not overlapping (Chapter 2).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;status_class&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;1xx&lt;/code&gt;/&lt;code&gt;2xx&lt;/code&gt;/&lt;code&gt;3xx&lt;/code&gt;/&lt;code&gt;4xx&lt;/code&gt;/&lt;code&gt;5xx&lt;/code&gt;. Faster than parsing &lt;code&gt;status&lt;/code&gt; for dashboard breakdowns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tenant_id&lt;/code&gt;&lt;/strong&gt; — the resolved tenant. Always grep by tenant first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auth_error_code&lt;/code&gt;&lt;/strong&gt; + &lt;strong&gt;&lt;code&gt;auth_error_message&lt;/code&gt;&lt;/strong&gt; — populated on deny. Empty on allow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Two extra log formats for the fail paths
&lt;/h3&gt;

&lt;p&gt;When NGINX hits &lt;code&gt;@auth_unavailable&lt;/code&gt; or &lt;code&gt;@upstream_unavailable&lt;/code&gt;, we log to a different format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;log_format&lt;/span&gt; &lt;span class="s"&gt;auth_unavailable&lt;/span&gt;     &lt;span class="s"&gt;'...auth-specific&lt;/span&gt; &lt;span class="s"&gt;fields...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;log_format&lt;/span&gt; &lt;span class="s"&gt;upstream_unavailable&lt;/span&gt; &lt;span class="s"&gt;'...upstream-specific&lt;/span&gt; &lt;span class="s"&gt;fields...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="n"&gt;/dev/stdout&lt;/span&gt; &lt;span class="s"&gt;auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason: when something is on fire, you want it isolated in its own log stream. Dashboards built off &lt;code&gt;main&lt;/code&gt; get drowned by 200s; a dashboard against &lt;code&gt;auth_unavailable&lt;/code&gt; shows you exactly the broken bucket without filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Auth Service line
&lt;/h3&gt;

&lt;p&gt;The Auth Service emits exactly one &lt;code&gt;AUTH_DECISION&lt;/code&gt; log per request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-05-01T12:34:56.789Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"logger"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"AUTH_DECISION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"4bf92f3577b34da6a3ce929d0e0e4736"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"span_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"00f067aa0ba902b7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"/api/v1/users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"mt_prod"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"endpoint_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ACCESS_CONTROLLED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"identity_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"USER"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"identity_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auth_method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"bearer_token"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ACCESS_LEVEL_MATCH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authn_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authz_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jwt_cache_hit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bitmap_authz_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"granted_access_levels"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"product:admin"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"token_revoked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the single most important artifact in the whole gateway. It is, in dry terms, our auth audit log. In practical terms it's the thing oncall greps when anything goes weird.&lt;/p&gt;

&lt;p&gt;Design rules we apply to it religiously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exactly one line per request.&lt;/strong&gt; No "starting auth", "authenticated", "authorizing", "decided" — those make the storyline split across N entries that you have to stitch back together. One line, one decision, every field.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every field, every time.&lt;/strong&gt; If a field doesn't apply (e.g., &lt;code&gt;bitmap_authz_used&lt;/code&gt; for an OPEN endpoint), it's &lt;code&gt;false&lt;/code&gt; or empty, not omitted. Optional fields make ad-hoc queries painful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;decision_reason&lt;/code&gt; is enum-only.&lt;/strong&gt; Free-form strings here would be the death of dashboards. New reasons require code review (Chapter 3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace IDs are present.&lt;/strong&gt; &lt;code&gt;trace_id&lt;/code&gt; and &lt;code&gt;span_id&lt;/code&gt; are pulled from the OpenTelemetry context, so the log line stitches to traces in our backend without join logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Joining NGINX and Auth Service
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;request_id&lt;/code&gt; is generated by NGINX (&lt;code&gt;$request_id&lt;/code&gt;) and forwarded to the Auth Service via the subrequest header &lt;code&gt;X-Request-ID&lt;/code&gt;. Both log lines carry it. A typical investigation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. user reports 401 at 12:34:56 UTC
2. grep tenant_id="mt_prod" identity_id="..." in NGINX logs around the time
3. capture request_id
4. grep that request_id in Auth Service logs
5. read decision_reason — full story
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole graph of "client → ingress → NGINX → Auth Service → upstream" stitches back together by &lt;code&gt;request_id&lt;/code&gt;.&lt;/p&gt;
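
&lt;p&gt;On the Auth Service side, carrying the id is one header read. A trivial sketch (the handlers use Gin, as in the probe examples later in this chapter):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package handlers

import "github.com/gin-gonic/gin"

// requestID returns the id NGINX generated ($request_id) and forwarded on the
// auth subrequest, so AUTH_DECISION carries the same value as the NGINX line.
func requestID(c *gin.Context) string {
    if id := c.GetHeader("X-Request-ID"); id != "" {
        return id
    }
    return "missing" // should not happen behind our NGINX config; kept greppable if it does
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;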

&lt;h2&gt;
  
  
  Tracing
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry instrumentation runs across both NGINX (via the otel module if compiled in; we're tracking that as a future improvement) and the Auth Service. The Auth Service span looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SPAN: pth-auth-service POST /auth
├── ATTRIBUTES
│   ├── http.route = "/auth"
│   ├── auth.tenant_id = "mt_prod"
│   ├── auth.endpoint_type = "ACCESS_CONTROLLED"
│   ├── auth.outcome = "allow"
│   └── auth.decision_reason = "ACCESS_LEVEL_MATCH"
├── EVENTS
│   ├── jwt.cache.hit
│   ├── route.cache.hit
│   └── bitmap.match
└── DURATION 2.5ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trace context propagates to upstream services via standard W3C trace headers. So a single trace shows: client → ingress → NGINX (eventually, via otel module) → Auth Service span → upstream service span(s) → DB calls inside the upstream. The whole story.&lt;/p&gt;

&lt;p&gt;We &lt;em&gt;don't&lt;/em&gt; turn on full sampling. 1% sampling at edges, 100% sampling for spans tagged &lt;code&gt;auth.outcome="deny"&lt;/code&gt;. The deny path is where the interesting investigations happen; sampling it fully gives us forensic detail without exploding storage.&lt;/p&gt;
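
&lt;p&gt;The deny-sampling policy keys off span attributes. The tagging side looks roughly like this (a sketch; the sampler or collector rule itself isn't shown):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package tracing

import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

// TagDecision attaches the auth outcome to the active span so sampling rules
// and the AUTH_DECISION log can key off the same attributes.
func TagDecision(ctx context.Context, outcome, reason string) {
    span := trace.SpanFromContext(ctx)
    span.SetAttributes(
        attribute.String("auth.outcome", outcome),
        attribute.String("auth.decision_reason", reason),
    )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;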

&lt;h2&gt;
  
  
  Health probes
&lt;/h2&gt;

&lt;p&gt;Three K8s probe endpoints, each with a different purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/livez&lt;/code&gt; — process is alive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Liveness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returns 200, always. The contract: as long as this handler runs, the process isn't deadlocked. K8s only kills the pod if the request times out (handler doesn't run at all).&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;/livez&lt;/code&gt; does &lt;em&gt;not&lt;/em&gt; check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It does not check the trie.&lt;/li&gt;
&lt;li&gt;It does not check Redis.&lt;/li&gt;
&lt;li&gt;It does not check Postgres.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is intentional. A pod whose Redis connection died can still serve cache-hot requests correctly. Killing it on a Redis outage is exactly the wrong thing to do — you turn a cache-hit-100%-but-Redis-down state into a cluster-wide rolling restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/readyz&lt;/code&gt; — pod can take traffic
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Readiness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;trieLoaded&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"trie not loaded"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;revocationExpected&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;revocationServiceReady&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"revocation cache not ready"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two gates:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Trie loaded.&lt;/strong&gt; Without it, we can't classify any endpoint. The pod is useless until the trie is in memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Revocation cache ready&lt;/strong&gt; (if revocation is enabled). Without it, fail-closed designs would deny everything; fail-open designs would miss every revocation. Either way, not ready.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notably &lt;em&gt;not&lt;/em&gt; gated on Redis health: a pod that loses Redis after readiness has gone green stays ready. Refreshes fail loudly via Slack, but live traffic isn't disrupted.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/healthz&lt;/code&gt; — deep health
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Health&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pgErr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pingPostgres&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;rdErr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pingRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pgErr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;rdErr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"degraded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"postgres"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;errString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pgErr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s"&gt;"redis"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;errString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rdErr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one &lt;em&gt;does&lt;/em&gt; depend on Redis and Postgres. It's not used by Kubernetes — it's used by external monitoring (Pingdom, status pages). The distinction matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;K8s probes determine traffic routing.&lt;/strong&gt; They should be tolerant — every false-positive failure pulls a pod out of service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring probes determine alerting.&lt;/strong&gt; They should be strict — they tell humans something is wrong, not the load balancer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common mistake is wiring &lt;code&gt;/healthz&lt;/code&gt; to &lt;code&gt;readinessProbe&lt;/code&gt;. Don't. You will pull pods out of the rotation on a transient Redis blip and convert a degraded state into an outage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stateDiagram-v2
    [*] --&amp;gt; Booting
    Booting --&amp;gt; NotReady: trie empty
    NotReady --&amp;gt; Ready: trie loaded AND&amp;lt;br/&amp;gt;(revocation disabled OR revocation cache warm)
    Ready --&amp;gt; Degraded: Redis stream XREAD failure
    Degraded --&amp;gt; Ready: reconnect
    Ready --&amp;gt; Live: /livez always 200
    Ready --&amp;gt; DeepCheck: /healthz pings PG + Redis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Degraded mode
&lt;/h2&gt;

&lt;p&gt;The gateway is built to tolerate three specific kinds of trouble:&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis is slow or down
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Revocation stream consumer fails. &lt;code&gt;streamDegraded&lt;/code&gt; atomic flips true. Slack alert fires &lt;em&gt;once&lt;/em&gt;. Hot path keeps working — local cache is in memory.&lt;/li&gt;
&lt;li&gt;The Pub/Sub subscriber reconnects in the background and re-subscribes to its channels; the periodic cleanup goroutine resyncs the ZSET when it next runs.&lt;/li&gt;
&lt;li&gt;SA version sync fails. Local SA version map is unchanged. Tokens continue to validate against the last known versions until the next successful sync. Slack alert fires.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Postgres is slow or down
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Trie reload fails. Existing trie remains in memory. Slack alert. Hot path is unaffected.&lt;/li&gt;
&lt;li&gt;New pods cannot start (initial trie load blocks readiness). Existing pods serve.&lt;/li&gt;
&lt;li&gt;This is a &lt;em&gt;partial&lt;/em&gt; outage: scaling up is broken, current capacity still works.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Slack is slow or down
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Alerts are fire-and-forget goroutines. We don't block the hot path on Slack.&lt;/li&gt;
&lt;li&gt;If Slack is down, alerts are queued in goroutines for a configured timeout (5s) and then dropped. We don't retry forever and OOM the pod. A sketch of this pattern follows the list.&lt;/li&gt;
&lt;/ul&gt;
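
&lt;p&gt;The shape of that fire-and-forget send, sketched below; the &lt;code&gt;send&lt;/code&gt; argument stands in for the actual Slack webhook call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package alerts

import (
    "context"
    "log"
    "time"
)

// notify never blocks the caller: the send runs in its own goroutine with a
// hard 5-second budget, and a failed send is logged and dropped, not retried.
func notify(send func(ctx context.Context, msg string) error, msg string) {
    go func() {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if err := send(ctx, msg); err != nil {
            log.Printf("slack alert dropped: %v", err)
        }
    }()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;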

&lt;h3&gt;
  
  
  The alert tree
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    Boot[startup] --&amp;gt; A1{revocation Redis OK?}
    A1 --&amp;gt;|no| Alert1[Slack: TokenRevocationService Redis client not initialized&amp;lt;br/&amp;gt;readyz=503]
    Run[runtime] --&amp;gt; A2{XREAD error?}
    A2 --&amp;gt;|yes| Alert2[Slack: stream degraded once]
    A2 --&amp;gt;|recover| Alert3[Slack: stream recovered]
    Run --&amp;gt; A3{localCache size &amp;gt; MAX?}
    A3 --&amp;gt;|yes| Alert4[Slack: cache overflow once&amp;lt;br/&amp;gt;fall back to ZSCORE]
    Run --&amp;gt; A4{RSA key missing for tenant?}
    A4 --&amp;gt;|yes| Alert5[Slack: per-tenant dedup]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Slack-alerter pattern
&lt;/h2&gt;

&lt;p&gt;Three rules we apply rigorously to alert code, because we learned them the hard way:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Atomic-bool dedup per state transition
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;degradeFlag&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bool&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;degradeFlag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MarkDegraded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Swap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;slack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"auth.degraded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;degradeFlag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MarkRecovered&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Swap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;slack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"auth.recovered"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Swap&lt;/code&gt; returns the &lt;em&gt;previous&lt;/em&gt; value. If it was already &lt;code&gt;true&lt;/code&gt;, we don't re-alert. We alert once on the transition, and once on the recovery.&lt;/p&gt;

&lt;p&gt;Without this, a Redis outage produces &lt;em&gt;thousands of Slack messages per pod per minute&lt;/em&gt;. The first time it happened, oncall threw their phone across the room.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 2: Per-(pod × cause) dedup, not per-cause
&lt;/h3&gt;

&lt;p&gt;A 100-pod deployment hitting the same RSA key misconfiguration alerts 100 times. That's correct: each pod is a separate runtime, each could have its own state, each is a separate alert source. We tag every alert with the pod's hostname so you can see in Slack whether it's a single-pod or fleet-wide problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 3: Deployment-tag prefix
&lt;/h3&gt;

&lt;p&gt;Every alert is prefixed with &lt;code&gt;[customer-env]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[acme-prod] auth.degraded TokenRevocationService Redis stream degraded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes a single Slack channel viable for many environments. Without the prefix, you can't tell at a glance whether the alert is from staging or prod.&lt;/p&gt;
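
&lt;p&gt;Rules 2 and 3 together are only a few lines. A sketch (&lt;code&gt;DEPLOYMENT_TAG&lt;/code&gt; is an assumed variable name, not necessarily ours):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package alerts

import (
    "fmt"
    "os"
)

// format builds the "[customer-env] kind detail (pod=hostname)" shape above, so one
// Slack channel can serve many environments and a fleet-wide problem is visually
// distinct from a single-pod one.
func format(kind, detail string) string {
    env := os.Getenv("DEPLOYMENT_TAG") // e.g. "acme-prod"
    host, _ := os.Hostname()
    return fmt.Sprintf("[%s] %s %s (pod=%s)", env, kind, detail, host)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;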

&lt;h3&gt;
  
  
  What we &lt;em&gt;don't&lt;/em&gt; alert on
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Individual auth failures. Login throttling, expired tokens, denied requests — those are normal and high-volume. They're in dashboards, not Slack.&lt;/li&gt;
&lt;li&gt;High latency on a single request. Latency alerts go on rolling p99, not on individual outliers.&lt;/li&gt;
&lt;li&gt;Anything below "the gateway behaves correctly but degraded." If the system self-heals quietly, we don't page humans.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  NGINX-specific operations
&lt;/h2&gt;

&lt;p&gt;A few NGINX-isms worth calling out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Graceful shutdown
&lt;/h3&gt;

&lt;p&gt;The chart's deployment does this on &lt;code&gt;preStop&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;preStop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/bin/sh&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;echo "[preStop] draining 15 seconds..."&lt;/span&gt;
          &lt;span class="s"&gt;sleep 15&lt;/span&gt;
          &lt;span class="s"&gt;echo "[preStop] nginx -s quit..."&lt;/span&gt;
          &lt;span class="s"&gt;nginx -s quit&lt;/span&gt;
          &lt;span class="s"&gt;while pgrep -x nginx &amp;gt; /dev/null; do sleep 1; done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;15 seconds of drain, then &lt;code&gt;nginx -s quit&lt;/code&gt; (graceful — drain in-flight requests, &lt;em&gt;then&lt;/em&gt; exit), then wait for all worker processes to finish. Combined with &lt;code&gt;terminationGracePeriodSeconds: 60&lt;/code&gt;, we have ~60 seconds total budget for clean shutdown. Without this, rolling deploys produced visible spikes of 502s in client logs.&lt;/p&gt;
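
&lt;p&gt;For orientation, a condensed sketch (not our full chart) of where the hook and the grace period sit relative to each other in the Deployment spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch: the preStop hook above lives alongside terminationGracePeriodSeconds.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # total budget for drain + quit + worker exit
      containers:
        - name: nginx
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15; nginx -s quit; while pgrep -x nginx &gt; /dev/null; do sleep 1; done"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;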

&lt;h3&gt;
  
  
  Worker tuning
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;worker_processes&lt;/span&gt;  &lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;worker_rlimit_nofile&lt;/span&gt; &lt;span class="mi"&gt;65535&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;worker_shutdown_timeout&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;events&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;worker_connections&lt;/span&gt; &lt;span class="mi"&gt;10240&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="s"&gt;epoll&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;multi_accept&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;accept_mutex&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;worker_processes auto&lt;/code&gt; scales to CPU count. &lt;code&gt;accept_mutex off&lt;/code&gt; is the modern default — let kernel &lt;code&gt;epoll&lt;/code&gt; handle accept distribution. &lt;code&gt;worker_connections 10240&lt;/code&gt; is per worker, so a 4-core pod handles ~40k concurrent connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Upstream keepalive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;upstream&lt;/span&gt; &lt;span class="s"&gt;auth_service&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="s"&gt;auth-service-golang.&amp;lt;ns&amp;gt;.svc.cluster.local:80&lt;/span&gt;
         &lt;span class="s"&gt;max_fails=3&lt;/span&gt; &lt;span class="s"&gt;fail_timeout=30s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;keepalive&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;keepalive_requests&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;keepalive_timeout&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;keepalive 64&lt;/code&gt; is &lt;em&gt;per worker&lt;/em&gt;: it caps the idle connections each worker keeps open to the upstream, so a 4-worker NGINX can hold up to 256 idle connections to the Auth Service. Without keepalive, every subrequest opens a fresh TCP connection — fatal at any real RPS. The first time we deployed without it, p99 auth time was 80 ms. With it: 3 ms.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;max_fails=3 fail_timeout=30s&lt;/code&gt; marks the upstream "down" after 3 failed attempts within a 30-second window and stops sending it traffic for the next 30 seconds. Combined with the retry config in &lt;code&gt;auth.conf&lt;/code&gt;, this gives smooth failover when one auth pod is sick.&lt;/p&gt;
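
&lt;p&gt;We haven't reproduced &lt;code&gt;auth.conf&lt;/code&gt; in this chapter, so treat the following as a sketch rather than the real file; the path and timeout values are placeholders. Two details matter: upstream keepalive only gets used if the subrequest speaks HTTP/1.1 with an empty &lt;code&gt;Connection&lt;/code&gt; header, and the retry behaviour referred to above would typically be &lt;code&gt;proxy_next_upstream&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;# Sketch of the auth subrequest location; illustrative values, not our exact file.
location = /auth {
  internal;
  proxy_pass http://auth_service/v1/authorize;

  # Without these two lines, NGINX speaks HTTP/1.0 to the upstream and the
  # keepalive pool above is never used.
  proxy_http_version 1.1;
  proxy_set_header Connection "";

  # Retry another auth pod when one is marked down, errors, or times out.
  proxy_next_upstream error timeout http_502 http_503;
  proxy_connect_timeout 100ms;
  proxy_read_timeout 500ms;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;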

&lt;h2&gt;
  
  
  A picture of where the time goes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gantt
    title Single request as seen by NGINX
    dateFormat  X
    axisFormat %s ms
    section NGINX
    receive + route match : 0, 1
    auth subrequest       : 1, 5
    proxy upstream        : 6, 18
    log JSON              : 24, 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The auth subrequest is the smallest thing on the timeline. That's not by accident. Chapter 8's caches are the reason it stays small. Chapter 9 is what tells you when it stops staying small.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 10 is the retrospective: what we'd build differently if we did it from scratch, what tools we wish we'd added on day one, and the maturity progression from "auth library in every service" to "production-grade gateway." It's the chapter you write &lt;em&gt;after&lt;/em&gt; operating the thing for two years, and it's the one I'd want to read if I were starting from scratch.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>nginx</category>
      <category>kubernetes</category>
      <category>sre</category>
    </item>
    <item>
      <title>Part 8 — Making It Fast: Caching, Hot Paths, and Avoiding DB Calls</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:42:14 +0000</pubDate>
      <link>https://forem.com/akarshan/making-it-fast-caching-hot-paths-and-avoiding-db-calls-4bbh</link>
      <guid>https://forem.com/akarshan/making-it-fast-caching-hot-paths-and-avoiding-db-calls-4bbh</guid>
      <description>&lt;p&gt;The Auth Gateway sits in front of every authenticated request in the platform. Its latency isn't just its own latency — it's the floor for every service behind it. If auth takes 50ms, every request to every upstream service starts 50ms in the hole.&lt;/p&gt;

&lt;p&gt;Our internal target is sub-millisecond on cache-hot paths. The way we hit it isn't clever algorithms — it's a stack of small caches, each one handling a different kind of state, each invalidated through a different channel. This post walks through all of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The principle that shapes everything
&lt;/h2&gt;

&lt;p&gt;Before the individual layers: a rule we hold as policy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Redis is allowed to &lt;em&gt;influence&lt;/em&gt; the hot path. Redis is not allowed to &lt;em&gt;block&lt;/em&gt; it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every cache in the system is in-process. Redis feeds them asynchronously — pushing revocation events, triggering trie reloads, syncing SA versions. But a pod whose Redis connection is dead can still answer requests correctly, for the duration of its staleness window.&lt;/p&gt;

&lt;p&gt;That's the difference between "Redis is down, the platform is down" and "Redis is down, the platform is slightly stale." One is a severity-1 incident. The other is a degraded mode we can tolerate for minutes while someone fixes it.&lt;/p&gt;

&lt;p&gt;With that framing, here's how a warm request flows through the cache stack:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpphw3brffith73ndmjb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpphw3brffith73ndmjb.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Six layers. Five are pure in-process memory. The sixth — the revocation and SA-version maps — is in-process too, but fed asynchronously from Redis. No layer blocks on a network call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: JWT verify cache
&lt;/h2&gt;

&lt;p&gt;The single biggest win in the stack. RSA signature verification is expensive — a few hundred microseconds per call — and at 50,000 RPS that cost is real.&lt;/p&gt;

&lt;p&gt;We wrap the entire decode-and-verify path in a Ristretto cache. The key is a 64-bit FNV hash of the raw token string; the value is the decoded JWT claims. On a cache hit, we skip RSA verification entirely.&lt;/p&gt;

&lt;p&gt;A few choices worth explaining:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Ristretto over a plain LRU.&lt;/strong&gt; Ristretto uses TinyLFU — it tracks access frequency and uses it to decide what to evict. Under burst traffic, a pure LRU can evict frequently-used tokens just because they weren't the &lt;em&gt;most recent&lt;/em&gt;. TinyLFU keeps the hot tokens and evicts the cold ones. The behavior under load is meaningfully better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why hash the token string.&lt;/strong&gt; Two reasons. Memory: a JWT is 500–2000 bytes; a uint64 is 8. And defense-in-depth: if the cache state ever ends up in a log or heap dump, the tokens themselves aren't exposed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why cap TTL at 30 seconds.&lt;/strong&gt; The cache stores the &lt;em&gt;decoded token&lt;/em&gt;, not the auth decision. Revocation is checked separately on every request. But capping TTL at 30 seconds keeps the staleness window honest — a token that's been revoked won't ride a warm cache entry for an hour.&lt;/p&gt;
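
&lt;p&gt;A stripped-down sketch of the pattern using &lt;code&gt;github.com/dgraph-io/ristretto&lt;/code&gt;. It's not the production code; the claims type and the sizing constants are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package authcache

import (
    "hash/fnv"
    "time"

    "github.com/dgraph-io/ristretto"
)

// Claims stands in for the decoded JWT claims.
type Claims struct {
    Subject   string
    ExpiresAt time.Time
}

type verifyCache struct{ c *ristretto.Cache }

func newVerifyCache() (*verifyCache, error) {
    c, err := ristretto.NewCache(&amp;ristretto.Config{
        NumCounters: 1_000_000, // TinyLFU frequency counters
        MaxCost:     100_000,   // max entries, with cost 1 per entry
        BufferItems: 64,
    })
    if err != nil {
        return nil, err
    }
    return &amp;verifyCache{c: c}, nil
}

// tokenKey hashes the raw token so the cache never holds the token itself.
func tokenKey(raw string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(raw))
    return h.Sum64()
}

func (v *verifyCache) lookup(raw string) (*Claims, bool) {
    if val, ok := v.c.Get(tokenKey(raw)); ok {
        return val.(*Claims), true
    }
    return nil, false
}

func (v *verifyCache) store(raw string, claims *Claims) {
    ttl := time.Until(claims.ExpiresAt)
    if ttl &gt; 30*time.Second {
        ttl = 30 * time.Second // cap staleness regardless of token lifetime
    }
    v.c.SetWithTTL(tokenKey(raw), claims, 1, ttl)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;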




&lt;h2&gt;
  
  
  Layer 2: RSA public key cache
&lt;/h2&gt;

&lt;p&gt;Per-tenant RSA public keys are loaded from environment config at boot. Parsing PEM is not free — a few hundred microseconds — and we don't want to pay it on every cache miss.&lt;/p&gt;

&lt;p&gt;We cache the parsed key per tenant using &lt;code&gt;sync.Once&lt;/code&gt;. The first request for a given tenant parses the key; every request after that gets the cached result, and if that first parse failed, the error is cached too.&lt;/p&gt;

&lt;p&gt;Two operational details that matter:&lt;/p&gt;

&lt;p&gt;A misconfiguration fires a Slack alert &lt;em&gt;once per tenant per pod&lt;/em&gt;, not once per request. Without this guard, a single bad key config generates a Slack message for every request that hits that tenant, which during a deploy is thousands of messages in seconds.&lt;/p&gt;

&lt;p&gt;Key rotation requires a pod restart. We considered hot-reloading. We chose deploy-to-rotate — the operational simplicity of a predictable restart beats the complexity of a file watcher and the failure modes it introduces.&lt;/p&gt;
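
&lt;p&gt;The shape of that per-tenant &lt;code&gt;sync.Once&lt;/code&gt;, as a sketch; the names and the alert hook are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package keycache

import (
    "crypto/rsa"
    "crypto/x509"
    "encoding/pem"
    "errors"
    "sync"
)

type keyEntry struct {
    once sync.Once
    key  *rsa.PublicKey
    err  error
}

type keyCache struct {
    mu      sync.Mutex
    entries map[string]*keyEntry
    pemFor  func(tenant string) []byte // tenant keys loaded from env config at boot
}

// publicKey parses the tenant's PEM exactly once per pod. A failed parse is
// cached too: we alert once, then keep returning the same error cheaply.
func (kc *keyCache) publicKey(tenant string) (*rsa.PublicKey, error) {
    kc.mu.Lock()
    e, ok := kc.entries[tenant]
    if !ok {
        e = &amp;keyEntry{}
        kc.entries[tenant] = e
    }
    kc.mu.Unlock()

    e.once.Do(func() {
        block, _ := pem.Decode(kc.pemFor(tenant))
        if block == nil {
            e.err = errors.New("no PEM block in tenant key config")
            alertOnce(tenant, e.err) // fires once per tenant per pod
            return
        }
        pub, err := x509.ParsePKIXPublicKey(block.Bytes)
        if err != nil {
            e.err = err
            alertOnce(tenant, err)
            return
        }
        rsaKey, isRSA := pub.(*rsa.PublicKey)
        if !isRSA {
            e.err = errors.New("tenant key is not RSA")
            alertOnce(tenant, e.err)
            return
        }
        e.key = rsaKey
    })
    return e.key, e.err
}

func alertOnce(tenant string, err error) { /* Slack alert, deduped per (tenant, pod) */ }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;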




&lt;h2&gt;
  
  
  Layer 3: route cache
&lt;/h2&gt;

&lt;p&gt;The trie lookup is already fast — O(depth), with depth typically 3–5 segments. But re-walking the same paths 50,000 times a second is wasteful. A TinyLFU cache sits in front of the trie, keyed by slug, HTTP method, and path.&lt;/p&gt;

&lt;p&gt;The platform has around 3,000 distinct route tuples in production. Sized at 10,000 entries, the cache fits the entire steady-state working set with room to spare. Misses are new endpoints, cold starts, and post-reload warm-up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invalidation is bulk.&lt;/strong&gt; On any trie reload — whether triggered by a periodic interval or a Redis Pub/Sub kick — we drop the entire route cache. We considered partial invalidation (only drop entries for changed slugs) and rejected it. Trie reloads are rare. The cache refills in milliseconds. The bookkeeping complexity of partial invalidation isn't worth the seconds of warm-up time it would save.&lt;/p&gt;
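
&lt;p&gt;Roughly what the cache-in-front-of-the-trie looks like, again sketched with Ristretto; &lt;code&gt;Endpoint&lt;/code&gt; and &lt;code&gt;Trie&lt;/code&gt; are placeholders for the real route types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package routing

import "github.com/dgraph-io/ristretto"

// Endpoint and Trie stand in for the real route-trie types described earlier
// in the series.
type Endpoint struct{ RequiredAccess []string }

type Trie interface {
    Walk(method, path string) *Endpoint
}

type router struct {
    cache *ristretto.Cache // ~10,000 entries, TinyLFU eviction
    trie  Trie
}

func routeKey(slug, method, path string) string {
    return slug + "|" + method + "|" + path
}

func (r *router) match(slug, method, path string) *Endpoint {
    k := routeKey(slug, method, path)
    if v, ok := r.cache.Get(k); ok {
        return v.(*Endpoint)
    }
    ep := r.trie.Walk(method, path) // O(depth) trie walk on a miss
    r.cache.Set(k, ep, 1)
    return ep
}

// reload installs a new trie (atomically swapped in the real code) and drops
// the whole route cache; it refills in milliseconds.
func (r *router) reload(newTrie Trie) {
    r.trie = newTrie
    r.cache.Clear()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;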




&lt;h2&gt;
  
  
  Layer 4: the trie
&lt;/h2&gt;

&lt;p&gt;The trie is a cache too, just an unusual one. It's an in-memory mirror of the endpoint table from Postgres. No request ever touches Postgres on the hot path.&lt;/p&gt;

&lt;p&gt;Invalidation has two channels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Periodic&lt;/strong&gt;: every hour by default. A safety net.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push&lt;/strong&gt;: via Redis Pub/Sub on &lt;code&gt;auth:trie:refresh&lt;/code&gt;. Admin tooling publishes this after any write to the endpoint table. Pods reload within milliseconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The push channel exists because endpoint changes are operationally significant. A new admin route that's meant to be protected shouldn't have a one-hour window where it's open because the trie hasn't refreshed. The push channel closes that window.&lt;/p&gt;
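
&lt;p&gt;The push listener is only a few lines with go-redis. A sketch, assuming a &lt;code&gt;reload&lt;/code&gt; function owned by the trie loader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package routing

import (
    "context"
    "log"

    "github.com/redis/go-redis/v9"
)

// watchTrieRefresh reloads the trie whenever admin tooling publishes to
// auth:trie:refresh. The hourly periodic reload still runs as the safety net.
func watchTrieRefresh(ctx context.Context, rdb *redis.Client, reload func(context.Context) error) {
    sub := rdb.Subscribe(ctx, "auth:trie:refresh")
    defer sub.Close()

    for msg := range sub.Channel() {
        log.Printf("trie refresh requested: %s", msg.Payload)
        if err := reload(ctx); err != nil {
            log.Printf("trie reload failed, keeping previous snapshot: %v", err)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;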




&lt;h2&gt;
  
  
  Layer 5: policy bitmap snapshot
&lt;/h2&gt;

&lt;p&gt;The permission bitmap (covered in the previous chapter) is loaded alongside the trie. It's an in-memory structure mapping permission names to bit indexes, with a version number.&lt;/p&gt;

&lt;p&gt;The snapshot is never partially updated. It's swapped atomically — a background process builds a new snapshot when the registry changes, then stores it via an atomic pointer swap. Readers grab the pointer at the start of a request and work with that exact snapshot throughout. No locks, no torn reads.&lt;/p&gt;

&lt;p&gt;This pattern shows up repeatedly in the codebase: when state changes as a whole unit, an atomic pointer is simpler and faster than a read-write mutex around a map. It's worth internalizing.&lt;/p&gt;
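
&lt;p&gt;On Go 1.19+ that's &lt;code&gt;atomic.Pointer&lt;/code&gt;. A minimal sketch of the pattern; the &lt;code&gt;Snapshot&lt;/code&gt; fields are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package policy

import "sync/atomic"

// Snapshot is an immutable view of the permission registry at one version.
type Snapshot struct {
    Version  int
    BitIndex map[string]int // access level name -&gt; bit position
}

var current atomic.Pointer[Snapshot]

// Install publishes a fully-built snapshot in one atomic step. Writers never
// mutate a published snapshot; they build a new one and swap it in.
func Install(s *Snapshot) { current.Store(s) }

// Current returns the snapshot a request should use for its whole lifetime.
// A later swap doesn't affect callers already holding this pointer.
func Current() *Snapshot { return current.Load() }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;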




&lt;h2&gt;
  
  
  Layer 6: revocation map and SA version map
&lt;/h2&gt;

&lt;p&gt;These were covered in depth in the previous chapter. In the context of the cache stack:&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;revocation map&lt;/strong&gt; is bounded at 50,000 JTIs, fed by a Redis Stream, and fails open — if a JTI isn't in the map, we treat it as not revoked. The staleness window is low single-digit seconds in steady state.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;SA version map&lt;/strong&gt; has the opposite posture: fail closed. If the map isn't ready, the pod doesn't pass readiness. If a service account token's version is behind the current version in the map, it's denied.&lt;/p&gt;

&lt;p&gt;Same underlying shape — in-memory map fed asynchronously from Redis — but different risk tolerance based on what's being protected.&lt;/p&gt;




&lt;h2&gt;
  
  
  How all the invalidation channels fit together
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfd02hlh7ch4yhpkeegs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfd02hlh7ch4yhpkeegs.png" alt=" " width="800" height="879"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three patterns across the stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TTL-based&lt;/strong&gt; (JWT verify cache). Simple, no coordination. Best when the cached value has a natural expiry built into it — which JWTs do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push-based&lt;/strong&gt; (trie, revocation stream, SA version). Required when a staleness window has real cost. Needs a degraded-mode plan for when the push channel is unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity-based eviction&lt;/strong&gt; (route cache, JWT cache). Bounded memory by design. What gets evicted matters more than when — which is why TinyLFU beats LRU for this workload.&lt;/p&gt;

&lt;p&gt;When in doubt, start with TTL. Push-based caches are powerful but bring failure modes — lost events, stalled consumers, cursor races. Use them only when a TTL window is genuinely unacceptable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cache we got wrong
&lt;/h2&gt;

&lt;p&gt;Our first JWT cache used a plain Go map with a mutex and a &lt;code&gt;time.AfterFunc&lt;/code&gt; per entry to handle expiry.&lt;/p&gt;

&lt;p&gt;It worked in tests. It fell over in production within a week. Two problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timer pressure.&lt;/strong&gt; Every cached token registered a &lt;code&gt;time.AfterFunc&lt;/code&gt; timer, each of which fires its callback in its own goroutine. At a million live tokens the runtime coped, but GC pauses got ugly and unpredictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No cap.&lt;/strong&gt; There was no size limit. Memory grew until pods OOM-killed.&lt;/p&gt;

&lt;p&gt;Switching to Ristretto solved both: timers are amortized into a small internal worker, and &lt;code&gt;MaxCost&lt;/code&gt; enforces a hard ceiling.&lt;/p&gt;

&lt;p&gt;The lesson: a cache is a copy of state. If there's no mechanism to bound or invalidate it — TTL, push, or capacity — it's not a cache. It's a memory leak.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cold start vs. warm
&lt;/h2&gt;

&lt;p&gt;A pod's first requests are slower. The trie loads from Postgres before readiness flips — that's the only DB call in the pod's lifetime. After that, every lookup is in-memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn26nomvqff80sqslasvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn26nomvqff80sqslasvd.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The JWT cache starts empty on a fresh deploy and fills up within seconds as real tokens come through. We don't pre-warm it — the cost of cold RSA verifications for a few seconds after a deploy is acceptable.&lt;/p&gt;

&lt;p&gt;The revocation cache we &lt;em&gt;do&lt;/em&gt; pre-warm, synchronously, before readiness. A pod that's marked ready must have the current revocation set. Otherwise it would fail-open on every request until its first Redis sync — meaning any logouts from the past hour would be invisible to it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to actually graph
&lt;/h2&gt;

&lt;p&gt;For each cache, the metrics that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hit rate&lt;/strong&gt; — the most important number. A cache with a stable size but falling hit rate is broken.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eviction rate&lt;/strong&gt; — meaningful only if the cache is bounded. High eviction with high hit rate is fine; it means the cache is doing its job under pressure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Size&lt;/strong&gt; — useful for capacity planning, not for alerting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The JWT verify cache runs at 95%+ hit rate in steady state. A fresh deploy drops it to zero and it climbs back within seconds. Anything else warrants investigation.&lt;/p&gt;

&lt;p&gt;Don't alert on cache size. Alert on hit rate.&lt;/p&gt;
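
&lt;p&gt;If a cache is built on Ristretto with &lt;code&gt;Metrics&lt;/code&gt; enabled, the hit ratio comes for free. A sketch of exposing it as a Prometheus gauge; the metric name is ours, and &lt;code&gt;Ratio()&lt;/code&gt; is a lifetime ratio, so for alerting you'd still want a windowed view on top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package metrics

import (
    "github.com/dgraph-io/ristretto"
    "github.com/prometheus/client_golang/prometheus"
)

// registerHitRatio exposes a cache's lifetime hit ratio as a gauge.
// The cache must be built with ristretto.Config{Metrics: true}.
func registerHitRatio(name string, c *ristretto.Cache) {
    prometheus.MustRegister(prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name:        "auth_cache_hit_ratio",
            Help:        "Lifetime hit ratio of an in-process auth cache.",
            ConstLabels: prometheus.Labels{"cache": name},
        },
        func() float64 { return c.Metrics.Ratio() },
    ))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;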




&lt;p&gt;&lt;em&gt;Next up: Chapter 9 — operating the gateway. The structured auth decision log, OpenTelemetry tracing, the three Kubernetes probes, degraded-mode behavior, and the Slack alert pattern that keeps on-call sane during a Redis outage.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>caching</category>
      <category>go</category>
      <category>redis</category>
    </item>
    <item>
      <title>Part 7 — Token Revocation Without Killing Performance</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:57 +0000</pubDate>
      <link>https://forem.com/akarshan/token-revocation-without-killing-performance-389d</link>
      <guid>https://forem.com/akarshan/token-revocation-without-killing-performance-389d</guid>
      <description>&lt;p&gt;JWTs have a hard problem hiding inside them: they're stateless. The whole point of a JWT is that the verifier can check a signature and make a decision — no database, no round-trip. That's what makes them fast. It's also what makes "log this user out &lt;em&gt;right now&lt;/em&gt;" not work out of the box.&lt;/p&gt;

&lt;p&gt;We had to solve this. Users log out. Admins disable accounts. Service accounts rotate. Each one of those events has to invalidate live tokens &lt;em&gt;immediately&lt;/em&gt;, not at the next expiry tick.&lt;/p&gt;

&lt;p&gt;This post is about how we did it without giving up the performance properties that made JWTs worth using in the first place.&lt;/p&gt;




&lt;h2&gt;
  
  
  The constraints that ruled out the obvious answers
&lt;/h2&gt;

&lt;p&gt;Three numbers shape the design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50,000 RPS&lt;/strong&gt; of authenticated requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-millisecond&lt;/strong&gt; auth budget on the hot path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-digit-second&lt;/strong&gt; propagation — when a user logs out, every pod must know within a few seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The obvious approaches each break one of these:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query Redis on every request.&lt;/strong&gt; Adds a network round-trip to every auth decision. Median latency explodes. Redis also becomes a hard single point of failure — if it's slow or down, every request fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push revocation events via websockets or long-poll to every pod.&lt;/strong&gt; Works at low scale. Gets fragile when pods churn, restart, or drop events during a network blip.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short-lived tokens with fast refresh.&lt;/strong&gt; A 5-minute expiry reduces the window, but doesn't close it — and 5 minutes is too long when an account is disabled for a security reason.&lt;/p&gt;

&lt;p&gt;What worked: a two-layer design.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; is the &lt;em&gt;propagation&lt;/em&gt; layer. It holds the authoritative revocation state and a live event feed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local memory&lt;/strong&gt; is the &lt;em&gt;decision&lt;/em&gt; layer. Each pod keeps an in-memory map of revoked JTIs. The hot-path check is a single map lookup — no I/O.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcomxz3ztl0aaevfjh6e2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcomxz3ztl0aaevfjh6e2.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Redis structures, one job each
&lt;/h2&gt;

&lt;p&gt;Two Redis keys do the heavy lifting, and they serve different purposes — which is why both are necessary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;revoked_access_tokens&lt;/code&gt; is a sorted set. Each member is a JTI; the score is the token's expiry timestamp. This is the &lt;strong&gt;source of truth at any point in time&lt;/strong&gt; — you can ask it "give me everything currently revoked" with a single range query.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;revoked_access_token_events&lt;/code&gt; is a stream. Each entry carries the JTI, expiry, and metadata about the revocation. This is the &lt;strong&gt;live feed&lt;/strong&gt; — pods subscribe to it and learn about new revocations as they happen.&lt;/p&gt;

&lt;p&gt;The ZSET answers "what is the state right now?" The Stream answers "what has changed since I last checked?" You need both because they're good at different things: the ZSET is for bulk reads at startup, the Stream is for incremental updates during steady state.&lt;/p&gt;
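
&lt;p&gt;For concreteness, roughly what the issuer-side publish looks like with go-redis; the stream field names and the trim length are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: record a revocation in both structures in one round-trip.
// The ZSET carries point-in-time state; the Stream carries the live feed.
func publishRevocation(ctx context.Context, rdb *redis.Client, jti string, expiresAt int64) error {
    pipe := rdb.TxPipeline()
    pipe.ZAdd(ctx, "revoked_access_tokens", redis.Z{
        Score:  float64(expiresAt), // score = token expiry, so expired members can be pruned
        Member: jti,
    })
    pipe.XAdd(ctx, &amp;redis.XAddArgs{
        Stream: "revoked_access_token_events",
        MaxLen: 100_000, // keep the stream bounded
        Approx: true,
        Values: map[string]interface{}{"jti": jti, "exp": expiresAt},
    })
    _, err := pipe.Exec(ctx)
    return err
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;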




&lt;h2&gt;
  
  
  The startup problem — and two races hiding in the obvious solution
&lt;/h2&gt;

&lt;p&gt;When a pod boots, it needs to populate its local map before it serves traffic. The tempting approach: read the ZSET to get current revocations, then subscribe to the Stream for updates.&lt;/p&gt;

&lt;p&gt;Two races hide here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Race 1:&lt;/strong&gt; What if a revocation arrives between the ZSET read and the Stream subscription? The event is in the Stream, but the pod's cursor is positioned &lt;em&gt;after&lt;/em&gt; it. The JTI never makes it into the local cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Race 2:&lt;/strong&gt; What if you start the Stream consumer from the very beginning (&lt;code&gt;0-0&lt;/code&gt;) to avoid missing anything? Now you replay every event ever emitted — potentially thousands. Worse: if the stream has been trimmed, you'll silently miss events older than the trim window.&lt;/p&gt;

&lt;p&gt;The fix is to reverse the order: capture the Stream tip &lt;em&gt;before&lt;/em&gt; reading the ZSET, then start the consumer from that captured tip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TokenRevocationService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;WarmCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// 1. Capture the stream tip first.&lt;/span&gt;
    &lt;span class="n"&gt;tipID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;captureStreamTip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// 2. Read the current ZSET snapshot.&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;members&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZRangeByScoreWithScores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"revoked_access_tokens"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZRangeBy&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Min&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;strconv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FormatInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"+inf"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localCache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Member&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// 3. Start the consumer at the tip captured before the ZSET read.&lt;/span&gt;
    &lt;span class="c"&gt;// Anything that arrived between tipID and now replays through the consumer.&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consumeStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tipID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The order matters precisely. By capturing the tip first, anything that arrives while we're reading the ZSET will replay through the consumer. Anything already in the ZSET when we read it is loaded directly. If the same JTI appears in both — a revocation that landed right on the boundary — setting the same map entry twice is harmless.&lt;/p&gt;
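
&lt;p&gt;&lt;code&gt;captureStreamTip&lt;/code&gt; isn't shown above. One way to implement it, as a sketch rather than the production code, is to read the newest entry's ID and fall back to &lt;code&gt;0-0&lt;/code&gt; for an empty stream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// captureStreamTip returns the ID of the newest event currently in the stream,
// or "0-0" if the stream is empty, so the consumer starts from the beginning
// and replays anything that lands after this call.
func (s *TokenRevocationService) captureStreamTip(ctx context.Context) (string, error) {
    msgs, err := s.redis.XRevRangeN(ctx, "revoked_access_token_events", "+", "-", 1).Result()
    if err != nil {
        return "", err
    }
    if len(msgs) == 0 {
        return "0-0", nil
    }
    return msgs[0].ID, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;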

&lt;p&gt;The pod's lifecycle from boot to steady state:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1nlfcs66n8h4hrbsqmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1nlfcs66n8h4hrbsqmh.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The hot path: deliberately boring
&lt;/h2&gt;

&lt;p&gt;The actual check on every request is about as simple as it gets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TokenRevocationService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;IsJTIRevoked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jti&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;expiresAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localCache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jti&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;expiresAt&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A read lock, a map lookup, a comparison. No Redis, no network. Hundreds of nanoseconds.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;!found → false&lt;/code&gt; branch is a deliberate &lt;em&gt;fail-open&lt;/em&gt; choice: if a JTI isn't in the local cache, we treat it as not revoked. The risk is that a freshly revoked token might be accepted for the few seconds between the revocation being published and the local cache being updated. We accept that window. The alternative — failing closed — would mean denying every request whose JTI we haven't explicitly loaded, which at startup means denying all traffic until the cache is fully warm. That's worse.&lt;/p&gt;




&lt;h2&gt;
  
  
  The gap probe: catching what the Stream misses
&lt;/h2&gt;

&lt;p&gt;The Stream consumer keeps a cursor — the ID of the last event it processed. Periodically, the stream gets trimmed to bound its size. If the consumer's cursor falls behind the trim window (because of a slow handler, a GC pause, or a network blip), the next &lt;code&gt;XREAD&lt;/code&gt; will silently skip the trimmed events.&lt;/p&gt;

&lt;p&gt;We detect this with a gap probe that runs every 5 minutes:&lt;/p&gt;

&lt;p&gt;If the oldest event currently in the Stream is &lt;em&gt;newer&lt;/em&gt; than the consumer's cursor, we missed something. When that happens, we resync from the ZSET (which is the authoritative source of truth and doesn't get trimmed the same way) and snap the cursor to the stream tip.&lt;/p&gt;
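
&lt;p&gt;A sketch of that probe. The &lt;code&gt;cursor()&lt;/code&gt; and &lt;code&gt;resyncFromZSET()&lt;/code&gt; helpers are assumed here, and the ID comparison is simplified to the millisecond part of the stream ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// checkGap runs on a 5-minute ticker. If the oldest event still in the stream
// is newer than our cursor, the stream was trimmed past us: resync from the ZSET.
func (s *TokenRevocationService) checkGap(ctx context.Context) error {
    oldest, err := s.redis.XRangeN(ctx, "revoked_access_token_events", "-", "+", 1).Result()
    if err != nil || len(oldest) == 0 {
        return err // an empty stream means there is nothing we could have missed
    }
    if streamMillis(oldest[0].ID) &gt; streamMillis(s.cursor()) {
        return s.resyncFromZSET(ctx) // bulk reload, then snap the cursor to the tip
    }
    return nil
}

// streamMillis extracts the millisecond timestamp from a stream ID like
// "1714000000000-3".
func streamMillis(id string) int64 {
    ms, _ := strconv.ParseInt(strings.SplitN(id, "-", 2)[0], 10, 64)
    return ms
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;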

&lt;p&gt;This probe has fired exactly twice in production since we added it — both times during planned Redis maintenance — and both times the recovery was automatic. The value isn't that it fires often. It's that without it, you'd never know you missed events at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Service accounts: the same idea, different risk tolerance
&lt;/h2&gt;

&lt;p&gt;User token revocation fails open — a freshly revoked token might slip through for a few seconds. That's acceptable: the window is small, bounded, and observable.&lt;/p&gt;

&lt;p&gt;Service-account rotation fails &lt;em&gt;closed&lt;/em&gt;. When a service account is rotated, the old credentials must be denied immediately, even if that means a slightly degraded startup path.&lt;/p&gt;

&lt;p&gt;The mechanism is different too: instead of JTI revocation, service accounts carry a version number. The gateway keeps a local map of current SA versions loaded from Redis. If the token's version is less than the current version for that service account, it's denied.&lt;/p&gt;
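
&lt;p&gt;The check itself is a version comparison. A sketch, where denying a service account that isn't in the map at all is our assumption, consistent with the fail-closed posture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// saVersionMap maps service-account ID -&gt; current credential version. It is
// synced from Redis on an interval and must be loaded before readiness.
type saVersionMap struct {
    mu       sync.RWMutex
    versions map[string]int64
}

// Allowed is false for stale versions and, by assumption (fail closed), for
// service accounts the map doesn't know about.
func (m *saVersionMap) Allowed(saID string, tokenVersion int64) bool {
    m.mu.RLock()
    current, known := m.versions[saID]
    m.mu.RUnlock()
    if !known {
        return false
    }
    return tokenVersion &gt;= current
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;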

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6v73iq3eigxkbggpy4q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6v73iq3eigxkbggpy4q.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pod won't pass readiness until this cache is loaded. If Redis is unavailable at startup, the pod doesn't serve traffic. That's intentional — we'd rather have fewer pods than pods that can't correctly enforce SA rotation.&lt;/p&gt;

&lt;p&gt;The SA version map syncs from Redis on a 60-second interval, and that window is our exposure. We reduce the effective risk by having the rotating system hold the old version live for a grace period, only promoting the new version once enough gateways have synced.&lt;/p&gt;




&lt;h2&gt;
  
  
  What revocation doesn't do
&lt;/h2&gt;

&lt;p&gt;A few things that seem natural but aren't in scope:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revoke by user ID.&lt;/strong&gt; The cache is JTI-indexed. To revoke all of a user's tokens, the issuer enumerates their live JTIs and revokes each one. The Auth Service sees only individual JTIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-region propagation.&lt;/strong&gt; We run regional auth services with regional Redis instances. Revocations published in one region don't automatically appear in another. Most revocations are tenant-bound, and tenants are region-bound, so this rarely matters in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared Redis.&lt;/strong&gt; This Redis instance is auth-only. The corner cases in revocation are complex enough that sharing infrastructure with rate limiters or session stores would make debugging much harder.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we'd do differently on day one
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Add the gap probe immediately.&lt;/strong&gt; It's a small amount of code and it's the difference between "we silently lose a logout event occasionally" and "we always know when propagation breaks."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test the warm path with a slow or unavailable Redis.&lt;/strong&gt; Most bugs we found were in error handling during startup, not steady-state operation. The warm path runs once per pod lifetime; staging rarely exercises it unless you deliberately inject failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bound everything from the start.&lt;/strong&gt; The local cache, the stream length, the sync interval. Unbounded growth in any of them becomes an incident.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: Chapter 8 — every cache in the hot path, together. JWT verify cache, RSA key cache, route cache, policy bitmap, revocation map, SA version map. Each one is fast individually; together they're how the gateway fits inside its latency budget. We'll cover TTL strategy, invalidation, and the one cache where we got eviction wrong.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>redis</category>
      <category>jwt</category>
      <category>performance</category>
      <category>go</category>
    </item>
    <item>
      <title>Part 6 — Authorization at Scale: Access Levels, Roles, and Compact Decisions</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:42 +0000</pubDate>
      <link>https://forem.com/akarshan/authorization-at-scale-access-levels-roles-and-compact-decisions-57ga</link>
      <guid>https://forem.com/akarshan/authorization-at-scale-access-levels-roles-and-compact-decisions-57ga</guid>
      <description>&lt;p&gt;Authentication answers "who are you?" Authorization answers the harder question: "are you allowed to do this?"&lt;/p&gt;

&lt;p&gt;By the time a request reaches this stage, we've already validated the token and confirmed the tenant. Now we need to decide — before the request touches any upstream service — whether this specific identity has permission to call this specific endpoint. That decision runs hundreds of millions of times a day. It needs to be fast, correct, and cheap to reason about when something goes wrong.&lt;/p&gt;

&lt;p&gt;This post is about the model we use, the simpler approach that served us for a year, and the optimization we eventually built — and why we kept the old path around anyway.&lt;/p&gt;




&lt;h2&gt;
  
  
  The model: three layers, one question
&lt;/h2&gt;

&lt;p&gt;Our authorization model has three layers:&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;role&lt;/strong&gt; is what a user is granted — something like &lt;code&gt;clinic_admin&lt;/code&gt; or &lt;code&gt;billing_specialist&lt;/code&gt;. An &lt;strong&gt;access level&lt;/strong&gt; is a coarse permission — something like &lt;code&gt;user:admin&lt;/code&gt; or &lt;code&gt;schedule:write&lt;/code&gt;. An &lt;strong&gt;endpoint&lt;/strong&gt; declares which access levels are sufficient to reach it.&lt;/p&gt;

&lt;p&gt;Roles are bags of access levels. Endpoints are protected by lists of access levels. A user can call an endpoint if any of their roles' access levels appear in the endpoint's required list. That's the whole model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hm0cla1frifq7ocjye0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hm0cla1frifq7ocjye0.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We deliberately keep it coarse. There are dozens of access levels in the system, not thousands. Questions like "can this user delete &lt;em&gt;this specific patient record&lt;/em&gt;?" belong to the upstream service that owns that data — it has the context the gateway doesn't. The gateway's job is the coarse filter: "is this user even an admin at all?" — a check that catches the vast majority of misuse and runs at edge speed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who defines access levels and endpoints?
&lt;/h2&gt;

&lt;p&gt;Here's something that might seem surprising: the gateway doesn't own the access level definitions. &lt;strong&gt;Individual product services do.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each service ships an &lt;code&gt;access_levels.json&lt;/code&gt; alongside its code. This file declares what access levels it recognizes and which endpoints require which levels. A scheduling service owns &lt;code&gt;schedule:write&lt;/code&gt;. A billing service owns &lt;code&gt;billing:read&lt;/code&gt;. The gateway is a consumer — it doesn't make editorial decisions about what permissions mean.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;access_levels.json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;owned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;maintained&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;upstream&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;service&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"access_levels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule:write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Create and modify appointments"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"View appointments"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"endpoints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/appointments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"requires"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"schedule:write"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/appointments/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"requires"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"schedule:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule:write"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The publish flow runs through CI/CD. When a service merges a change to its access level definitions, a pipeline step pushes the updated file to a well-known S3 path. The gateway picks up the change on its next refresh cycle — no gateway deploy required, no manual registry edits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;service-repo/
  access_levels.json   ← owned by the service team
  .github/workflows/publish.yml

# publish.yml (simplified)
- name: Publish access levels
  run: |
    aws s3 cp access_levels.json \
      s3://registry/services/${{ env.SERVICE_NAME }}/access_levels.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps ownership aligned: the team that builds the feature decides what permission protects it. The gateway team owns the &lt;em&gt;mechanism&lt;/em&gt;; product teams own the &lt;em&gt;policy&lt;/em&gt;. Changes are auditable through git history, reviewable via pull request, and can be rolled back the same way code is.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the token carries — and what it doesn't
&lt;/h2&gt;

&lt;p&gt;The JWT includes the user's granted access levels as a bitmap — a compact byte slice — along with the version of the registry used when the token was issued. It does &lt;em&gt;not&lt;/em&gt; contain the full permission graph, and it does not contain endpoint requirements. Those live in the database, loaded into memory at boot.&lt;/p&gt;

&lt;p&gt;A decoded JWT payload looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-health"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"__________________8H"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;114&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1714000000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;policy_bitmap&lt;/code&gt; is a base64url-encoded byte slice — each bit position corresponds to one access level in the registry at version &lt;code&gt;114&lt;/code&gt;. &lt;code&gt;policy_bitmap_version&lt;/code&gt; tells the gateway exactly which registry snapshot to use when interpreting the bits. If the gateway's current registry is at version 114, it uses the fast bitmap path. If the versions differ, it falls back to string matching (more on that below).&lt;/p&gt;
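
&lt;p&gt;For illustration, roughly how the gateway can decode that claim and pick a path. The function and field names are ours, and unpadded base64url is an assumption about the encoding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package policy

import "encoding/base64"

// chooseAuthzPath decodes the policy_bitmap claim and decides which check runs.
// Unpadded base64url (RawURLEncoding) is an assumption about the encoding.
func chooseAuthzPath(bitmapClaim string, tokenVersion, registryVersion int) ([]byte, bool, error) {
    bits, err := base64.RawURLEncoding.DecodeString(bitmapClaim)
    if err != nil {
        return nil, false, err
    }
    // Use the fast bitmap path only when the token was minted against the same
    // registry snapshot this pod holds; otherwise fall back to string matching.
    return bits, tokenVersion == registryVersion, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;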

&lt;p&gt;This is a deliberate tradeoff. The stateless alternative — put everything in the token, make every decision without a database — sounds clean until users accumulate permissions. Tokens balloon to 4–8 KB. Cookies start failing at network edges. Mobile clients cache tokens aggressively and get stuck with stale permission sets. Every role change requires re-issuing every affected token immediately.&lt;/p&gt;

&lt;p&gt;The compromise: the JWT carries &lt;em&gt;coarse access levels&lt;/em&gt; (a small, stable set encoded as a bitmap), and the database carries &lt;em&gt;endpoint requirements&lt;/em&gt; (queried once at startup, refreshed on demand). Per-request authorization is a fast in-memory lookup on both sides.&lt;/p&gt;

&lt;p&gt;The payoff on token size is significant. Before the bitmap rework, heavily-permissioned admins had tokens approaching 3 KB. After:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fytx9d69ueqgcg5jl3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fytx9d69ueqgcg5jl3d.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The original approach: string matching
&lt;/h2&gt;

&lt;p&gt;The first implementation is what you'd sketch on a whiteboard. Take the user's access levels, take the endpoint's required access levels, check if they overlap.&lt;/p&gt;

&lt;p&gt;It's &lt;code&gt;O(n + m)&lt;/code&gt; — linear in the number of user permissions and required permissions. With typical values (a user might have 20–80 access levels, an endpoint usually requires 1–3), this runs in nanoseconds. It's correct, it's readable, and it worked fine in production for over a year.&lt;/p&gt;
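
&lt;p&gt;The original check, roughly, as a sketch; the real function had logging and metrics around it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// hasAnyAccess reports whether any of the user's access levels appears in the
// endpoint's required list. O(n + m) with a set over the user's levels.
func hasAnyAccess(userLevels, required []string) bool {
    have := make(map[string]struct{}, len(userLevels))
    for _, lvl := range userLevels {
        have[lvl] = struct{}{}
    }
    for _, req := range required {
        if _, ok := have[req]; ok {
            return true
        }
    }
    return false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;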

&lt;p&gt;The reason we eventually replaced it had nothing to do with speed.&lt;/p&gt;

&lt;p&gt;The first reason was &lt;strong&gt;token size&lt;/strong&gt;. As the platform grew and senior users accumulated more access levels, tokens stretched. We had admins with tokens approaching 3 KB. That's uncomfortable but manageable — until it isn't.&lt;/p&gt;

&lt;p&gt;The second reason was &lt;strong&gt;density of signal&lt;/strong&gt;. String matching tells you &lt;em&gt;that&lt;/em&gt; the user was authorized, but the log entry just says &lt;code&gt;granted: ["user:admin"]&lt;/code&gt;. We wanted richer per-permission metrics — which access levels are actually being exercised, which ones are granted but never hit anything — without adding another pass over the data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bitmap approach: compress the representation, keep the logic
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jburp4a0hiw10dpseif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jburp4a0hiw10dpseif.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The idea is simple: assign every access level a stable integer index. Represent a user's granted permissions as a bit vector — one bit per access level. Represent each endpoint's requirements the same way. Authorization becomes a bitwise AND.&lt;/p&gt;

&lt;p&gt;If the result is nonzero, the user has at least one of the required permissions. Allow. If the result is zero, deny. That's the entire hot path.&lt;/p&gt;

&lt;p&gt;The anchoring snippet — the intersection check at the core of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;intersects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a typical 32-byte bitmap (covering 256 possible access levels), this is a handful of CPU instructions. Decision time dropped from ~3 microseconds in the worst legacy case to under 200 nanoseconds. Not visible to end users. Very visible in CPU costs at 50,000 requests per second.&lt;/p&gt;

&lt;p&gt;Token size dropped too — from ~3 KB for a heavily-permissioned admin to under 1 KB. The access levels that used to be a long string array became the &lt;code&gt;policy_bitmap&lt;/code&gt; field: a base64url-encoded byte slice.&lt;/p&gt;
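
&lt;p&gt;To make the claim concrete (the helper below is illustrative, not our production code): set one bit per granted access-level index, then base64url-encode the bytes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "encoding/base64"

// encodePolicyBitmap sets one bit per granted access-level index and returns
// the base64url string carried in the policy_bitmap claim. Padding-free
// encoding is used here; the exact variant is an implementation detail.
func encodePolicyBitmap(grantedIndexes []int, totalLevels int) string {
    bm := make([]byte, (totalLevels+7)/8)
    for _, idx := range grantedIndexes {
        bm[idx/8] |= 1 &amp;lt;&amp;lt; uint(idx%8)
    }
    return base64.RawURLEncoding.EncodeToString(bm)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
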

&lt;p&gt;The two paths side by side:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqupwdqmu5xzwgxivnwua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqupwdqmu5xzwgxivnwua.png" alt=" " width="800" height="843"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The version problem — and why we kept the old path
&lt;/h2&gt;

&lt;p&gt;Here's the catch: bit indexes have to be stable. If access level &lt;code&gt;user:admin&lt;/code&gt; is bit 0 today, it must still be bit 0 when old tokens are being validated. This is managed through a versioned registry — each snapshot of the bit assignments carries a version number, and the JWT records, in &lt;code&gt;policy_bitmap_version&lt;/code&gt;, which registry version was in effect when the token was issued.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"__________________8H"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;114&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the gateway boots, it loads the current registry — say, version 114 — and builds an in-memory lookup from version number to bit-index map. When a token arrives, the gateway reads &lt;code&gt;policy_bitmap_version&lt;/code&gt; and checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Version matches current registry (114 == 114):&lt;/strong&gt; decode the bitmap, run &lt;code&gt;intersects()&lt;/code&gt;, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version is older (e.g., 112):&lt;/strong&gt; fall back to string matching against the access level names.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No &lt;code&gt;policy_bitmap_version&lt;/code&gt; field:&lt;/strong&gt; legacy token predating the bitmap feature — fall back to string matching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fallback uses the access level names embedded in the token (carried as a separate claim for exactly this purpose) and checks them against the endpoint's required list. Same outcome, no bitmap needed.&lt;/p&gt;
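
&lt;p&gt;A condensed sketch of that dispatch, reusing &lt;code&gt;intersects()&lt;/code&gt; from above and a set-based string check for the fallback; the types and field names are illustrative, not our production code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;type userToken struct {
    PolicyBitmap        []byte
    PolicyBitmapVersion int // zero when the claim is absent (legacy token)
    AccessLevels        []string
}

type endpointMeta struct {
    RequiredBitmap       []byte
    RequiredAccessLevels []string
}

// authorize picks the evaluation path from the token's registry version.
// currentVersion is the registry snapshot loaded at boot (e.g. 114).
func authorize(tok userToken, ep endpointMeta, currentVersion int) bool {
    if tok.PolicyBitmapVersion == currentVersion {
        // Fast path: same registry snapshot, compare bitmaps directly.
        return intersects(tok.PolicyBitmap, ep.RequiredBitmap)
    }
    // Older or missing version: fall back to the access-level names carried
    // in the token. String matching (hasAny) is version-agnostic.
    return hasAny(tok.AccessLevels, ep.RequiredAccessLevels)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
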

&lt;p&gt;This fallback isn't a temporary measure. It's load-bearing. Long-lived service account tokens might be weeks old. We can't deny them just because they predate a registry update. The string-based check is version-agnostic: it doesn't care about bit indexes at all. As long as both sides agree on what the access level &lt;em&gt;strings&lt;/em&gt; mean, it works.&lt;/p&gt;

&lt;p&gt;New registry versions are created whenever a service publishes new access level definitions through the CI/CD pipeline. The version number increments, new bit positions are assigned to new access levels, and existing assignments are preserved verbatim. Old tokens stay valid — they just take the slightly slower path until they expire naturally.&lt;/p&gt;

&lt;p&gt;We track the fallback rate with a metric. When it's near zero, things are healthy. A spike tells us something is wrong — maybe a token issuer is behind on registry versions, maybe a test fixture has stale data, or maybe a new service published access levels without updating the issuer to match.&lt;/p&gt;
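
&lt;p&gt;For illustration only, since our dashboards live in Datadog and the metric name below is made up: the instrumentation is a labeled counter, incremented once per decision with the evaluation path that was taken.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "github.com/prometheus/client_golang/prometheus"

// One increment per decision, labeled by evaluation path:
// "bitmap", "string_fallback", or "legacy_no_version".
var evaluationPathTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "auth_evaluation_path_total",
        Help: "Authorization decisions by evaluation path.",
    },
    []string{"path"},
)

func init() {
    prometheus.MustRegister(evaluationPathTotal)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
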




&lt;h2&gt;
  
  
  A few things we deliberately didn't do
&lt;/h2&gt;

&lt;p&gt;Some approaches we considered and rejected:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching the authorization decision.&lt;/strong&gt; "Same token plus same endpoint equals same answer" feels right. It's wrong — role changes, revocation, and tenant changes all invalidate it. We cache the &lt;em&gt;token decode result&lt;/em&gt; (the identity and access levels), not the &lt;em&gt;decision&lt;/em&gt;.&lt;/p&gt;
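
&lt;p&gt;A sketch of what caching the decode rather than the decision looks like; the key derivation and TTL handling are illustrative.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "crypto/sha256"
    "sync"
    "time"
)

// decodedToken is the part we cache: identity and access levels, never a verdict.
type decodedToken struct {
    IdentityID   string
    TenantID     string
    AccessLevels []string
}

type cacheEntry struct {
    tok       decodedToken
    expiresAt time.Time
}

type decodeCache struct {
    mu      sync.RWMutex
    entries map[[32]byte]cacheEntry
}

// get returns a cached decode result for a raw token while it's still fresh.
// The authorization decision itself is recomputed on every request.
func (c *decodeCache) get(rawToken string) (decodedToken, bool) {
    key := sha256.Sum256([]byte(rawToken))
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.entries[key]
    if !ok || time.Now().After(e.expiresAt) {
        return decodedToken{}, false
    }
    return e.tok, true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
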

&lt;p&gt;&lt;strong&gt;Per-tenant access level definitions.&lt;/strong&gt; Letting each tenant define what &lt;code&gt;user:admin&lt;/code&gt; means sounds flexible. In practice, it means the registry forks and all cross-tenant reasoning breaks. Access levels are platform-wide; role assignments are per-tenant. That's the line. Individual services define access levels globally — they don't get per-tenant variants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hierarchical permissions.&lt;/strong&gt; "user:admin implies user:read" is elegant on paper. It complicates bitmap encoding and makes rollback harder. We grant both explicitly. A few extra access levels per role is not a real cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A central registry team as the bottleneck.&lt;/strong&gt; Early on, a single team owned all access level definitions. This created a queue — every new feature needed a registry PR to land before it could ship. Moving ownership to service teams via the CI/CD publish flow eliminated that queue entirely. The gateway team reviews the mechanism; service teams review each other's access level semantics in their own PRs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern underneath the optimization
&lt;/h2&gt;

&lt;p&gt;The bitmap is a performance and density win. But the deeper idea is the same one from the last chapter: make the implicit explicit and keep the decision structure visible.&lt;/p&gt;

&lt;p&gt;String matching and bitmap intersection both produce the same outcome — allow or deny. What the bitmap adds isn't correctness, it's &lt;em&gt;compactness&lt;/em&gt;: a cheaper wire representation, a faster runtime check, and a version-aware fallback that degrades gracefully instead of breaking.&lt;/p&gt;

&lt;p&gt;The CI/CD publish flow adds a different kind of compactness: it removes the coordination overhead of centralized registry management. Services declare what they need. The pipeline handles the distribution. The gateway consumes whatever's in the registry. No tickets, no handoffs.&lt;/p&gt;

&lt;p&gt;The fallback is worth lingering on. Most optimizations in auth systems are irreversible — once you commit to a new token format, old tokens become a problem. Keeping the legacy path as a first-class citizen, with its own metrics and log fields, meant we could ship the optimization without a flag day. Old tokens kept working. New tokens got faster. The two paths converged over time on their own.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the deployment actually looked like
&lt;/h2&gt;

&lt;p&gt;Theory is one thing. Here's the Datadog dashboard from the bitmap deployment on Apr 27 at 17:30.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favcvw6wzy5cusyvnblku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favcvw6wzy5cusyvnblku.png" alt="Datadog dashboard showing request hits, error rate, p99 latency, and execution time breakdown across the bitmap deployment boundary" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The real win shows up in p99 latency: it drops from a spiky 5–14 ms pattern to a stable ~4–5 ms, eliminating GC-induced variance from string allocations.&lt;/p&gt;

&lt;p&gt;Execution time stayed in the 100–400 µs range, with a one-time spike during the in-memory bitmap rebuild. Fallback usage decayed naturally as tokens rotated, confirming the migration was seamless.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: Chapter 7 — token revocation. JWTs are stateless by design, which makes "log this user out right now" genuinely hard. We solved it with a Redis-backed revocation list, an in-process cache, and two startup races we had to fix the painful way.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>authorization</category>
      <category>performance</category>
      <category>go</category>
      <category>auth</category>
    </item>
    <item>
      <title>Part 5 — Multi-tenant auth and routing in Kubernetes</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:26 +0000</pubDate>
      <link>https://forem.com/akarshan/multi-tenant-auth-and-routing-in-kubernetes-o85</link>
      <guid>https://forem.com/akarshan/multi-tenant-auth-and-routing-in-kubernetes-o85</guid>
      <description>&lt;p&gt;In the first four chapters of this series I've talked about &lt;em&gt;what&lt;/em&gt; the Auth Gateway decides. This chapter is about &lt;em&gt;who&lt;/em&gt; it decides for.&lt;/p&gt;

&lt;p&gt;We run a multi-tenant platform. Every request, on every endpoint, belongs to one tenant. Get tenant resolution wrong and you don't have a security incident — you have a &lt;em&gt;cross-tenant data leak&lt;/em&gt; incident, which is a category of bad you don't recover from.&lt;/p&gt;

&lt;p&gt;This chapter is the boring, careful, paranoid story of how NGINX and the Auth Service cooperate to never let a request through without a clear tenant identity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two questions
&lt;/h2&gt;

&lt;p&gt;Every multi-tenant request raises two questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Which tenant is this for?&lt;/strong&gt; (resolution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where does the request go for that tenant?&lt;/strong&gt; (routing)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We answer #1 at the NGINX layer, before auth. We answer #2 partly at NGINX (path-based routing) and partly inside the upstream service (tenant-scoped queries). The Auth Service sits between them: it makes sure the &lt;em&gt;token's&lt;/em&gt; tenant matches the &lt;em&gt;request's&lt;/em&gt; tenant before either service sees the request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resolution: two valid inputs, one explicit failure mode
&lt;/h2&gt;

&lt;p&gt;We accept two ways to identify a tenant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Tenant-ID&lt;/code&gt; header.&lt;/strong&gt; Explicit. Used by service-to-service calls and SDKs that know who they're for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host header (mapped via &lt;code&gt;X-Tenant-Host&lt;/code&gt;).&lt;/strong&gt; Implicit. Used by per-tenant DNS like &lt;code&gt;tenant1.example.com&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We do &lt;em&gt;not&lt;/em&gt; accept a third way: a default tenant. There is no fallback. If both inputs are missing or unknown, NGINX returns 400 &lt;em&gt;before&lt;/em&gt; the Auth Service is even called.&lt;/p&gt;

&lt;p&gt;Why so strict? Because a default tenant is the most expensive bug you can ship. Every "wait why is data showing up in tenant X?" post-mortem starts the same way: somebody added a fallback "for convenience" and somebody else's request hit it without a tenant header.&lt;/p&gt;

&lt;p&gt;We removed our default tenant on day 90 and have never looked back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7d2g7jz5npr8b94rhri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7d2g7jz5npr8b94rhri.png" alt=" " width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Per-tenant SNI server blocks
&lt;/h2&gt;

&lt;p&gt;For tenants with their own DNS, NGINX uses &lt;em&gt;server blocks&lt;/em&gt; to short-circuit resolution. The Helm chart templates one server per tenant from &lt;code&gt;global.tenants&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &lt;span class="s"&gt;range&lt;/span&gt; &lt;span class="s"&gt;.Values.global.tenants&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
&lt;span class="s"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; $&lt;span class="kn"&gt;.Values.containers.containerPort&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_dns&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_id&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_namespace&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/auth.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/custom_error_locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="kn"&gt;-&lt;/span&gt; &lt;span class="s"&gt;end&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A request to &lt;code&gt;tenant1.example.com&lt;/code&gt; matches &lt;code&gt;server_name tenant1.example.com&lt;/code&gt;, lands in this block, and &lt;code&gt;$tenant_id&lt;/code&gt; is &lt;em&gt;already&lt;/em&gt; set before any other directive runs. There is no header parsing, no map lookup, no opportunity for ambiguity. Tenant identity is pinned at SNI time.&lt;/p&gt;

&lt;p&gt;This is also nice for TLS: the per-tenant ingress can attach per-tenant certificates if you want them, and the SNI selection happens before any HTTP processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The default server block: header-based fallback
&lt;/h2&gt;

&lt;p&gt;Not every tenant has its own DNS. Many service-to-service calls hit a shared in-cluster ingress with &lt;code&gt;X-Tenant-ID&lt;/code&gt; set explicitly. For those, the default server block handles resolution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.Values.containers.containerPort&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                            &lt;span class="c1"&gt;# match anything not matched above&lt;/span&gt;

  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$http_x_tenant_id&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="nv"&gt;$http_x_tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;# priority 1&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id_from_host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;# priority 2: map of X-Tenant-Host&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="s"&gt;"Tenant&lt;/span&gt; &lt;span class="s"&gt;not&lt;/span&gt; &lt;span class="s"&gt;specified"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;# priority 3: hard fail&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/auth.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/custom_error_locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;$tenant_id_from_host&lt;/code&gt; is a &lt;code&gt;map&lt;/code&gt; populated from the same &lt;code&gt;global.tenants&lt;/code&gt; list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;map&lt;/span&gt; &lt;span class="nv"&gt;$http_x_tenant_host&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id_from_host&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="kn"&gt;-&lt;/span&gt; &lt;span class="s"&gt;range&lt;/span&gt; &lt;span class="s"&gt;.Values.global.tenants&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
  &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_dns&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_id&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="kn"&gt;-&lt;/span&gt; &lt;span class="s"&gt;end&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few subtleties worth highlighting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The order matters. Header beats host. We picked header priority because programmatic clients should be explicit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$tenant_id_from_host&lt;/code&gt; defaults to empty string if the host isn't in the map. We then 400 — same as if the header was missing entirely.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if&lt;/code&gt; directives in NGINX are deeply weird. We confined them to this block and resisted the temptation to put &lt;code&gt;if&lt;/code&gt;s anywhere else in the config.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tenant binding inside the JWT
&lt;/h2&gt;

&lt;p&gt;Once NGINX has set &lt;code&gt;$tenant_id&lt;/code&gt;, it forwards it as &lt;code&gt;X-Tenant-ID&lt;/code&gt; to the Auth Service. But the &lt;em&gt;token&lt;/em&gt; also carries a tenant claim. The Auth Service must check they match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;userToken&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantID&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantID&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReasonTenantMismatch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"tenant mismatch"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the line that saves you when a malicious actor copies a token from tenant A and replays it against tenant B's hostname. The token signature is valid. The token isn't expired. The token isn't revoked. But the &lt;em&gt;tenant&lt;/em&gt; in the token is &lt;code&gt;tenantA&lt;/code&gt; and the request is for &lt;code&gt;tenantB&lt;/code&gt;. We 401.&lt;/p&gt;

&lt;p&gt;Three things make this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The token is bound to a tenant at issuance.&lt;/strong&gt; Our token issuer puts &lt;code&gt;tid: "tenantA"&lt;/code&gt; in the JWT claims when it mints the token. We sign with the per-tenant RSA key (Chapter 3), so a token from tenant A &lt;em&gt;can't&lt;/em&gt; be re-signed for tenant B without the private key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The gateway picks the verification key by &lt;code&gt;X-Tenant-ID&lt;/code&gt;.&lt;/strong&gt; If the request says it's for tenant B, we verify the token's signature with tenant B's public key. A tenant A token signed with tenant A's key fails signature validation, not tenant binding — but either way it's denied.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The tenant claim is &lt;em&gt;also&lt;/em&gt; checked.&lt;/strong&gt; Even if the keys were the same, the explicit &lt;code&gt;userToken.TenantID != log.TenantID&lt;/code&gt; check would catch reuse.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Belt and suspenders. We've never regretted having both.&lt;/p&gt;
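
&lt;p&gt;A sketch of both layers together: pick the verification key from the request's tenant, then re-check the claim. &lt;code&gt;parseAndVerify&lt;/code&gt; and &lt;code&gt;decodedToken&lt;/code&gt; are placeholders for the real verifier, not our actual code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "crypto/rsa"
    "errors"
    "fmt"
)

// verifyForTenant validates a token against the tenant the *request* claims to
// be for, then re-checks the tenant claim inside the token itself.
func verifyForTenant(raw, requestTenantID string, keys map[string]*rsa.PublicKey) (*decodedToken, error) {
    pub, ok := keys[requestTenantID]
    if !ok {
        return nil, fmt.Errorf("no verification key for tenant %q", requestTenantID)
    }
    tok, err := parseAndVerify(raw, pub) // signature check with the request tenant's key
    if err != nil {
        return nil, err // a tenant A token replayed at tenant B already fails here
    }
    if tok.TenantID != requestTenantID {
        return nil, errors.New("tenant mismatch") // belt and suspenders
    }
    return tok, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
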

&lt;h2&gt;
  
  
  Sequence: tenant flow end-to-end
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81h3msmwoogdtm0a9u7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81h3msmwoogdtm0a9u7g.png" alt=" " width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing: MT vs ST upstreams
&lt;/h2&gt;

&lt;p&gt;Once we know the tenant, we have to route the request. We have two upstream models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MT (multi-tenant)&lt;/strong&gt; services. One deployment, serves all tenants. Tenant comes in as &lt;code&gt;X-Tenant-ID&lt;/code&gt;, the service queries data with &lt;code&gt;WHERE tenant_id = ?&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ST (single-tenant)&lt;/strong&gt; services. One deployment &lt;em&gt;per tenant&lt;/em&gt;, in the tenant's own Kubernetes namespace. The service doesn't even need to know about other tenants — it can't see them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is purely a per-service architectural choice. Some products are happy with MT; some have stricter isolation requirements (or run heavy per-tenant data) and want ST.&lt;/p&gt;

&lt;p&gt;The location loop in &lt;code&gt;locations.conf&lt;/code&gt; handles both with one branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;eq&lt;/span&gt; &lt;span class="nv"&gt;$type&lt;/span&gt; &lt;span class="s"&gt;"ST"&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$serviceDict&lt;/span&gt;&lt;span class="kn"&gt;.SERVICE_HOST&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt;&lt;span class="s"&gt;.svc.cluster.local&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;else&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$serviceDict&lt;/span&gt;&lt;span class="kn"&gt;.SERVICE_HOST&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;end&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unrolled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MT:&lt;/strong&gt; &lt;code&gt;proxy_pass http://user-service&lt;/code&gt; — the bare service name. CoreDNS resolves it to the service's ClusterIP in whatever namespace the gateway lives in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ST:&lt;/strong&gt; &lt;code&gt;proxy_pass http://api-service.tenant1-ns.svc.cluster.local&lt;/code&gt; — the FQDN includes the tenant namespace. Each tenant has its own copy of the service in their own namespace.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzq8hrbmm5qnhv541cgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzq8hrbmm5qnhv541cgx.png" alt=" " width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few subtleties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For ST, the tenant namespace is &lt;em&gt;part of the DNS name&lt;/em&gt;. NGINX's resolver kicks in at request time, not at config-load time. Adding a new tenant means deploying its services in &lt;code&gt;tenantN-ns&lt;/code&gt;, then adding it to &lt;code&gt;global.tenants&lt;/code&gt;. NGINX picks it up on the next config reload.&lt;/li&gt;
&lt;li&gt;For MT, all tenants hit the same upstream IP. The upstream service is responsible for tenant scoping. We trust it because we forward &lt;code&gt;X-Tenant-ID&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; the upstream service's auth library re-checks the header against the token's tenant. (Yes, double-checking. After the first cross-tenant near-miss, we added it.)&lt;/li&gt;
&lt;li&gt;ST is more expensive operationally — N deployments of every service — but radically simpler to reason about. Two models, two trade-offs; pick the one your compliance team can defend.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Headers we propagate to upstream
&lt;/h2&gt;

&lt;p&gt;After auth passes, NGINX sends a defined set of headers to the upstream. From &lt;code&gt;locations.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-ID&lt;/span&gt;       &lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Type&lt;/span&gt;     &lt;span class="nv"&gt;$identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Name&lt;/span&gt;     &lt;span class="nv"&gt;$identity_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Session-ID&lt;/span&gt;        &lt;span class="nv"&gt;$session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;         &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-Namespace&lt;/span&gt;  &lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;        &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The upstream contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Identity-ID&lt;/code&gt;&lt;/strong&gt; is the principal. Treat it as the user's primary key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Identity-Type&lt;/code&gt;&lt;/strong&gt; is &lt;code&gt;USER&lt;/code&gt; or &lt;code&gt;SERVICE_ACCOUNT&lt;/code&gt;. Some endpoints reject service accounts, some require them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Tenant-ID&lt;/code&gt;&lt;/strong&gt; is the tenant. &lt;em&gt;Always&lt;/em&gt; scope queries by it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Tenant-Namespace&lt;/code&gt;&lt;/strong&gt; is the Kubernetes namespace, useful for diagnostics and per-tenant Kafka topic naming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Session-ID&lt;/code&gt;&lt;/strong&gt; is an opaque session correlation ID. Useful for logging, never for auth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Request-ID&lt;/code&gt;&lt;/strong&gt; is the trace correlation ID. Forward it to your downstream calls so the whole graph stitches together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We do &lt;em&gt;not&lt;/em&gt; forward &lt;code&gt;Authorization&lt;/code&gt;. The upstream service has no business looking at the JWT. If it needs to know who's calling, it uses &lt;code&gt;X-Identity-ID&lt;/code&gt;. If it needs to make a downstream call, it gets a fresh service-account token — it does not replay the user's token.&lt;/p&gt;
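
&lt;p&gt;A sketch of an upstream handler honoring that contract; the handler and query are hypothetical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "net/http"

// listWidgets is a hypothetical upstream handler. It trusts the gateway-injected
// headers and never parses a JWT itself.
func listWidgets(w http.ResponseWriter, r *http.Request) {
    tenantID := r.Header.Get("X-Tenant-ID")
    if tenantID == "" {
        // Behind the gateway this header is always present; a missing value
        // means the request somehow bypassed the gateway.
        http.Error(w, "missing tenant header", http.StatusForbidden)
        return
    }
    // The principal is X-Identity-ID, and every query is scoped by tenant first:
    //   SELECT ... FROM widgets WHERE tenant_id = $1
    w.WriteHeader(http.StatusOK)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
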

&lt;p&gt;Stripping &lt;code&gt;Authorization&lt;/code&gt; was one of those changes that everyone agrees is a great idea in principle and fights tooth-and-nail when their service breaks during the rollout. Worth the fight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skipping tenant for a few endpoints
&lt;/h2&gt;

&lt;p&gt;A handful of endpoints genuinely don't have a tenant: NGINX &lt;code&gt;/healthz&lt;/code&gt;, public OAuth callbacks, JWKS endpoints, version probes. For these, tenant resolution must &lt;em&gt;not&lt;/em&gt; run.&lt;/p&gt;

&lt;p&gt;In NGINX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/healthz&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;204&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This location is matched &lt;em&gt;before&lt;/em&gt; the tenant-resolving &lt;code&gt;if&lt;/code&gt; block in the default server, because NGINX processes more-specific locations first. &lt;code&gt;/healthz&lt;/code&gt; returns 204 without ever evaluating &lt;code&gt;$tenant_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Inside the Auth Service, the equivalent pattern shows up: the trie has rows with &lt;code&gt;endpoint_type=OPEN&lt;/code&gt; and &lt;em&gt;no&lt;/em&gt; tenant requirement. Even if NGINX did pass through, the Auth Service would allow without checking the tenant. Belt and suspenders again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ingress regex: another layer of opt-in
&lt;/h2&gt;

&lt;p&gt;Our cluster's Ingress controller routes some paths through the Auth Gateway and some paths &lt;em&gt;around&lt;/em&gt; it. The chart's multi-tenant Ingress uses a negative-lookahead regex to express "send everything to NGINX &lt;em&gt;except&lt;/em&gt; these specific exempt paths":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/(?!{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$exemptedPattern&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}).*"&lt;/span&gt;
  &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;$shortName&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;$exemptedPattern&lt;/code&gt; is a long alternation built from &lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;exemptedPaths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;api/&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;login/&lt;/span&gt;
  &lt;span class="na"&gt;exemptedPrefixes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ui&lt;/span&gt;
  &lt;span class="na"&gt;exemptedExtensions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;js&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;css&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ico&lt;/span&gt;
    &lt;span class="c1"&gt;# ... static assets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anything matching the regex bypasses the gateway and goes to a legacy ingress. This is how we rolled the gateway out &lt;em&gt;one service at a time&lt;/em&gt; — and how we keep it manageable today as services migrate at different speeds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd warn future-us about
&lt;/h2&gt;

&lt;p&gt;A few real lessons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default tenants are forever.&lt;/strong&gt; If you ship one, every subsequent design decision will assume it exists, and removing it later is a multi-quarter project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant-aware logging is non-negotiable.&lt;/strong&gt; Every log line must carry &lt;code&gt;tenant_id&lt;/code&gt;. We don't grep by user — we grep by tenant first, then narrow. Chapter 9 has the log format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the tenant model boring.&lt;/strong&gt; "What is a tenant?" should have a 1-sentence answer. The moment "tenant" starts meaning different things in different services, your isolation guarantees evaporate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MT and ST are different operating models, not different security models.&lt;/strong&gt; The same auth contract should hold. If your ST services can be looser because "they're isolated anyway," you have a problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never derive tenant from the user.&lt;/strong&gt; "User belongs to tenant X, so I'll use tenant X" sounds reasonable until you have users in multiple tenants. The tenant comes from the &lt;em&gt;request&lt;/em&gt;, not from the &lt;em&gt;user&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 6 stays on tenant boundaries but zooms in on the authorization side: roles, access levels, the role → access-level → endpoint mapping, and the bitmap fast path that replaced our original string-set matching. We'll see why a JWT &lt;em&gt;should not&lt;/em&gt; be the source of truth for a user's full permission set, and how to encode permissions densely enough that the gateway can decide with a single O(1) bitwise check.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>multitenancy</category>
      <category>nginx</category>
      <category>auth</category>
    </item>
    <item>
      <title>Part 4 — Endpoint classification: OPEN, AUTHENTICATED, ACCESS_CONTROLLED</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:07 +0000</pubDate>
      <link>https://forem.com/akarshan/endpoint-classification-open-authenticated-accesscontrolled-561</link>
      <guid>https://forem.com/akarshan/endpoint-classification-open-authenticated-accesscontrolled-561</guid>
      <description>&lt;p&gt;In Chapter 3 the controller branched on something called the "endpoint type":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;endpointType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"OPEN"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;            &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"AUTHENTICATED"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"ACCESS_CONTROLLED"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That branch is the most important conditional in the entire gateway. It decides whether a request even gets a token check, and whether to run authorization. This chapter is about how that decision is &lt;em&gt;data&lt;/em&gt;, not code, and the trie that powers it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three kinds of endpoint
&lt;/h2&gt;

&lt;p&gt;Every endpoint in our platform falls into one of three buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OPEN&lt;/strong&gt; — no auth required at all. Health checks, public OAuth callbacks, JWKS, version, docs. The request is allowed without a token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AUTHENTICATED&lt;/strong&gt; — token required, no specific permission. "Get my own profile," logout, list-my-stuff endpoints. Anyone with a valid token can call it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACCESS_CONTROLLED&lt;/strong&gt; — token required &lt;em&gt;and&lt;/em&gt; a specific permission. Admin operations, deletes, anything that crosses a user boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Auth Service runs different pipelines for each:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsyx5cxwqlu0wvj33e67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsyx5cxwqlu0wvj33e67.png" alt=" " width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The brilliance — and we say this honestly, because we did &lt;em&gt;not&lt;/em&gt; design it this way the first time — is that an endpoint's classification is a column in a database row, not a hardcoded route. Adding a new admin route means inserting a row, not deploying the gateway. We rebuild the in-memory data structure on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data structure: a trie
&lt;/h2&gt;

&lt;p&gt;When NGINX hits &lt;code&gt;/auth&lt;/code&gt;, it forwards &lt;code&gt;X-Original-URI: /user-management/v1/users/abc123&lt;/code&gt; and &lt;code&gt;X-Original-Method: GET&lt;/code&gt;. We need to turn that into an &lt;em&gt;endpoint metadata record&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The naïve approach is a big map with &lt;code&gt;(method, full_path)&lt;/code&gt; keys. That works until you have wildcards: &lt;code&gt;/users/{id}&lt;/code&gt; should match both &lt;code&gt;/users/abc&lt;/code&gt; and &lt;code&gt;/users/xyz&lt;/code&gt;. Once you have wildcards you want a trie.&lt;/p&gt;

&lt;p&gt;Our trie node looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;TrieNode&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;children&lt;/span&gt;   &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TrieNode&lt;/span&gt;  &lt;span class="c"&gt;// exact-match segments&lt;/span&gt;
    &lt;span class="n"&gt;wildcard&lt;/span&gt;   &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TrieNode&lt;/span&gt;             &lt;span class="c"&gt;// catch-all child for {id}-style segments&lt;/span&gt;
    &lt;span class="n"&gt;Permissions&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;][]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="c"&gt;// method -&amp;gt; required permissions&lt;/span&gt;
    &lt;span class="n"&gt;EndpointType&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;               &lt;span class="c"&gt;// OPEN | AUTHENTICATED | ACCESS_CONTROLLED&lt;/span&gt;
    &lt;span class="n"&gt;BitmapMask&lt;/span&gt;   &lt;span class="kt"&gt;uint64&lt;/span&gt;               &lt;span class="c"&gt;// pre-computed for bitmap fast-path (Chapter 6)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A trie key is a slash-segmented path. &lt;code&gt;/users/{id}/roles&lt;/code&gt; becomes the path &lt;code&gt;["users", "{id}", "roles"]&lt;/code&gt;. Walking the trie is one segment at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TrieNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seg&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Permissions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;O(depth)&lt;/code&gt; worst case, where depth is the number of segments. In practice a typical endpoint is 3–5 segments. We're talking nanoseconds.&lt;/p&gt;

&lt;p&gt;A toy view of what one slug's trie looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiltljlcz1nvbxdib3hq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiltljlcz1nvbxdib3hq.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice &lt;code&gt;me&lt;/code&gt; and &lt;code&gt;{id}&lt;/code&gt; are both children of &lt;code&gt;users&lt;/code&gt;. &lt;code&gt;users/me&lt;/code&gt; resolves first (exact match) and gets &lt;code&gt;AUTHENTICATED&lt;/code&gt;. &lt;code&gt;users/abc123&lt;/code&gt; falls through to the &lt;code&gt;{id}&lt;/code&gt; branch and gets &lt;code&gt;ACCESS_CONTROLLED + user:read&lt;/code&gt;. Order matters: exact wins over wildcard.&lt;/p&gt;
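
&lt;p&gt;&lt;code&gt;Lookup&lt;/code&gt;'s counterpart, shown as a sketch rather than our production code: brace-wrapped segments become the wildcard child, everything else an exact child.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Insert registers one (method, pattern) pair, e.g. ("GET", ["users", "{id}"]).
// Brace-wrapped segments go to the wildcard child; everything else is exact.
func (t *Trie) Insert(method string, pattern []string, endpointType string, perms []string) {
    node := t.root
    for _, seg := range pattern {
        if len(seg) &amp;gt; 0 &amp;amp;&amp;amp; seg[0] == '{' {
            if node.wildcard == nil {
                node.wildcard = &amp;amp;TrieNode{children: map[string]*TrieNode{}}
            }
            node = node.wildcard
            continue
        }
        next, ok := node.children[seg]
        if !ok {
            next = &amp;amp;TrieNode{children: map[string]*TrieNode{}}
            node.children[seg] = next
        }
        node = next
    }
    if node.Permissions == nil {
        node.Permissions = map[string][]string{}
    }
    node.Permissions[method] = perms
    node.EndpointType = endpointType
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
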

&lt;h2&gt;
  
  
  Service slug: scoping the trie
&lt;/h2&gt;

&lt;p&gt;We don't have one giant trie. We have &lt;em&gt;one trie per service slug&lt;/em&gt;. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different services own different paths.&lt;/li&gt;
&lt;li&gt;A single global trie collides on ambiguous paths (multiple services exposing &lt;code&gt;/v1/users&lt;/code&gt; for different reasons).&lt;/li&gt;
&lt;li&gt;Cache locality and refresh granularity are better when each service's routes are isolated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a long time the slug came from "the first segment of the URI." That was fine until services nested each other (&lt;code&gt;/api/v2/user-management/users&lt;/code&gt;). So we added two optional headers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;X-Service-Slug&lt;/code&gt; — explicit slug.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Request-Path&lt;/code&gt; — the path &lt;em&gt;inside&lt;/em&gt; that slug, with the prefix already stripped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The route resolver uses them when present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RouteResolver&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ResolveEndpointWithMetrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;AuthDecisionLog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieKey&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perms&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieExists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Service-Slug"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Request-Path"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Legacy: split URI on first segment&lt;/span&gt;
        &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitFirstSegment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;trie&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;globalTrieRegistry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Permissions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two return flags, not one: &lt;code&gt;trieExists&lt;/code&gt; distinguishes "the service slug isn't registered" from "the service slug is registered but this path doesn't match." The first is a server problem (deploy mismatch). The second is a client problem (404). Different decision reasons, different alerts.&lt;/p&gt;
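
&lt;p&gt;For illustration, here is roughly how a caller can turn those two flags into separate reasons and status codes. The reason names are placeholders rather than our real constants; the 503/404 split is the behaviour described above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Illustrative reason names; only the split itself is the contract.
const (
    reasonServiceNotRegistered = "SERVICE_NOT_REGISTERED" // deploy or config mismatch: alert the platform team
    reasonRouteNotFound        = "ROUTE_NOT_FOUND"        // registered service, unknown path: plain 404
)

func classifyMiss(trieExists, found bool) (reason string, status int) {
    switch {
    case !trieExists:
        return reasonServiceNotRegistered, 503
    case !found:
        return reasonRouteNotFound, 404
    }
    return "", 200
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

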

&lt;h2&gt;
  
  
  Loading the trie from Postgres
&lt;/h2&gt;

&lt;p&gt;The source of truth is three tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Each row is one (service, method, pattern) combination&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;            &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;service_slug&lt;/span&gt;  &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;method&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt;       &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_type&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="c1"&gt;-- OPEN | AUTHENTICATED | ACCESS_CONTROLLED&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Permissions an endpoint requires (only for ACCESS_CONTROLLED)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;endpoint_policy&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_id&lt;/span&gt;      &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;general_policy_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;general_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- The actual policy table — name + bit_index for the bitmap path (Chapter 6)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;general_policy&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;        &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;      &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bit_index&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="c1"&gt;-- nullable; assigned for bitmap-eligible permissions&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At startup, &lt;code&gt;LoadTrieAndRegistry&lt;/code&gt; runs a single query that joins the three tables and builds a &lt;code&gt;Trie&lt;/code&gt; per slug:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;LoadTrieAndRegistry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;policybitmap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Snapshot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT e.service_slug, e.method, e.pattern, e.endpoint_type,
               coalesce(array_agg(gp.name) FILTER (WHERE gp.id IS NOT NULL), '{}'),
               coalesce(array_agg(gp.bit_index) FILTER (WHERE gp.bit_index IS NOT NULL), '{}')
          FROM endpoint e
          LEFT JOIN endpoint_policy ep ON ep.endpoint_id = e.id
          LEFT JOIN general_policy  gp ON gp.id = ep.general_policy_id
         GROUP BY e.id
    `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tries&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;snap&lt;/span&gt;  &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;policybitmap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSnapshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
            &lt;span class="n"&gt;permNames&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
            &lt;span class="n"&gt;bitIdxs&lt;/span&gt;   &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;etype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;permNames&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;pq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bitIdxs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTrie&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EndpointType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Permissions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;][]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;permNames&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BitmapMask&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;computeMask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bitIdxs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c"&gt;// Chapter 6&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single SQL query, one pass to build N tries. We measure load time and emit it as a startup log: &lt;code&gt;trie_load_duration_ms&lt;/code&gt;. On a healthy database, hundreds of tries with thousands of routes load in well under a second.&lt;/p&gt;
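
&lt;p&gt;The wrapper around the loader is a few lines. A sketch, with &lt;code&gt;globalTrieRegistry.Replace&lt;/code&gt; as a hypothetical setter for however you swap the map in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "context"
    "database/sql"
    "log/slog"
    "time"
)

// loadAndSwap is the single code path used at boot and on every refresh.
func loadAndSwap(ctx context.Context, db *sql.DB) error {
    start := time.Now()
    tries, _, err := LoadTrieAndRegistry(ctx, db)
    if err != nil {
        return err // at boot this leaves the registry empty, so /readyz stays 503
    }
    globalTrieRegistry.Replace(tries) // hypothetical setter: swap the whole map at once
    slog.Info("trie loaded",
        "trie_load_duration_ms", time.Since(start).Milliseconds(),
        "services", len(tries))
    return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

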

&lt;h2&gt;
  
  
  Refreshing without a restart
&lt;/h2&gt;

&lt;p&gt;Endpoint metadata changes — we onboard a new service, add a new permission, deprecate a route. We don't want to roll the gateway every time.&lt;/p&gt;

&lt;p&gt;There are two refresh mechanisms:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Periodic.&lt;/strong&gt; Every &lt;code&gt;TRIE_REFRESH_INTERVAL_SECS&lt;/code&gt; seconds (default: 3600, i.e. one hour) we re-run the loader. This is a safety net. If the live channel ever misses an event, the periodic refresh catches it within an hour.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Live.&lt;/strong&gt; A Redis Pub/Sub channel called &lt;code&gt;auth:trie:refresh&lt;/code&gt;. When admin tooling changes endpoint metadata, it &lt;code&gt;PUBLISH&lt;/code&gt;es to that channel. Every Auth Service pod is subscribed, and refreshes within milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw89ibdgwobgeyaiixlq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw89ibdgwobgeyaiixlq.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Pub/Sub message itself carries no payload. It's purely a &lt;em&gt;kick&lt;/em&gt;. Each pod queries the database itself. Two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don't have to keep the message in sync with the schema. New columns, no message change.&lt;/li&gt;
&lt;li&gt;A pod that just booted does the same load as a pod that received a refresh kick. One code path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The downside: every refresh kick = one query per pod. If you have 100 pods and someone bulk-edits 1000 endpoints with 1000 publishes, that's 100,000 queries. We added a debounce (coalesce events within a 200 ms window). Onboarding a service still hammers the DB once, briefly, but it doesn't spiral.&lt;/p&gt;
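
&lt;p&gt;Here is a sketch of the subscriber side, using the go-redis client for the example. The channel name and the 200 ms debounce window are the real values from above; the function names around them are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// Blocks for the life of the pod: coalesced kicks and the periodic
// safety-net interval (TRIE_REFRESH_INTERVAL_SECS) both call reload().
func listenForRefresh(ctx context.Context, rdb *redis.Client, refreshEvery time.Duration, reload func(context.Context)) {
    sub := rdb.Subscribe(ctx, "auth:trie:refresh")
    defer sub.Close()

    kicks := sub.Channel()
    fire := make(chan struct{}, 1)
    var pending *time.Timer // non-nil while a debounce window is open

    ticker := time.NewTicker(refreshEvery)
    defer ticker.Stop()

    for {
        select {
        case &amp;lt;-ctx.Done():
            return
        case &amp;lt;-kicks:
            // The payload is ignored on purpose: the message is only a kick.
            if pending == nil {
                pending = time.AfterFunc(200*time.Millisecond, func() { fire &amp;lt;- struct{}{} })
            }
            // else: a window is already open; this kick is coalesced into it.
        case &amp;lt;-fire:
            pending = nil
            reload(ctx) // same code path as the boot-time load
        case &amp;lt;-ticker.C:
            reload(ctx) // periodic safety net in case a kick was ever missed
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The publish side is symmetric: the admin tooling just runs &lt;code&gt;PUBLISH auth:trie:refresh&lt;/code&gt; with an empty payload and every subscribed pod picks it up.&lt;/p&gt;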

&lt;h2&gt;
  
  
  Caching the lookup
&lt;/h2&gt;

&lt;p&gt;Even with a fast trie, re-walking the same path on every request is wasteful. We layer a TinyLFU cache (W-TinyLFU, via a Go port) on top of the resolver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RouteCache&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;inner&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tinylfu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RouteResult&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RouteCache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RouteResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x00&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x00&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is &lt;code&gt;service_slug + NUL + method + NUL + path&lt;/code&gt;. The NUL byte is a delimiter — neither slugs nor paths can contain it, so a key built from one (slug, method, path) triple can never collide with a key built from a different one, even when one slug is a prefix of another.&lt;/p&gt;

&lt;p&gt;The cache is bounded by entry count, not by RAM. Default: 10,000 routes with 100,000 frequency counters. TinyLFU keeps the &lt;em&gt;frequently used&lt;/em&gt; routes hot and evicts cold ones — better than LRU when traffic has a long tail of rarely-hit paths.&lt;/p&gt;

&lt;p&gt;Cache invalidation: on any trie reload (periodic or kick), the cache is &lt;em&gt;fully cleared&lt;/em&gt;. We considered partial invalidation. We chose not to — the cache fills back up in milliseconds because the same handful of paths drives 99% of the traffic.&lt;/p&gt;
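
&lt;p&gt;Construction and invalidation are a handful of lines. The constructor signature below is assumed for the sketch; "fully cleared" in practice just means publishing a fresh, empty cache in place of the old one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "sync/atomic"

const (
    maxRoutes   = 10_000  // bounded by entry count, not RAM
    lfuCounters = 100_000 // frequency-sketch counters behind the admission policy
)

// Swapped rather than mutated: clearing the cache is just publishing a new one.
var liveCache atomic.Pointer[RouteCache]

func newRouteCache() *RouteCache {
    c := new(RouteCache)
    c.inner = tinylfu.New[string, RouteResult](maxRoutes, lfuCounters) // assumed constructor
    return c
}

// Called right after the trie registry is swapped, whether the reload came
// from the hourly timer or from a Pub/Sub kick.
func clearRouteCache() {
    liveCache.Store(newRouteCache())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

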

&lt;h2&gt;
  
  
  Why this beats per-route decorators
&lt;/h2&gt;

&lt;p&gt;In a Python or Node service you might write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users/&amp;lt;id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nd"&gt;@requires_permission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three problems with that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Service-by-service drift.&lt;/strong&gt; Each service maintains its own decorators. Different services use different permission strings, different exception types, different log shapes. The contract is informal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragile to refactor.&lt;/strong&gt; Move the route, lose the decorator. Now anyone can call it. We've seen this happen at zero, two, and twelve months in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit is impossible at the service level.&lt;/strong&gt; "What permission protects this URL?" requires reading every service's source.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By making endpoint metadata a &lt;em&gt;database row&lt;/em&gt;, we get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One contract, in one schema.&lt;/li&gt;
&lt;li&gt;Decoupled from code — no rebuild to add a route.&lt;/li&gt;
&lt;li&gt;Auditable: a single &lt;code&gt;SELECT&lt;/code&gt; answers "what does this URL require?"&lt;/li&gt;
&lt;li&gt;Reusable: the same metadata drives the gateway &lt;em&gt;and&lt;/em&gt; the admin UI that lets product managers tweak it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost is that you can't read the auth requirements next to the handler in the upstream service's code. We mitigate that with code generation: a CI job dumps &lt;code&gt;endpoint&lt;/code&gt; rows for each service into a &lt;code&gt;routes.yaml&lt;/code&gt; checked into the service's repo for reference. The DB stays the source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Readiness depends on the trie
&lt;/h2&gt;

&lt;p&gt;There's a subtle interaction with Kubernetes probes worth calling out. Our &lt;code&gt;/readyz&lt;/code&gt; returns 503 if the trie isn't loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;CheckTrieReadiness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tries&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusServiceUnavailable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"trie not loaded"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A pod that boots before Postgres is reachable will fail readiness, which keeps it out of the Service load balancer until the trie is loaded. A pod that &lt;em&gt;was&lt;/em&gt; serving traffic and then loses access to Postgres keeps its existing trie in memory and stays ready — refreshes fail loudly via Slack, but live traffic isn't disrupted.&lt;/p&gt;

&lt;p&gt;This split — "load fails block readiness, refresh failures don't" — is on purpose. A booting pod with no trie can't make decisions; pull it out. A serving pod with a stale trie can still make decisions correctly for endpoints that haven't changed; keep it in service while we fix Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-patterns we avoided
&lt;/h2&gt;

&lt;p&gt;Worth listing what we considered and rejected, because they're tempting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storing the endpoint type inside the JWT.&lt;/strong&gt; Tempting because then you don't need a trie. Wrong because a token outlives configuration changes — we'd cache stale auth requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A single hard-coded list of public paths.&lt;/strong&gt; A previous iteration had a YAML file shipped with the gateway. Updating it required a deploy. The trie + DB replaced it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant route metadata.&lt;/strong&gt; We talked about it. We rejected it: the same service exposes the same routes for every tenant. Tenant-specific differences belong in the access-level model (Chapter 6), not the route model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letting the upstream service register its own routes via API at startup.&lt;/strong&gt; Looks elegant. Falls apart in chaos: a buggy service can register away its own auth.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 5 takes the same metadata-as-data philosophy and applies it to multi-tenancy. The trie tells us &lt;em&gt;what&lt;/em&gt; an endpoint requires; tenant resolution tells us &lt;em&gt;who&lt;/em&gt; it's for. We'll see how NGINX server blocks, headers, and tenant maps cooperate to never let a request through without a clear tenant identity.&lt;/p&gt;

</description>
      <category>go</category>
      <category>architecture</category>
      <category>auth</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Part 3 — Inside the Auth Service: From Token Validator to Policy Decision Point</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:40:48 +0000</pubDate>
      <link>https://forem.com/akarshan/inside-the-auth-service-from-token-validator-to-policy-decision-point-3kj8</link>
      <guid>https://forem.com/akarshan/inside-the-auth-service-from-token-validator-to-policy-decision-point-3kj8</guid>
      <description>&lt;p&gt;Most auth services start simple — verify the token, return 200 or 401. Then requirements accumulate. Tenant isolation. Service accounts. Token revocation. Access levels per endpoint. And suddenly what was a lightweight validator is carrying a lot of weight, without a clear structure to hold it.&lt;/p&gt;

&lt;p&gt;This post is about how we structured ours — the ideas that shaped it, and the ones we got wrong before landing here.&lt;/p&gt;




&lt;h2&gt;
  
  
  One job, lots of supporting infrastructure
&lt;/h2&gt;

&lt;p&gt;The Auth Service does exactly one thing from the outside: receive a subrequest from NGINX, inspect the headers, and return a decision. Under a millisecond, every time.&lt;/p&gt;

&lt;p&gt;But a single HTTP handler that does that reliably at scale has a lot underneath it — caching, revocation checks, routing logic, identity propagation. The structural challenge is keeping the handler &lt;em&gt;small&lt;/em&gt; while the infrastructure grows. We landed on a controller that reads like a flowchart:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract the request metadata (URI, method, tenant).&lt;/li&gt;
&lt;li&gt;Resolve the endpoint to find out what kind of auth it needs.&lt;/li&gt;
&lt;li&gt;Based on that: allow it openly, run authentication only, or run authentication &lt;em&gt;and&lt;/em&gt; authorization.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole thing. Everything else is a service the controller delegates to.&lt;/p&gt;




&lt;h2&gt;
  
  
  The insight that changed how we think about routing: endpoint classification is data, not code
&lt;/h2&gt;

&lt;p&gt;Early on, we made auth decisions in code. A route was open because someone wrote &lt;code&gt;if path == "/health" { return 200 }&lt;/code&gt;. Access control lived in conditionals scattered across handlers.&lt;/p&gt;

&lt;p&gt;This breaks the moment your product team adds a new endpoint, or you need to temporarily open a route for a partner integration, or you realize a route that was open should have been authenticated all along.&lt;/p&gt;

&lt;p&gt;We flipped it: every endpoint in the system has a classification stored in the database — &lt;code&gt;OPEN&lt;/code&gt;, &lt;code&gt;AUTHENTICATED&lt;/code&gt;, or &lt;code&gt;ACCESS_CONTROLLED&lt;/code&gt; — along with a permission list if it's access-controlled. The auth service resolves the incoming request to an endpoint record and reads that classification. The decision logic then becomes a simple switch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OPEN&lt;/strong&gt;: allow, log it, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AUTHENTICATED&lt;/strong&gt;: run token validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACCESS_CONTROLLED&lt;/strong&gt;: run token validation, then check permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The consequence is that we never recompile or redeploy the Auth Service to change how a route is protected. That's a database update. It also means non-engineers can reason about the access model without reading code.&lt;/p&gt;
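
&lt;p&gt;Boiled down, that switch looks roughly like the snippet below. The type and helper names (&lt;code&gt;AuthRequest&lt;/code&gt;, &lt;code&gt;validateToken&lt;/code&gt;, &lt;code&gt;checkPermissions&lt;/code&gt; and friends) are placeholders; the three-way classification and the order of the checks are the actual contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Placeholder types for the sketch; the real ones carry more fields.
type Endpoint struct {
    Type        string   // OPEN | AUTHENTICATED | ACCESS_CONTROLLED
    Permissions []string // only meaningful for ACCESS_CONTROLLED
}

type Decision struct {
    Allowed  bool
    Reason   string
    Identity string
}

func decide(ep Endpoint, req AuthRequest) Decision {
    switch ep.Type {
    case "OPEN":
        return allow("OPEN_ENDPOINT") // no token needed; still logged like everything else
    case "AUTHENTICATED":
        return validateToken(req) // signature, expiry, revocation, tenant
    case "ACCESS_CONTROLLED":
        d := validateToken(req)
        if !d.Allowed {
            return d
        }
        return checkPermissions(d.Identity, ep.Permissions)
    default:
        return deny("UNKNOWN_ENDPOINT_TYPE") // fail closed on malformed metadata
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

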

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexsssx65943hzt6aeqzs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexsssx65943hzt6aeqzs.png" alt=" " width="800" height="902"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Naming every failure: the decision-reason contract
&lt;/h2&gt;

&lt;p&gt;The second structural idea that shaped everything else: every outcome has an explicit name.&lt;/p&gt;

&lt;p&gt;We maintain an enumerated list of decision reasons — constants like &lt;code&gt;MISSING_TOKEN&lt;/code&gt;, &lt;code&gt;TENANT_MISMATCH&lt;/code&gt;, &lt;code&gt;TOKEN_REVOKED&lt;/code&gt;, &lt;code&gt;SA_VERSION_MISMATCH&lt;/code&gt;, &lt;code&gt;OPEN_ENDPOINT&lt;/code&gt;, &lt;code&gt;ACCESS_LEVEL_MATCH&lt;/code&gt;. Every code path in the service must set one before returning. There's no exit that doesn't produce a named reason.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ReasonOpenEndpoint&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OPEN_ENDPOINT"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonMissingToken&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"MISSING_TOKEN"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonTokenRevoked&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TOKEN_REVOKED"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonTenantMismatch&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TENANT_MISMATCH"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonSAVersionMismatch&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SA_VERSION_MISMATCH"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonTokenTypeMismatch&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TOKEN_TYPE_MISMATCH"&lt;/span&gt;
    &lt;span class="c"&gt;// ... and so on&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sounds like a minor logging detail. It isn't.&lt;/p&gt;

&lt;p&gt;When a token fails, &lt;em&gt;why&lt;/em&gt; it fails tells a completely different story depending on the reason. &lt;code&gt;TOKEN_REVOKED&lt;/code&gt; means the user logged out or was disabled. &lt;code&gt;SA_VERSION_MISMATCH&lt;/code&gt; means a service account was rotated and the calling service hasn't caught up. &lt;code&gt;TOKEN_TYPE_MISMATCH&lt;/code&gt; means something is trying to authenticate with a refresh token where it should use an access token — usually a buggy SDK, occasionally something worth investigating.&lt;/p&gt;

&lt;p&gt;If all of these collapsed into a generic &lt;code&gt;401 Unauthorized&lt;/code&gt;, you'd lose all of that signal. Dashboards would be useless. On-call would be guessing.&lt;/p&gt;

&lt;p&gt;The list itself is a contract with the log pipeline. New reasons go through code review. Old reasons can't be deleted without checking dashboards and alerts first. It's one of the few places in the codebase where "this is more rigid than it needs to be" is actually correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  One log line per request — and why that matters more than it sounds
&lt;/h2&gt;

&lt;p&gt;Our first approach was to emit log lines at each stage of the pipeline — one when we resolved the route, one when we validated the token, one when we made the authorization decision. We could stitch them together by request ID.&lt;/p&gt;

&lt;p&gt;We abandoned this. The stitching was always slightly wrong. Correlation IDs got dropped. Fields you needed were in a different log line than the one you found first. Debugging a production incident meant reconstructing a timeline from fragments.&lt;/p&gt;

&lt;p&gt;Now there's one structured log record per request. It's built up incrementally — every handler in the pipeline writes into the same struct. By the time the response goes out, the record has every field: URI, method, tenant, identity, cache hit status, decision reason, outcome. It emits once, at the end.&lt;/p&gt;
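
&lt;p&gt;In Gin terms, the pattern is a middleware that owns the record plus a deferred emit. A sketch with illustrative field names and simplified plumbing (the real record carries more fields):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "encoding/json"
    "log"

    "github.com/gin-gonic/gin"
)

// AuthDecisionLog is the one record per request.
type AuthDecisionLog struct {
    RequestID  string `json:"request_id"`
    URI        string `json:"uri"`
    Method     string `json:"method"`
    TenantID   string `json:"tenant_id"`
    IdentityID string `json:"identity_id"`
    CacheHit   bool   `json:"cache_hit"`
    Reason     string `json:"decision_reason"`
    Outcome    string `json:"outcome"` // ALLOW or DENY
}

func DecisionLogMiddleware() gin.HandlerFunc {
    return func(ctx *gin.Context) {
        rec := new(AuthDecisionLog)
        ctx.Set("auth_decision_log", rec) // every later handler mutates this same struct

        defer func() {
            // One emit, at the very end, whichever path returned.
            line, _ := json.Marshal(rec)
            log.Printf("AUTH_DECISION %s", line)
        }()
        ctx.Next()
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

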

&lt;p&gt;The operational improvement was immediate. Grepping for a user's identity ID gives you a complete picture of every request they made — what was allowed, what was denied, and exactly why. No joining, no reconstruction.&lt;/p&gt;

&lt;p&gt;If you're designing an auth service, this is the first structural decision we'd recommend getting right. Everything else can be refactored. The logging model tends to calcify early.&lt;/p&gt;




&lt;h2&gt;
  
  
  How we handle JWT verification at scale
&lt;/h2&gt;

&lt;p&gt;Validating a JWT sounds cheap. For HS256 with a shared secret, it mostly is. For RS256 with asymmetric keys — which is what we use for user-facing tokens — the RSA verification step sits in the hundreds of microseconds. At meaningful request volume, that becomes a real CPU cost.&lt;/p&gt;

&lt;p&gt;Our solution is a cache in front of the decode step. The cache key is a hash of the raw token string (not the string itself — the hash is 8 bytes versus potentially hundreds, which adds up at scale). The TTL matches the token's expiry. When a token comes in that we've already verified recently, we skip the RSA verification entirely.&lt;/p&gt;
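
&lt;p&gt;A sketch of that cache, with a 64-bit FNV hash standing in for the 8-byte key and a plain mutex-guarded map standing in for the real store; eviction of expired entries is left out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "hash/fnv"
    "sync"
    "time"
)

type cachedClaims struct {
    claims    map[string]any
    expiresAt time.Time // mirrors the token's own exp claim
}

// TokenCache remembers already-verified tokens so the RSA step can be
// skipped on a hit. Revocation is checked elsewhere, on every request.
type TokenCache struct {
    mu sync.RWMutex
    m  map[uint64]cachedClaims
}

func NewTokenCache() *TokenCache {
    c := new(TokenCache)
    c.m = make(map[uint64]cachedClaims)
    return c
}

// tokenKey hashes the raw token down to 8 bytes so the key isn't the
// full token string.
func tokenKey(raw string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(raw))
    return h.Sum64()
}

func (c *TokenCache) Get(raw string) (map[string]any, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.m[tokenKey(raw)]
    if !ok || time.Now().After(e.expiresAt) {
        return nil, false // miss or expired: caller runs the full verification
    }
    return e.claims, true
}

// Put stores the decoded claims with a lifetime equal to the token's expiry.
func (c *TokenCache) Put(raw string, claims map[string]any, exp time.Time) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[tokenKey(raw)] = cachedClaims{claims: claims, expiresAt: exp}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

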

&lt;p&gt;A few things we were careful &lt;em&gt;not&lt;/em&gt; to cache:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revocation state.&lt;/strong&gt; Whether a token has been revoked can change at any moment, independent of the token's validity. We cache the decode result — the claims, the identity — but we always check revocation live. These are different questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The auth decision itself.&lt;/strong&gt; The decision depends on the endpoint, the tenant, and the required access level, none of which the token cache knows about. Caching decisions would mean a user who got their access level changed mid-session would still see stale decisions until cache expiry. Unacceptable.&lt;/p&gt;

&lt;p&gt;The principle here generalizes: cache the &lt;em&gt;facts&lt;/em&gt; (what the token says), not the &lt;em&gt;decisions&lt;/em&gt; (what we're going to do about it).&lt;/p&gt;




&lt;h2&gt;
  
  
  The boundary the Auth Service deliberately doesn't cross
&lt;/h2&gt;

&lt;p&gt;The clearest sign a service is well-designed is what it refuses to do.&lt;/p&gt;

&lt;p&gt;Our Auth Service handles coarse-grained access: does this identity have the level of access required to reach this endpoint category? That's it. It does not answer questions like "can this user delete this specific record?" or "does this account have permission to access this tenant's billing history?"&lt;/p&gt;

&lt;p&gt;Those are business policy questions. They belong in the services that own that data, where the full context exists.&lt;/p&gt;

&lt;p&gt;Every time we've been tempted to push business logic into the Auth Service — usually because it would be &lt;em&gt;convenient&lt;/em&gt;, or because a product requirement seemed auth-adjacent — we've regretted it. Business policy changes frequently. Auth infrastructure should be boring and stable. Keeping them separate means changes to one don't put the other at risk.&lt;/p&gt;

&lt;p&gt;The Auth Service also doesn't store sessions, doesn't issue tokens, and doesn't look up users. Tokens carry enough identity for upstream services to do that themselves. The Auth Service only validates.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern underneath all of this
&lt;/h2&gt;

&lt;p&gt;Looking back, the decisions that held up over time share a common shape: make the implicit explicit.&lt;/p&gt;

&lt;p&gt;Endpoint classification pulled auth rules out of code and into data. Decision reasons named every outcome instead of letting them collapse into status codes. The single log line made the request lifecycle visible as a single artifact instead of scattered fragments. The cache/decision boundary separated "what the token says" from "what we're going to do about it."&lt;/p&gt;

&lt;p&gt;None of these are particularly novel ideas. But they compound. A service where every decision is named, every outcome is logged atomically, and every boundary is deliberate is a service you can actually operate.&lt;/p&gt;

&lt;p&gt;That's the goal.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: Chapter 4 — the path trie that resolves incoming URIs to endpoint records in O(path length), without a database call on the hot path.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>jwt</category>
      <category>auth</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Part 1 — Why we built an Auth Gateway instead of putting auth in every service</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:39:30 +0000</pubDate>
      <link>https://forem.com/akarshan/why-we-built-an-auth-gateway-instead-of-putting-auth-in-every-service-36ca</link>
      <guid>https://forem.com/akarshan/why-we-built-an-auth-gateway-instead-of-putting-auth-in-every-service-36ca</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbklcavwvc1saex9e0fcy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbklcavwvc1saex9e0fcy.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've been on a platform team long enough, you've probably watched this slow-motion failure:&lt;/p&gt;

&lt;p&gt;You ship an auth library. Three services adopt it. Six months later, two of them are still on &lt;code&gt;v1.0&lt;/code&gt;, one forked it to add a custom claim, and a fourth service rolled its own because the library "didn't fit their use case." A CVE drops. Now you're hunting through repos to find every place that decodes a JWT.&lt;/p&gt;

&lt;p&gt;We've been running a multi-tenant platform on Kubernetes for a while, and we kept ending up there. So a couple of years ago we made a call: stop trying to &lt;em&gt;protect every service&lt;/em&gt; and start &lt;em&gt;making the decision once&lt;/em&gt; — at the edge.&lt;/p&gt;

&lt;p&gt;This is the first post in a 10-part series about that gateway. The actual gateway is two pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX&lt;/strong&gt;, packaged as a Helm chart, that fronts every authenticated route.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth Service&lt;/strong&gt;, a small Go service that exposes a single &lt;code&gt;POST /auth&lt;/code&gt; endpoint. NGINX hits it as a subrequest on every protected request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll skip the marketing in this series. I'll show real code, real config, and the parts that hurt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision: three things, three different homes
&lt;/h2&gt;

&lt;p&gt;The mistake we kept making was treating "auth" as one thing. It's three:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt; — who is this caller? (&lt;code&gt;Authorization: Bearer ...&lt;/code&gt;, signature, expiry, revocation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization&lt;/strong&gt; — are they allowed to call &lt;em&gt;this&lt;/em&gt; endpoint? (role, access level, tenant)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt; — where does this request go? (multi-tenant DNS, single-tenant vs multi-tenant upstream, header propagation)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you put all three inside every service, every service ends up with its own opinion. So we split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX&lt;/strong&gt; owns routing and fail-closed posture. It already sits in the request path. It's the cheapest place on earth to say "no."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth Service&lt;/strong&gt; owns the &lt;em&gt;decision&lt;/em&gt; — token validity, endpoint classification, access level check.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream services&lt;/strong&gt; own the business logic and trust the identity headers NGINX injects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this looks like on a single request
&lt;/h2&gt;

&lt;p&gt;Here's the happy path, as it actually runs in production. A user calls &lt;code&gt;GET /user-management/users/me&lt;/code&gt; with a bearer token and a tenant header.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgo93zjktkprbe31r0lj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgo93zjktkprbe31r0lj.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few things worth noticing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The subrequest is a &lt;code&gt;POST&lt;/code&gt;, not a &lt;code&gt;GET&lt;/code&gt;. NGINX's &lt;code&gt;auth_request&lt;/code&gt; always uses &lt;code&gt;proxy_method POST&lt;/code&gt; in our chart. The Auth Service doesn't need a body — it decides from &lt;code&gt;X-Original-URI&lt;/code&gt;, &lt;code&gt;X-Original-Method&lt;/code&gt;, &lt;code&gt;X-Tenant-ID&lt;/code&gt;, and the bearer token.&lt;/li&gt;
&lt;li&gt;The Auth Service responds with &lt;strong&gt;identity headers&lt;/strong&gt;. NGINX pulls them out with &lt;code&gt;auth_request_set&lt;/code&gt; and re-injects them into the upstream proxy call. Upstream services never look at the JWT — they trust &lt;code&gt;X-Identity-ID&lt;/code&gt; because they trust NGINX.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Request-ID&lt;/code&gt; is propagated end-to-end. Every log line on every hop carries the same id. (More on that in Chapter 9.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The deny path is where centralization actually pays off
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza6egwoxttyan8ikhbi6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza6egwoxttyan8ikhbi6.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two things that we got &lt;em&gt;for free&lt;/em&gt; the moment we centralized:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identical error envelopes&lt;/strong&gt;. Whether the failure is "no tenant header," "expired token," "wrong access level," or "the Auth Service itself is down," the client sees the same shape: &lt;code&gt;{"source":"auth","message":"...","code":"...","error":"..."}&lt;/code&gt;. We didn't have to coordinate this across 30 services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream services never run on a bad token&lt;/strong&gt;. They aren't even invoked. That alone fixed a long tail of "service X returned 200 with weird data because the token didn't validate but the framework didn't care."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The corresponding NGINX config is small and worth showing, trimmed to the parts that matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# auth.conf — the subrequest endpoint&lt;/span&gt;
&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                                &lt;span class="c1"&gt;# only callable from auth_request&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass_request_body&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;# auth doesn't need the body&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_method&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Content-Length&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-URI&lt;/span&gt;    &lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Method&lt;/span&gt; &lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Host&lt;/span&gt;   &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;      &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;       &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass_request_headers&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;# forward Authorization etc.&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream&lt;/span&gt;   &lt;span class="s"&gt;error&lt;/span&gt; &lt;span class="s"&gt;timeout&lt;/span&gt; &lt;span class="s"&gt;http_500&lt;/span&gt; &lt;span class="s"&gt;http_502&lt;/span&gt; &lt;span class="s"&gt;http_503&lt;/span&gt; &lt;span class="s"&gt;http_504&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_tries&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_timeout&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;# fail-closed: 503 to the client&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://auth_service/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's how a regular service location &lt;em&gt;uses&lt;/em&gt; it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/user-management/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt; &lt;span class="s"&gt;"incore"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$microservice&lt;/span&gt; &lt;span class="s"&gt;"user-service"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;auth_request&lt;/span&gt;     &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt;        &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_type&lt;/span&gt;      &lt;span class="nv"&gt;$upstream_http_x_identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$session_id&lt;/span&gt;         &lt;span class="nv"&gt;$upstream_http_x_session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_message&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_auth_error_message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;    &lt;span class="nv"&gt;$upstream_http_x_auth_error_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@unauthorized&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@forbidden&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@upstream_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-ID&lt;/span&gt;  &lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;    &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;   &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;rewrite&lt;/span&gt; &lt;span class="s"&gt;^/user-management/(.*)&lt;/span&gt;$ &lt;span class="n"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt; &lt;span class="s"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://user-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's roughly 25 lines per route, generated by a Helm &lt;code&gt;range&lt;/code&gt; over the services dictionary. Adding a new service is a values change — no NGINX expert required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why NGINX, specifically
&lt;/h2&gt;

&lt;p&gt;We didn't pick NGINX because of opinions. We picked it because of one directive: &lt;strong&gt;&lt;code&gt;auth_request&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;auth_request&lt;/code&gt; lets you tell NGINX: &lt;em&gt;before you proxy the main request, fire a subrequest to this internal location. If the subrequest returns 200, continue. If it returns 401 or 403, stop and run my error handler. If it returns 5xx, run my "auth is down" error handler.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That sounds boring. It's not. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your upstream services don't see unauthenticated traffic at all.&lt;/li&gt;
&lt;li&gt;You can change auth logic by deploying one service. No client SDK update, no library bump in 30 repos.&lt;/li&gt;
&lt;li&gt;You get a single observable choke-point. &lt;code&gt;auth_request_time_ms&lt;/code&gt; is one log field. We graph it. We page on it.&lt;/li&gt;
&lt;li&gt;You can implement &lt;em&gt;fail-closed by default&lt;/em&gt; with one line: &lt;code&gt;error_page 502 503 504 = @auth_unavailable;&lt;/code&gt;. If the Auth Service is unhealthy, NGINX returns 503 to the client &lt;em&gt;instead of&lt;/em&gt; letting the request through. We pay this cost on purpose. Allowing traffic through a broken auth check is how data leaks happen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll dissect &lt;code&gt;auth_request&lt;/code&gt; in Chapter 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Auth Service fits
&lt;/h2&gt;

&lt;p&gt;The Auth Service is intentionally small. It does five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads a few request headers.&lt;/li&gt;
&lt;li&gt;Resolves the tenant.&lt;/li&gt;
&lt;li&gt;Matches the request path to an &lt;em&gt;endpoint metadata record&lt;/em&gt; (we call this the &lt;strong&gt;trie&lt;/strong&gt; — Chapter 4).&lt;/li&gt;
&lt;li&gt;Classifies the endpoint as &lt;code&gt;OPEN&lt;/code&gt;, &lt;code&gt;AUTHENTICATED&lt;/code&gt;, or &lt;code&gt;ACCESS_CONTROLLED&lt;/code&gt; and runs the right validation pipeline.&lt;/li&gt;
&lt;li&gt;Emits exactly one structured &lt;code&gt;AUTH_DECISION&lt;/code&gt; log line with the timing, identity, decision reason, and outcome.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; store sessions. It does &lt;strong&gt;not&lt;/strong&gt; mint tokens (a separate service does). It does &lt;strong&gt;not&lt;/strong&gt; know about your business logic. It's a &lt;em&gt;policy decision point&lt;/em&gt; — the thing whose only job is to answer "yes or no, and why."&lt;/p&gt;

&lt;p&gt;Here's the controller, paraphrased:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;AuthController&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetAuthDecisionLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URI&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Original-URI"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Original-Method"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Tenant-ID"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieExists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;routeResolver&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;ResolveEndpointWithMetrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;trieExists&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* 503: trie not initialized */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;found&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* 404: no such API found  */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;endpointType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;OPEN&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DecisionReason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReasonOpenEndpoint&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;AUTHENTICATED&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runAuthN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;ACCESS_CONTROLLED&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runAuthN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runAuthZ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five branches. That's the entire shape of the gateway. The next nine posts are about what's &lt;em&gt;inside&lt;/em&gt; each branch and what we learned operating it at ~50k RPS.&lt;/p&gt;

&lt;h2&gt;
  
  
  What centralizing actually costs
&lt;/h2&gt;

&lt;p&gt;I'd be lying if I said this was free. Three real costs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One extra hop on the hot path.&lt;/strong&gt; Every authenticated request now does an in-cluster RPC to the Auth Service before it goes anywhere. We make this cheap with caching (Chapter 8) and with Ristretto-backed JWT verification, but the hop is still there. Median &lt;code&gt;auth_request_time_ms&lt;/code&gt; is in the low single digits in our environment, but it's a budget you have to keep.&lt;/p&gt;
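
&lt;p&gt;To make "Ristretto-backed" concrete, here's a minimal sketch of the idea: cache verified claims keyed by a hash of the raw token, with a short TTL. The &lt;code&gt;verifyRS256&lt;/code&gt; hook and the numbers are illustrative stand-ins, not our production code (that's Chapter 8):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: cache verified JWT claims keyed by a hash of the raw token.
// verifyRS256 stands in for the real signature and claims validation.
package authcache

import (
    "crypto/sha256"
    "encoding/hex"
    "time"

    "github.com/dgraph-io/ristretto"
)

type Claims map[string]interface{}

var cache, _ = ristretto.NewCache(&amp;amp;ristretto.Config{
    NumCounters: 1_000_000, // keys tracked for admission decisions
    MaxCost:     64 &amp;lt;&amp;lt; 20,  // rough memory budget for cached claims
    BufferItems: 64,
})

func VerifyCached(token string, verifyRS256 func(string) (Claims, error)) (Claims, error) {
    sum := sha256.Sum256([]byte(token))
    key := hex.EncodeToString(sum[:])

    if v, ok := cache.Get(key); ok {
        return v.(Claims), nil
    }
    claims, err := verifyRS256(token)
    if err != nil {
        return nil, err // never cache a failed verification
    }
    // The TTL must never outlive the token itself; 30s is an illustrative cap.
    cache.SetWithTTL(key, claims, 1, 30*time.Second)
    return claims, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
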

&lt;p&gt;&lt;strong&gt;The Auth Service has to be HA.&lt;/strong&gt; When it's down, &lt;em&gt;everything&lt;/em&gt; is 503. We chose fail-closed on purpose — a permissive default would mean unauthenticated traffic could hit business services during an outage — but it raises the bar on availability. We run it with an HPA (2–10 replicas, 75% CPU/mem), keep 64 keepalive connections per worker, and gate readiness on the trie being loaded. Even with that, the Auth Service is the single most carefully operated thing on the platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;auth_request&lt;/code&gt; is not cached.&lt;/strong&gt; This surprises people. NGINX does &lt;em&gt;not&lt;/em&gt; cache auth subrequest responses by default, and the obvious caching workarounds (caching by &lt;code&gt;Authorization&lt;/code&gt; header) are dangerous in a multi-tenant world. Every protected request hits the auth pod. So everything inside the auth pod has to be fast. That constraint is what shaped the entire internal design — and is why Chapters 7 and 8 spend a lot of time on caches that live &lt;em&gt;inside&lt;/em&gt; the auth process, not in NGINX.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before vs after, at a glance
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    subgraph Before["Before — auth in every service"]
      C1[Client] --&amp;gt; S1[Service A&amp;lt;br/&amp;gt;auth lib v1.2] &amp;amp; S2[Service B&amp;lt;br/&amp;gt;auth lib v1.0] &amp;amp; S3[Service C&amp;lt;br/&amp;gt;custom auth]
    end
    subgraph After["After — Auth Gateway"]
      C2[Client] --&amp;gt; NX[NGINX&amp;lt;br/&amp;gt;auth_request] --&amp;gt; AU[Auth Service]
      NX --&amp;gt; SA[Service A] &amp;amp; SB[Service B] &amp;amp; SC[Service C]
    end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's coming
&lt;/h2&gt;

&lt;p&gt;This series moves from primitive to production-grade:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 2&lt;/strong&gt; — &lt;code&gt;auth_request&lt;/code&gt; in depth. Subrequest lifecycle, &lt;code&gt;auth_request_set&lt;/code&gt;, named &lt;code&gt;error_page&lt;/code&gt; locations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 3&lt;/strong&gt; — inside the Auth Service. JWT validation, the per-tenant RSA key cache, the decision-reason model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 4&lt;/strong&gt; — endpoint classification (OPEN / AUTHENTICATED / ACCESS_CONTROLLED) and the trie that drives it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 5&lt;/strong&gt; — multi-tenant routing. SNI server blocks, &lt;code&gt;X-Tenant-ID&lt;/code&gt; vs &lt;code&gt;X-Tenant-Host&lt;/code&gt;, MT vs ST upstreams, and why we 400 on no tenant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 6&lt;/strong&gt; — authorization at scale. Role → access level → endpoint, and the bitmap fast path that replaced string-set matching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 7&lt;/strong&gt; — token revocation without killing performance. Redis ZSET + Stream + local cache, the race-condition fixes, and service-account rotation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 8&lt;/strong&gt; — every cache in the hot path and how each one is invalidated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 9&lt;/strong&gt; — operating the gateway. The &lt;code&gt;AUTH_DECISION&lt;/code&gt; log, OpenTelemetry, health probes, and degraded-mode alerts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 10&lt;/strong&gt; — what we'd build differently on day one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chapter 2 is up next: &lt;code&gt;auth_request&lt;/code&gt; is a 12-character directive that quietly does most of the work in this post. I want to show you exactly &lt;em&gt;why&lt;/em&gt; it's the right primitive — and what its sharp edges are.&lt;/p&gt;

&lt;p&gt;If you're working on something similar and want to compare notes, drop a comment. We made plenty of mistakes; happy to share which ones bit hardest.&lt;/p&gt;

</description>
      <category>nginx</category>
      <category>kubernetes</category>
      <category>auth</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Part 2 — NGINX auth_request: the small primitive that changed everything</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:38:31 +0000</pubDate>
      <link>https://forem.com/akarshan/nginx-authrequest-the-small-primitive-that-changed-everything-2pfa</link>
      <guid>https://forem.com/akarshan/nginx-authrequest-the-small-primitive-that-changed-everything-2pfa</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxmi1liyw7oodhb99lu7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxmi1liyw7oodhb99lu7.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Chapter 1 I claimed our entire Auth Gateway is built on top of one NGINX directive: &lt;code&gt;auth_request&lt;/code&gt;. This chapter is a deep dive into how that directive actually works, and the four or five sharp edges that bit us before we got the config right.&lt;/p&gt;

&lt;p&gt;If you already know &lt;code&gt;auth_request&lt;/code&gt; cold, skim to "Sharp edge 1" near the bottom — that's where the real war stories are.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;auth_request&lt;/code&gt; actually does
&lt;/h2&gt;

&lt;p&gt;Drop this in a &lt;code&gt;location&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/user-management/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://user-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a request matches &lt;code&gt;/user-management/&lt;/code&gt;, NGINX:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pauses the main request &lt;em&gt;before&lt;/em&gt; doing anything to the upstream.&lt;/li&gt;
&lt;li&gt;Fires an internal &lt;strong&gt;subrequest&lt;/strong&gt; to &lt;code&gt;/auth&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Looks at the subrequest's HTTP status:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;2xx&lt;/code&gt; → continue with the main request.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;401&lt;/code&gt; or &lt;code&gt;403&lt;/code&gt; → abort the main request and return that status to the client.&lt;/li&gt;
&lt;li&gt;Anything else → fall through to your &lt;code&gt;error_page&lt;/code&gt; directives, or return 500.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the entire surface area. Two things to internalize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The subrequest is never visible to the client. The client only sees the &lt;em&gt;result&lt;/em&gt; — usually a 200 from your upstream, or a 401 that NGINX synthesized.&lt;/li&gt;
&lt;li&gt;The subrequest target is just a normal &lt;code&gt;location&lt;/code&gt; block. Any NGINX feature works there: &lt;code&gt;proxy_pass&lt;/code&gt;, timeouts, retries, keepalive pools, even another &lt;code&gt;auth_request&lt;/code&gt; (don't do this).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The subrequest lifecycle, as a timeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pg4tp1xl573j3qp6jrt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pg4tp1xl573j3qp6jrt.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice the order: the subrequest is fully &lt;em&gt;finished&lt;/em&gt; before NGINX touches the upstream. There is no streaming, no overlap. That's why latency adds up — your auth time is purely sequential to your upstream time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the subrequest actually looks like
&lt;/h2&gt;

&lt;p&gt;Here is the full &lt;code&gt;auth.conf&lt;/code&gt; we ship in our Helm chart, trimmed of comments and noise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                          &lt;span class="c1"&gt;# not callable from outside&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass_request_body&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;# Critical: don't ship the body&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Content-Length&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_method&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-URI&lt;/span&gt;    &lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Method&lt;/span&gt; &lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Host&lt;/span&gt;   &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-URL&lt;/span&gt;    &lt;span class="nv"&gt;$scheme&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;//&lt;/span&gt;&lt;span class="nv"&gt;$http_host$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;      &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;       &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass_request_headers&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;# forward Authorization, cookies, etc.&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_buffering&lt;/span&gt;        &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_http_version&lt;/span&gt;     &lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt;       &lt;span class="s"&gt;Connection&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_send_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream&lt;/span&gt;   &lt;span class="s"&gt;error&lt;/span&gt; &lt;span class="s"&gt;timeout&lt;/span&gt; &lt;span class="s"&gt;invalid_header&lt;/span&gt; &lt;span class="s"&gt;http_500&lt;/span&gt; &lt;span class="s"&gt;http_502&lt;/span&gt; &lt;span class="s"&gt;http_503&lt;/span&gt; &lt;span class="s"&gt;http_504&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_tries&lt;/span&gt;   &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_timeout&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://auth_service/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every line in there is the result of an outage post-mortem. Worth walking through the non-obvious ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;internal;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Without this, &lt;code&gt;/auth&lt;/code&gt; would be a public endpoint anyone could hit. With it, NGINX only allows the location to be called from a subrequest. Try &lt;code&gt;curl https://your-host/auth&lt;/code&gt; and you get 404. This is the same pattern NGINX uses for its own named locations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_method POST&lt;/code&gt; + &lt;code&gt;proxy_pass_request_body off&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The Auth Service doesn't care about the request body. It cares about the URI, the method, the tenant, and the bearer token. So we strip the body and force &lt;code&gt;POST&lt;/code&gt;. Two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance.&lt;/strong&gt; A 50 MB upload would otherwise be buffered to the auth subrequest before it could be streamed to the upstream. That's a non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security.&lt;/strong&gt; The Auth Service shouldn't be a side-channel exfiltration target for upstream payloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But we're forcing &lt;code&gt;POST&lt;/code&gt; even though we drop the body. Why? Because some load balancers and observability tools treat &lt;code&gt;POST /auth&lt;/code&gt; differently from &lt;code&gt;GET /auth&lt;/code&gt;, and we wanted the subrequest to be obviously a &lt;em&gt;write&lt;/em&gt; of a decision request, not a read.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_pass_request_headers on&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The Auth Service needs &lt;code&gt;Authorization&lt;/code&gt;, &lt;code&gt;Cookie&lt;/code&gt;, &lt;code&gt;X-Forwarded-For&lt;/code&gt;, etc. We pass them all. The subrequest is in-cluster — there's no trust boundary between NGINX and the Auth Service.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_set_header X-Original-*&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;NGINX rewrites the URI of a subrequest to the subrequest target (&lt;code&gt;/auth&lt;/code&gt;). The Auth Service has no idea what URL the client originally hit. So we explicitly forward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;X-Original-URI&lt;/code&gt; — the path with query string, used for endpoint matching and audit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Original-Method&lt;/code&gt; — the original HTTP verb.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Original-Host&lt;/code&gt; — the host header, useful for tenant resolution by hostname.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Original-URL&lt;/code&gt; — full URL, for logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These headers are the contract between NGINX and the Auth Service. Change them carelessly and you break every auth decision in the platform.&lt;/p&gt;
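
&lt;p&gt;On the Auth Service side, honoring that contract is nothing more than a few header reads. A minimal sketch (the struct and helper names are illustrative; only the header names are load-bearing):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: the NGINX → Auth Service request contract, read off the subrequest.
// The struct and helper names are illustrative; only the header names matter.
package authsvc

import "github.com/gin-gonic/gin"

type OriginalRequest struct {
    URI    string // path plus query string, used for endpoint matching and audit
    Method string // the original HTTP verb
    Host   string // host header, used for tenant resolution by hostname
    URL    string // full URL, for logs
}

func originalRequestFrom(ctx *gin.Context) OriginalRequest {
    return OriginalRequest{
        URI:    ctx.GetHeader("X-Original-URI"),
        Method: ctx.GetHeader("X-Original-Method"),
        Host:   ctx.GetHeader("X-Original-Host"),
        URL:    ctx.GetHeader("X-Original-URL"),
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
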

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_buffering off&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;For a response this tiny (a status code and a handful of headers), buffering hurts more than it helps. We get a few hundred microseconds back per request by turning it off.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_http_version 1.1&lt;/code&gt; + &lt;code&gt;Connection ""&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Combined with the upstream's &lt;code&gt;keepalive 64&lt;/code&gt;, this enables connection reuse between NGINX and the Auth Service. Without it, every subrequest opens a fresh TCP connection — disastrous at any real RPS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeouts and retries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_send_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_read_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_next_upstream_tries&lt;/span&gt;   &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_next_upstream_timeout&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Translation: try to connect in 5s, send in 10s, read in 10s. If we get a connection error or 5xx, retry once. The whole thing is bounded at 15s.&lt;/p&gt;

&lt;p&gt;These are huge ceilings — a healthy auth pod responds in single-digit milliseconds. They exist for the &lt;em&gt;worst&lt;/em&gt; case: a partition, a failing pod, a slow JWKS fetch on first request. We'd rather wait 15 seconds and serve a clean 503 than time out at 1 second and have flaky behavior under load.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;error_page 502 503 504 = @auth_unavailable&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the fail-closed line. If the Auth Service is unreachable after retries, NGINX runs the &lt;code&gt;@auth_unavailable&lt;/code&gt; named location instead of just 502'ing the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pulling identity out: &lt;code&gt;auth_request_set&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;A subrequest succeeding (200) tells NGINX to continue, but on its own it doesn't tell the upstream service &lt;em&gt;who&lt;/em&gt; is calling. That's where &lt;code&gt;auth_request_set&lt;/code&gt; comes in. It pulls headers off the subrequest's response and binds them to NGINX variables, which we then forward.&lt;/p&gt;

&lt;p&gt;From &lt;code&gt;locations.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_time&lt;/span&gt;     &lt;span class="nv"&gt;$upstream_response_time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_status&lt;/span&gt;   &lt;span class="nv"&gt;$upstream_status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt;   &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_type&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_name&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$session_id&lt;/span&gt;    &lt;span class="nv"&gt;$upstream_http_x_session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_message&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_auth_error_message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;    &lt;span class="nv"&gt;$upstream_http_x_auth_error_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two patterns at play:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;$upstream_response_time&lt;/code&gt; and &lt;code&gt;$upstream_status&lt;/code&gt; are the auth subrequest's &lt;em&gt;transport&lt;/em&gt; metadata. We capture them so they end up in our log line.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$upstream_http_x_identity_id&lt;/code&gt; is NGINX's way of saying "the value of the &lt;code&gt;X-Identity-ID&lt;/code&gt; response header on the most recent upstream call." We freeze that into &lt;code&gt;$identity_id&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; we touch the actual upstream service — otherwise the upstream's response headers would clobber it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, in the same &lt;code&gt;location&lt;/code&gt;, we pass those variables forward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-ID&lt;/span&gt;   &lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Type&lt;/span&gt; &lt;span class="nv"&gt;$identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Name&lt;/span&gt; &lt;span class="nv"&gt;$identity_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Session-ID&lt;/span&gt;    &lt;span class="nv"&gt;$session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;     &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;    &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The upstream service trusts these. It doesn't see the JWT. It doesn't validate the token. It loads the user by &lt;code&gt;X-Identity-ID&lt;/code&gt; and gets on with its life.&lt;/p&gt;
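
&lt;p&gt;To make the trust model concrete, here's a hypothetical upstream handler. It never parses a JWT; it just reads the headers NGINX injected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch of a hypothetical upstream handler: identity arrives as trusted
// headers injected by NGINX after a successful auth subrequest.
package userservice

import (
    "fmt"
    "net/http"
)

func getProfile(w http.ResponseWriter, r *http.Request) {
    identityID := r.Header.Get("X-Identity-ID")
    tenantID := r.Header.Get("X-Tenant-ID")
    if identityID == "" || tenantID == "" {
        // Behind the gateway this should never happen; treat it as misrouted traffic.
        http.Error(w, "missing identity headers", http.StatusUnauthorized)
        return
    }
    // No JWT parsing, no signature check: the gateway already decided.
    fmt.Fprintf(w, "profile for %s in tenant %s\n", identityID, tenantID)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
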

&lt;h2&gt;
  
  
  Named error_page locations: clean envelopes for every failure
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;auth_request&lt;/code&gt; returning 401 doesn't automatically send a clean 401 to the client — it just tells NGINX the request was unauthorized. By default the response body is empty, which makes for sad logs and worse client behavior.&lt;/p&gt;

&lt;p&gt;We use named &lt;code&gt;error_page&lt;/code&gt; locations to attach JSON envelopes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/user-management/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@unauthorized&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@forbidden&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@internal_server_error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@upstream_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://user-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the named locations live in &lt;code&gt;custom_error_locations.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@unauthorized&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;add_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt; &lt;span class="nv"&gt;$request_id&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Unauthorized","code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;","error":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@forbidden&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"No&lt;/span&gt; &lt;span class="s"&gt;such&lt;/span&gt; &lt;span class="s"&gt;API&lt;/span&gt; &lt;span class="s"&gt;found")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"NotFound","code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;","error":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Forbidden","code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;","error":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="n"&gt;/dev/stdout&lt;/span&gt; &lt;span class="s"&gt;auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Auth&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;Unavailable","error":"Auth&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;unreachable"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@upstream_unavailable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="n"&gt;/dev/stdout&lt;/span&gt; &lt;span class="s"&gt;upstream_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Upstream&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;Unavailable","error":"Upstream&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;unreachable"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Couple of things worth highlighting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;$auth_error_code&lt;/code&gt; and &lt;code&gt;$auth_error_message&lt;/code&gt; were captured from the subrequest's &lt;code&gt;X-Auth-Error-Code&lt;/code&gt; and &lt;code&gt;X-Auth-Error-Message&lt;/code&gt; response headers. The Auth Service writes these on every deny, and NGINX surfaces them verbatim to the client.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;if&lt;/code&gt; inside &lt;code&gt;@forbidden&lt;/code&gt; is how we handle the "endpoint not registered in the trie" case. The Auth Service signals it with a 403 + a specific message, and NGINX rewrites that to 404. The wire-level shape stays consistent, but the status reflects what the client should actually see.&lt;/li&gt;
&lt;li&gt;Both unavailability branches use a &lt;em&gt;separate&lt;/em&gt; log format (&lt;code&gt;auth_unavailable&lt;/code&gt;, &lt;code&gt;upstream_unavailable&lt;/code&gt;). When something is on fire, you want it in its own log stream so dashboards aren't drowned by 200s.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sharp edge 1: subrequests don't cache
&lt;/h2&gt;

&lt;p&gt;People expect this to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_cache&lt;/span&gt; &lt;span class="s"&gt;auth_cache&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_cache_key&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$http_authorization&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It doesn't. A &lt;code&gt;proxy_cache&lt;/code&gt; declared on the protected location applies to the main request's &lt;code&gt;proxy_pass&lt;/code&gt;, not to the auth subrequest; the subrequest fires every time. There's no built-in TTL on auth decisions.&lt;/p&gt;

&lt;p&gt;Why is that the right default? Because auth decisions are not cacheable in the general case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same token might be revoked in the next 50 ms.&lt;/li&gt;
&lt;li&gt;The required permissions for an endpoint might change.&lt;/li&gt;
&lt;li&gt;The tenant context can change between requests (different &lt;code&gt;X-Tenant-ID&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; roll your own cache — for example, by keying off &lt;code&gt;(token_hash, endpoint, method)&lt;/code&gt; and storing decisions in a shared cache — but you're now responsible for invalidating it when &lt;em&gt;anything&lt;/em&gt; about the auth state changes. We chose a different approach: caching inside the Auth Service process itself. That's Chapter 8.&lt;/p&gt;
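
&lt;p&gt;If you do go down that road, building the key is the easy part; a sketch is below. Keeping it correct when tokens are revoked or permissions change is the part that hurts, and it's why we didn't ship this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch only: a key for an externally cached auth decision. The key has to
// capture everything the decision depends on; invalidating it on revocation
// or permission changes is still entirely on you.
package decisioncache

import (
    "crypto/sha256"
    "fmt"
)

func decisionKey(token, tenantID, method, path string) string {
    tokenHash := sha256.Sum256([]byte(token))
    return fmt.Sprintf("authz:%x:%s:%s:%s", tokenHash, tenantID, method, path)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
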

&lt;h2&gt;
  
  
  Sharp edge 2: &lt;code&gt;auth_request_set&lt;/code&gt; is run in main-request context
&lt;/h2&gt;

&lt;p&gt;This bit us on day three. Consider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://api-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The variable &lt;code&gt;$identity_id&lt;/code&gt; is &lt;em&gt;not&lt;/em&gt; populated the instant the subrequest returns. It's bound when &lt;code&gt;auth_request_set&lt;/code&gt; is evaluated in the main-request context, which happens &lt;em&gt;after&lt;/em&gt; the subrequest completes but &lt;em&gt;before&lt;/em&gt; NGINX contacts the real upstream. At that point &lt;code&gt;$upstream_http_x_identity_id&lt;/code&gt; still refers to the auth subrequest's response, so this works. But here's the trap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://api-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;# ❌ auth_request_set lives below proxy_pass&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;auth_request_set&lt;/code&gt; directives are &lt;em&gt;order-independent within a location&lt;/em&gt; (they apply at request setup), but if you start playing tricks with &lt;code&gt;if&lt;/code&gt; or &lt;code&gt;set&lt;/code&gt;-based conditionals, you can read &lt;code&gt;$identity_id&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; &lt;code&gt;auth_request_set&lt;/code&gt; evaluates and get an empty string. Lesson: keep &lt;code&gt;auth_request_set&lt;/code&gt; together, immediately after &lt;code&gt;auth_request&lt;/code&gt;, before any &lt;code&gt;proxy_set_header&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp edge 3: subrequest 5xx vs subrequest 401
&lt;/h2&gt;

&lt;p&gt;A subtle one. If the Auth Service returns 401, the client sees 401. If the Auth Service returns 500, what does the client see?&lt;/p&gt;

&lt;p&gt;By default: 500. &lt;code&gt;auth_request&lt;/code&gt; treats any subrequest status other than 2xx, 401, or 403 as an internal error, and the main request fails with 500.&lt;/p&gt;

&lt;p&gt;That's almost never what you want. A 500 from the auth pod is "auth is broken," not "the user is broken." The client shouldn't see "internal server error" for what is operationally an auth outage.&lt;/p&gt;

&lt;p&gt;Fix: explicit &lt;code&gt;error_page&lt;/code&gt; mapping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@internal_server_error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now any 5xx from the auth subrequest gets a clean envelope. We tell oncall via the alert pipeline (Chapter 9), not via the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp edge 4: &lt;code&gt;proxy_intercept_errors&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Default is &lt;code&gt;off&lt;/code&gt;, which is &lt;em&gt;correct&lt;/em&gt; in our location blocks. We explicitly set it because we burned half a day on a related bug:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_intercept_errors&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;# important&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://api-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you set &lt;code&gt;proxy_intercept_errors on&lt;/code&gt;, NGINX will run &lt;em&gt;upstream&lt;/em&gt; error responses (e.g., a 404 from the actual &lt;code&gt;api-service&lt;/code&gt;) through your &lt;code&gt;error_page&lt;/code&gt; mappings. Suddenly your "no such API found" 404 from the Auth Service and a "user not found" 404 from the upstream both end up in &lt;code&gt;@forbidden&lt;/code&gt;'s 404 branch. They look identical to the client. They're completely different problems.&lt;/p&gt;

&lt;p&gt;Keep &lt;code&gt;proxy_intercept_errors off&lt;/code&gt; on the upstream location. Let upstream errors pass through unmolested. Only &lt;em&gt;auth-side&lt;/em&gt; errors should run through your named locations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp edge 5: NGINX never sees auth's body
&lt;/h2&gt;

&lt;p&gt;The Auth Service can't return a JSON body that NGINX uses. Only the &lt;em&gt;status code&lt;/em&gt; and &lt;em&gt;response headers&lt;/em&gt; matter. If the Auth Service writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;HTTP/&lt;/span&gt;&lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Unauthorized&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;X-Auth-Error-Code:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;TOKEN_EXPIRED&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;X-Auth-Error-Message:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;token&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expired&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"token expired"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…NGINX sees the 401 and the two &lt;code&gt;X-Auth-*&lt;/code&gt; headers. The body is discarded. So the contract is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Status&lt;/strong&gt; decides allow/deny/error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response headers&lt;/strong&gt; carry identity (on 200) or failure metadata (on 4xx).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response body&lt;/strong&gt; is for nobody. Don't bother.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Internalize this and the Auth Service handler design becomes much simpler — it's writing headers, not JSON.&lt;/p&gt;
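
&lt;p&gt;In gin terms, a deny ends up as two header writes and a status code. A minimal sketch (the helper is ours for illustration; the header names are the real contract):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: a deny from the Auth Service is a status code plus two response
// headers. The helper is illustrative; the header names are the contract.
package authsvc

import "github.com/gin-gonic/gin"

func deny(ctx *gin.Context, status int, code, message string) {
    ctx.Header("X-Auth-Error-Code", code)
    ctx.Header("X-Auth-Error-Message", message)
    ctx.Status(status) // any body written here would be discarded by NGINX
}

// e.g. on an expired token: deny(ctx, 401, "TOKEN_EXPIRED", "token expired")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
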

&lt;h2&gt;
  
  
  What this directive bought us
&lt;/h2&gt;

&lt;p&gt;To put it bluntly: &lt;code&gt;auth_request&lt;/code&gt; is the difference between "we operate an Auth Gateway" and "we operate an Auth &lt;em&gt;Library That Every Service Includes&lt;/em&gt;." It moved the decision point off every service's hot path and onto a single dedicated pod. Everything else in this series — endpoint classification, multi-tenant routing, revocation, observability — sits on top of that one primitive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A[auth subrequest returns] --&amp;gt; B{status}
    B --&amp;gt;|200| C[continue to proxy_pass]
    B --&amp;gt;|401| D["@unauthorized&amp;lt;br/&amp;gt;return 401 JSON"]
    B --&amp;gt;|403| E["@forbidden&amp;lt;br/&amp;gt;404 if 'No such API found'&amp;lt;br/&amp;gt;else 403"]
    B --&amp;gt;|500| F["@internal_server_error&amp;lt;br/&amp;gt;return 500 JSON"]
    B --&amp;gt;|502/503/504| G["@auth_unavailable&amp;lt;br/&amp;gt;return 503 JSON&amp;lt;br/&amp;gt;(fail-closed)"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 3 goes inside the Auth Service: the controller, the handler chain, JWT validation, the per-tenant RSA public-key cache, and the decision-reason model. We'll spend most of it in Go code.&lt;/p&gt;

&lt;p&gt;If you implement an &lt;code&gt;auth_request&lt;/code&gt;-backed gateway after reading this and a sharp edge catches you, drop a comment. The five sharp edges above are the ones we hit. There are probably another five waiting for you.&lt;/p&gt;

</description>
      <category>nginx</category>
      <category>kubernetes</category>
      <category>webdev</category>
      <category>auth</category>
    </item>
    <item>
      <title>I Dug Up My 10-Year-Old Android App, Dusted It Off With AI, and Put It Back on the Play Store</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Sun, 22 Feb 2026 06:44:45 +0000</pubDate>
      <link>https://forem.com/akarshan/i-dug-up-my-10-year-old-android-app-dusted-it-off-with-ai-and-put-it-back-on-the-play-store-4a3c</link>
      <guid>https://forem.com/akarshan/i-dug-up-my-10-year-old-android-app-dusted-it-off-with-ai-and-put-it-back-on-the-play-store-4a3c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2f75kytgn8vdsqcwtz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2f75kytgn8vdsqcwtz4.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last week I did something I didn't expect to enjoy as much as I did — I resurrected a side project I hadn't touched in a decade.&lt;/p&gt;

&lt;p&gt;Meet &lt;strong&gt;Drivelert&lt;/strong&gt;: an anti-drowsiness app I built ages ago to help drivers stay alert on long trips. The Play Store had quietly pulled it down: ancient target SDK, zero maintenance, the usual graveyard story. I'd moved on. The app hadn't.&lt;/p&gt;

&lt;p&gt;Then, for reasons I can't fully explain (nostalgia? a slow weekend? some stubborn refusal to let past-me's work die?), I decided to bring it back.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Even Was This Thing?
&lt;/h2&gt;

&lt;p&gt;Drivelert was a simple but genuinely useful idea: monitor signs of driver fatigue and alert them before it becomes dangerous. I built it back when I was younger, more idealistic, and apparently not bothered by shipping something and completely abandoning it.&lt;/p&gt;

&lt;p&gt;The code was... a time capsule. Deprecated APIs, patterns I wouldn't touch today, some choices that made me genuinely wince. But the &lt;em&gt;idea&lt;/em&gt; was solid. The bones were good.&lt;/p&gt;




&lt;h2&gt;
  
  
  A decade later: smarter, more experienced, and AI finally made the revival real
&lt;/h2&gt;

&lt;p&gt;This time around, I wasn't flying solo. I brought AI into the workflow for unpicking outdated patterns, modernizing chunks of logic, and accelerating the parts that would've taken me days to slog through manually. &lt;/p&gt;

&lt;p&gt;It's a weird experience, honestly. You're reading code written by a younger version of yourself, and then having an AI help you translate it into something modern. Felt a bit like co-writing a letter to the past.&lt;/p&gt;

&lt;p&gt;What made it work was the mix of experience and capability.&lt;br&gt;
I had the wisdom and context of why things were built a certain way. AI brought the momentum and precision to rebuild them better.&lt;br&gt;
That blend of hindsight, experience, and AI support turned out to be surprisingly powerful.&lt;/p&gt;




&lt;h2&gt;
  
  
  It's Live Again
&lt;/h2&gt;

&lt;p&gt;After a week of evenings, Drivelert is back up on the Play Store:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://play.google.com/store/apps/details?id=com.drivelert.app" rel="noopener noreferrer"&gt;Drivelert on Google Play&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's something quietly satisfying about seeing an old project breathe again: not just preserved, but actually improved. Younger-me would probably be pleased. Slightly jealous of the tools available now, but pleased.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Old side projects aren't necessarily dead; sometimes they just need the right moment and a better toolkit.&lt;/li&gt;
&lt;li&gt;AI assistance genuinely changes the calculus on revival projects. The "is this worth the effort?" math shifts when you can move faster.&lt;/li&gt;
&lt;li&gt;Shipping something imperfect 10 years ago &amp;gt; never shipping the perfect version. Past-me understood this intuitively. I'd forgotten.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;It's a second innings for a scrappy little app. Let's see how it goes. 🏏&lt;/p&gt;

</description>
      <category>ai</category>
      <category>android</category>
      <category>showdev</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>Testing Redis Circuit Breaker with Toxiproxy</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Tue, 03 Feb 2026 07:28:46 +0000</pubDate>
      <link>https://forem.com/akarshan/testing-redis-circuit-breaker-with-toxiproxy-4p8a</link>
      <guid>https://forem.com/akarshan/testing-redis-circuit-breaker-with-toxiproxy-4p8a</guid>
      <description>&lt;p&gt;Building resilient distributed systems requires thorough testing of failure scenarios. While unit tests are great for business logic, they can't simulate the complex network failures that happen in production. This is where &lt;strong&gt;Toxiproxy&lt;/strong&gt; comes in—a powerful tool for testing how your application handles real-world network chaos.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll explore how to test a Redis circuit breaker implementation using Toxiproxy to simulate various failure modes, from complete outages to subtle network degradation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Toxiproxy?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Shopify/toxiproxy" rel="noopener noreferrer"&gt;Toxiproxy&lt;/a&gt; is a TCP proxy developed by Shopify that simulates network and system conditions for chaos and resiliency testing. Unlike traditional testing approaches that mock dependencies, Toxiproxy sits between your application and its dependencies, allowing you to inject real network failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection failures&lt;/strong&gt; (service down)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; (slow networks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth limitations&lt;/strong&gt; (network congestion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection resets&lt;/strong&gt; (unstable networks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data corruption&lt;/strong&gt; (packet loss)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it ideal for testing circuit breakers, retry logic, timeouts, and other resilience patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Circuit Breakers
&lt;/h2&gt;

&lt;p&gt;Before we dive into testing, let's briefly review the circuit breaker pattern. A circuit breaker prevents an application from repeatedly trying to execute an operation that's likely to fail, giving the failing service time to recover.&lt;/p&gt;

&lt;p&gt;The circuit breaker has three states:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Closed&lt;/strong&gt;: Normal operation, requests pass through&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open&lt;/strong&gt;: Too many failures detected, requests are blocked or fail-fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Half-Open&lt;/strong&gt;: Testing if the service has recovered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For our Redis implementation, we'll configure the circuit breaker to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open after 5 consecutive failures&lt;/li&gt;
&lt;li&gt;Either fail-open (allow requests through) or fail-closed (block requests)&lt;/li&gt;
&lt;li&gt;Log when state transitions occur&lt;/li&gt;
&lt;/ul&gt;
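
&lt;p&gt;To make the three states concrete, here is a minimal, hand-rolled sketch of such a breaker in Go. It is illustrative only: the names (&lt;code&gt;Breaker&lt;/code&gt;, &lt;code&gt;failOpen&lt;/code&gt;) and the cooldown-based recovery are assumptions for this post, not the exact implementation behind the log lines shown later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package breaker

import (
	"errors"
	"sync"
	"time"
)

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

var ErrOpen = errors.New("circuit breaker is open")

// Breaker trips after `threshold` consecutive failures and probes again
// after `cooldown` (sketch only, not production code).
type Breaker struct {
	mu        sync.Mutex
	state     State
	failures  int           // consecutive failures
	threshold int           // 5 in this article
	cooldown  time.Duration // how long to stay open before half-open
	openedAt  time.Time
	failOpen  bool // true: serve without Redis; false: fail fast
}

func New(threshold int, cooldown time.Duration, failOpen bool) *Breaker {
	return &amp;Breaker{state: Closed, threshold: threshold, cooldown: cooldown, failOpen: failOpen}
}

// Do runs op through the breaker; fallback is used in fail-open mode.
func (b *Breaker) Do(op func() error, fallback func() error) error {
	b.mu.Lock()
	if b.state == Open {
		if time.Since(b.openedAt) &lt; b.cooldown {
			b.mu.Unlock()
			if b.failOpen &amp;&amp; fallback != nil {
				return fallback() // degraded mode: skip Redis entirely
			}
			return ErrOpen // fail-closed: reject immediately
		}
		b.state = HalfOpen // cooldown elapsed: let one probe through
	}
	b.mu.Unlock()

	err := op()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.state == HalfOpen || b.failures &gt;= b.threshold {
			b.state = Open
			b.openedAt = time.Now()
			// this is where the "Circuit breaker opened" warning gets logged
		}
		return err
	}
	b.failures = 0
	b.state = Closed
	return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Wrapping every Redis call in &lt;code&gt;Do&lt;/code&gt; is what turns the Toxiproxy experiments below into observable state transitions.&lt;/p&gt;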

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installing Toxiproxy
&lt;/h3&gt;

&lt;p&gt;First, install Toxiproxy on your system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;toxiproxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Starting the Toxiproxy Server
&lt;/h3&gt;

&lt;p&gt;Start the Toxiproxy server in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-server &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server starts on &lt;code&gt;localhost:8474&lt;/code&gt; by default. You can verify it's running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8474/version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating a Proxy for Redis
&lt;/h3&gt;

&lt;p&gt;Now create a proxy that sits between your auth service and Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create proxy: listens on 6380, forwards to Redis on 6379&lt;/span&gt;
toxiproxy-cli create redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-l&lt;/span&gt; localhost:6380 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; localhost:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a proxy named &lt;code&gt;redis-proxy&lt;/code&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Listens on port &lt;strong&gt;6380&lt;/strong&gt; (your application will connect here)&lt;/li&gt;
&lt;li&gt;Forwards traffic to Redis on port &lt;strong&gt;6379&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify the proxy was created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name        Listen          Upstream        Enabled
============================================================
redis-proxy localhost:6380  localhost:6379  true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
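
&lt;p&gt;If you prefer to create the proxy from test setup code rather than the CLI (handy in CI), Toxiproxy also ships a Go client. A sketch, assuming the &lt;code&gt;github.com/Shopify/toxiproxy/v2/client&lt;/code&gt; package:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package resilience_test

import (
	"log"

	toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

// setupProxy creates the same redis-proxy as the CLI command above:
// listen on 6380, forward to the real Redis on 6379.
func setupProxy() *toxiproxy.Proxy {
	client := toxiproxy.NewClient("localhost:8474")
	proxy, err := client.CreateProxy("redis-proxy", "localhost:6380", "localhost:6379")
	if err != nil {
		log.Fatalf("create proxy: %v", err)
	}
	return proxy
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;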



&lt;h3&gt;
  
  
  Configuring Your Application
&lt;/h3&gt;

&lt;p&gt;Point your auth service to use the Toxiproxy port instead of connecting directly to Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REDIS_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6380  &lt;span class="c"&gt;# Toxiproxy port, not 6379&lt;/span&gt;

&lt;span class="c"&gt;# Restart your service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all Redis traffic flows through Toxiproxy, allowing you to inject failures without modifying your application code.&lt;/p&gt;
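
&lt;p&gt;For a Go service, the wiring might look like the sketch below. It assumes the go-redis v9 client and deliberately keeps the timeouts short (2 seconds) so the latency toxics later in this post surface as real failures; adapt it to however your service loads configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/redis/go-redis/v9"
)

func newRedisClient() *redis.Client {
	host := os.Getenv("REDIS_HOST") // "localhost"
	port := os.Getenv("REDIS_PORT") // "6380": Toxiproxy, not Redis directly
	return redis.NewClient(&amp;redis.Options{
		Addr:         fmt.Sprintf("%s:%s", host, port),
		DialTimeout:  2 * time.Second, // shorter than the 5s latency toxic used later
		ReadTimeout:  2 * time.Second,
		WriteTimeout: 2 * time.Second,
	})
}

func main() {
	rdb := newRedisClient()
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if err := rdb.Ping(ctx).Err(); err != nil {
		fmt.Println("redis unreachable through proxy:", err)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;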

&lt;h2&gt;
  
  
  Testing Scenarios
&lt;/h2&gt;

&lt;p&gt;Let's explore different failure scenarios and verify our circuit breaker behaves correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: Simulate Redis Down (Circuit Opens)
&lt;/h3&gt;

&lt;p&gt;The most critical test—what happens when Redis becomes completely unavailable?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disable the proxy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toggle redis-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simulates Redis being down. Your application will start getting connection failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First 4 requests fail but circuit remains closed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5th consecutive failure&lt;/strong&gt; → Circuit breaker opens&lt;/li&gt;
&lt;li&gt;Logs should show:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   {"level":"warn","msg":"Circuit breaker opened - Redis failures exceeded threshold","service":"auth"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Subsequent requests are either:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fail-open&lt;/strong&gt;: Allowed through (degraded mode, no Redis caching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail-closed&lt;/strong&gt;: Rejected immediately (fail-fast)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Monitoring the circuit state:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the circuit is open, check your application metrics or logs. The circuit should remain open for the configured recovery period.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Re-enable the proxy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toggle redis-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After re-enabling, the circuit should transition to &lt;strong&gt;half-open&lt;/strong&gt; on the next request, then back to &lt;strong&gt;closed&lt;/strong&gt; if the request succeeds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification checklist:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Circuit opens after 5 failures&lt;/li&gt;
&lt;li&gt;✅ Log message appears with correct timestamp&lt;/li&gt;
&lt;li&gt;✅ Requests are handled according to fail-open/fail-closed policy&lt;/li&gt;
&lt;li&gt;✅ Circuit recovers when service returns&lt;/li&gt;
&lt;li&gt;✅ Application continues functioning (degraded or failed requests)&lt;/li&gt;
&lt;/ul&gt;
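
&lt;p&gt;If you would rather run this checklist from a test than by hand, the Toxiproxy Go client can drive the same toggle from code. The sketch below assumes the &lt;code&gt;github.com/Shopify/toxiproxy/v2/client&lt;/code&gt; package, a health endpoint on port 8080, and a recovery window of a few seconds; adjust to your own service:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package resilience_test

import (
	"net/http"
	"testing"
	"time"

	toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

// hit probes the (assumed) health endpoint of the service under test.
func hit(t *testing.T) int {
	t.Helper()
	resp, err := http.Get("http://localhost:8080/health")
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func TestCircuitOpensWhenRedisIsDown(t *testing.T) {
	client := toxiproxy.NewClient("localhost:8474")
	proxy, err := client.Proxy("redis-proxy") // created earlier with toxiproxy-cli
	if err != nil {
		t.Fatalf("proxy not found: %v", err)
	}

	// Redis "down": every call through the proxy now fails.
	if err := proxy.Disable(); err != nil {
		t.Fatalf("disable proxy: %v", err)
	}
	for i := 0; i &lt; 6; i++ { // one more than the 5-failure threshold
		hit(t)
	}
	// At this point the logs should contain the "Circuit breaker opened" warning;
	// assert on logs or metrics here if your service exposes them.

	// Redis "back": the circuit should go half-open, then closed.
	if err := proxy.Enable(); err != nil {
		t.Fatalf("enable proxy: %v", err)
	}
	time.Sleep(5 * time.Second) // give the recovery window time to elapse
	if code := hit(t); code != http.StatusOK {
		t.Fatalf("expected healthy service after recovery, got status %d", code)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;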

&lt;h3&gt;
  
  
  Scenario 2: Add Latency (Slow Redis)
&lt;/h3&gt;

&lt;p&gt;Network latency is often more insidious than complete failures. It can cause timeouts, thread pool exhaustion, and cascading failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add 5-second latency:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This adds a 5000ms (5 second) delay to all requests passing through the proxy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your Redis client timeout is less than 5 seconds (e.g., 2 seconds), requests will time out and count as failures. After 5 consecutive timeouts, the circuit should open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logs you should see:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"level":"error","msg":"Redis operation timeout","error":"context deadline exceeded"}
{"level":"warn","msg":"Circuit breaker opened - Redis failures exceeded threshold"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Testing different latency levels:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Moderate latency (1 second)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000

&lt;span class="c"&gt;# Extreme latency (10 seconds)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Remove the latency toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency above your timeout threshold behaves like a failure&lt;/li&gt;
&lt;li&gt;Helps verify your timeouts are properly configured&lt;/li&gt;
&lt;li&gt;Tests thread pool behavior under slow dependencies&lt;/li&gt;
&lt;/ul&gt;
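
&lt;p&gt;That &lt;code&gt;context deadline exceeded&lt;/code&gt; error is what a Go Redis call returns when its per-call context expires before the injected delay does. As a sketch (go-redis v9 assumed), bounding each operation with a short deadline is what turns latency into a failure the breaker can count:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// getWithDeadline bounds a single Redis GET at 2 seconds. With the 5s
// latency toxic active, this returns "context deadline exceeded", which
// the circuit breaker counts as one more consecutive failure.
func getWithDeadline(rdb *redis.Client, key string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return rdb.Get(ctx, key).Result()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;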

&lt;h3&gt;
  
  
  Scenario 3: Connection Reset (Network Errors)
&lt;/h3&gt;

&lt;p&gt;Simulate unstable network connections that reset mid-request:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add reset_peer toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; reset_peer &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This abruptly resets the connection (TCP RST) 500ms after data starts flowing, simulating an unstable network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Connections will be abruptly closed, causing errors like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connection reset by peer
unexpected EOF
broken pipe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These should count as failures and eventually open the circuit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remove the toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; reset_peer_downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use this test:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifying connection pool recovery&lt;/li&gt;
&lt;li&gt;Testing retry logic&lt;/li&gt;
&lt;li&gt;Ensuring proper cleanup of broken connections&lt;/li&gt;
&lt;/ul&gt;
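
&lt;p&gt;One thing reset tests tend to expose is error classification: a cache miss must not count against the breaker, but resets, broken pipes, EOFs, and timeouts should. A sketch of such a classifier (go-redis v9 assumed; the function name is mine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package cache

import (
	"context"
	"errors"
	"io"
	"net"
	"syscall"

	"github.com/redis/go-redis/v9"
)

// isRedisFailure decides whether an error should count toward opening the
// circuit: resets, broken pipes, EOFs, and timeouts do; cache misses do not.
func isRedisFailure(err error) bool {
	if err == nil || errors.Is(err, redis.Nil) { // redis.Nil = key not found
		return false
	}
	var netErr net.Error
	if errors.As(err, &amp;netErr) &amp;&amp; netErr.Timeout() {
		return true
	}
	return errors.Is(err, io.EOF) ||
		errors.Is(err, syscall.ECONNRESET) ||
		errors.Is(err, syscall.EPIPE) ||
		errors.Is(err, context.DeadlineExceeded)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;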

&lt;h3&gt;
  
  
  Scenario 4: Bandwidth Limit (Network Congestion)
&lt;/h3&gt;

&lt;p&gt;Simulate network congestion with bandwidth restrictions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limit to 1KB/s:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; bandwidth &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This restricts throughput to 1 kilobyte per second, simulating a severely congested network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small Redis operations (GET, SET) might still work but slowly&lt;/li&gt;
&lt;li&gt;Large operations (fetching big values, pipeline operations) will time out&lt;/li&gt;
&lt;li&gt;Gradual degradation rather than immediate failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test different bandwidth levels:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Moderate congestion (10 KB/s)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; bandwidth_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10

&lt;span class="c"&gt;# Severe congestion (100 bytes/s)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; bandwidth_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Remove the toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; bandwidth_downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this tests:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application behavior under sustained degradation&lt;/li&gt;
&lt;li&gt;Whether your timeouts are appropriate for typical data sizes&lt;/li&gt;
&lt;li&gt;If your circuit breaker is sensitive enough to detect slow failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 5: Jitter (Variable Latency)
&lt;/h3&gt;

&lt;p&gt;Real networks don't have consistent latency—they fluctuate. Simulate this with jitter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add latency with jitter:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;jitter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates latency ranging from 500ms to 1500ms (1000ms ± 500ms).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some requests complete quickly&lt;/li&gt;
&lt;li&gt;Others time out randomly&lt;/li&gt;
&lt;li&gt;Circuit breaker sees intermittent failures&lt;/li&gt;
&lt;li&gt;Tests the circuit breaker's threshold logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More realistic than fixed latency&lt;/li&gt;
&lt;li&gt;Reveals issues with retry timing&lt;/li&gt;
&lt;li&gt;Tests how your system handles sporadic failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 6: Slicer (Gradual Degradation)
&lt;/h3&gt;

&lt;p&gt;Simulate a service slowly dying:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add the slicer toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; slice &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;average_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;size_variation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;32 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This slices the stream into small chunks (roughly 64 bytes each, ±32) with a 100-microsecond delay between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operations take progressively longer&lt;/li&gt;
&lt;li&gt;Eventually exceed timeouts&lt;/li&gt;
&lt;li&gt;Allows testing gradual degradation vs sudden failure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Testing Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Combining Multiple Toxics
&lt;/h3&gt;

&lt;p&gt;You can apply multiple toxics simultaneously to simulate complex scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add both latency and packet loss&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;200
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; slow_close &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simulates a network that is both slow and reluctant to release connections cleanly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Testing Script
&lt;/h3&gt;

&lt;p&gt;Create a bash script to run your test scenarios automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting Redis circuit breaker tests..."&lt;/span&gt;

&lt;span class="c"&gt;# Test 1: Complete failure&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Test 1: Simulating Redis down"&lt;/span&gt;
toxiproxy-cli toggle redis-proxy
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
curl http://localhost:8080/health
toxiproxy-cli toggle redis-proxy

&lt;span class="c"&gt;# Test 2: High latency&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Test 2: Adding high latency"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
curl http://localhost:8080/health
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream

&lt;span class="c"&gt;# Test 3: Reset connections&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Test 3: Reset connections"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; reset_peer &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
curl http://localhost:8080/health
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; reset_peer_downstream

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Tests complete!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Metrics to Track
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Circuit state&lt;/strong&gt;: closed, open, half-open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure count&lt;/strong&gt;: Number of consecutive failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Success rate&lt;/strong&gt;: Percentage of successful requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency percentiles&lt;/strong&gt;: p50, p95, p99&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request volume&lt;/strong&gt;: Total requests during the test&lt;/li&gt;
&lt;/ol&gt;
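
&lt;p&gt;One lightweight way to track the circuit state is to publish it as a gauge from the breaker's state-transition hook. A sketch using the Prometheus &lt;code&gt;client_golang&lt;/code&gt; library (the metric name is my own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// redis_circuit_breaker_state: 0 = closed, 1 = half-open, 2 = open.
var circuitState = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "redis_circuit_breaker_state",
	Help: "Redis circuit breaker state (0=closed, 1=half-open, 2=open)",
})

// recordState is called from the breaker's state-transition hook.
func recordState(state string) {
	switch state {
	case "open":
		circuitState.Set(2)
	case "half-open":
		circuitState.Set(1)
	default:
		circuitState.Set(0)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Failure counts and latency percentiles fit the same pattern with a counter and a histogram.&lt;/p&gt;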

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Test All Failure Modes
&lt;/h3&gt;

&lt;p&gt;Don't just test complete outages. Real production issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partial failures (some operations succeed, others fail)&lt;/li&gt;
&lt;li&gt;Slow failures (latency-induced timeouts)&lt;/li&gt;
&lt;li&gt;Intermittent failures (flaky networks)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Verify Recovery
&lt;/h3&gt;

&lt;p&gt;Always test that your circuit breaker recovers properly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cause failure&lt;/span&gt;
toxiproxy-cli toggle redis-proxy

&lt;span class="c"&gt;# Wait for circuit to open&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;10

&lt;span class="c"&gt;# Restore service&lt;/span&gt;
toxiproxy-cli toggle redis-proxy

&lt;span class="c"&gt;# Verify recovery&lt;/span&gt;
curl http://localhost:8080/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Test Under Load
&lt;/h3&gt;

&lt;p&gt;Run toxics while your service is under load to see realistic behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start load test&lt;/span&gt;
hey &lt;span class="nt"&gt;-z&lt;/span&gt; 60s &lt;span class="nt"&gt;-c&lt;/span&gt; 10 http://localhost:8080/api/endpoint &amp;amp;

&lt;span class="c"&gt;# Inject failure mid-test&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;20
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Clean Up Between Tests
&lt;/h3&gt;

&lt;p&gt;Always remove toxics and reset state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove all toxics&lt;/span&gt;
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;--all&lt;/span&gt;

&lt;span class="c"&gt;# Or reset the entire proxy&lt;/span&gt;
toxiproxy-cli delete redis-proxy
toxiproxy-cli create redis-proxy &lt;span class="nt"&gt;-l&lt;/span&gt; localhost:6380 &lt;span class="nt"&gt;-u&lt;/span&gt; localhost:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Example
&lt;/h2&gt;

&lt;p&gt;Here's a complete test scenario simulating a production incident:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Simulating Production Incident ==="&lt;/span&gt;

&lt;span class="c"&gt;# Phase 1: Normal operation&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 1: Normal operation (30s)"&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;30

&lt;span class="c"&gt;# Phase 2: Redis starts slowing down&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 2: Redis latency increases (1s → 3s)"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000
&lt;span class="nb"&gt;sleep &lt;/span&gt;10
toxiproxy-cli toxic update redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3000
&lt;span class="nb"&gt;sleep &lt;/span&gt;10

&lt;span class="c"&gt;# Phase 3: Redis becomes unstable&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 3: Connections start resetting"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; reset_peer &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000
&lt;span class="nb"&gt;sleep &lt;/span&gt;10

&lt;span class="c"&gt;# Phase 4: Complete outage&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 4: Redis goes down completely"&lt;/span&gt;
toxiproxy-cli toggle redis-proxy
&lt;span class="nb"&gt;sleep &lt;/span&gt;20

&lt;span class="c"&gt;# Phase 5: Redis recovers&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 5: Redis comes back online"&lt;/span&gt;
toxiproxy-cli toggle redis-proxy
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;20

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Incident simulation complete ==="&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Check logs and metrics to verify circuit breaker behavior"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected outcome&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Phase 1: Normal operation; Phase 2: elevated latency, some timeouts&lt;/li&gt;
&lt;li&gt;Phase 3: Circuit might start opening/closing intermittently&lt;/li&gt;
&lt;li&gt;Phase 4: Circuit should open and stay open&lt;/li&gt;
&lt;li&gt;Phase 5: Circuit should recover to closed state&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Testing with Toxiproxy transforms abstract resilience patterns into concrete, verifiable behaviors. By simulating real network failures, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt; that your circuit breaker opens at the correct threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; that your application handles failures gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt; edge cases before they occur in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build confidence&lt;/strong&gt; in your system's resilience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is to test not just complete failures, but the spectrum of degradation that happens in production: latency spikes, intermittent errors, bandwidth constraints, and gradual deterioration.&lt;/p&gt;

&lt;p&gt;Remember: A circuit breaker that's never tested is just technical debt in disguise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Shopify/toxiproxy" rel="noopener noreferrer"&gt;Toxiproxy GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Netflix/Hystrix/wiki/How-it-Works#circuit-breaker" rel="noopener noreferrer"&gt;Netflix's Hystrix Circuit Breaker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/bliki/CircuitBreaker.html" rel="noopener noreferrer"&gt;Martin Fowler on Circuit Breakers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://redis.io/docs/manual/client-side-caching/" rel="noopener noreferrer"&gt;Redis Client Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Have you used Toxiproxy for testing? What failure scenarios have you discovered that surprised you? Share your experiences in the comments!&lt;/strong&gt; 💬&lt;/p&gt;

</description>
      <category>redis</category>
      <category>testing</category>
      <category>resilience</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
