<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Akarshan Gandotra</title>
    <description>The latest articles on Forem by Akarshan Gandotra (@akarshan).</description>
    <link>https://forem.com/akarshan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F69510%2F5056764e-527e-442d-860b-cab2dbbf57cd.jpg</url>
      <title>Forem: Akarshan Gandotra</title>
      <link>https://forem.com/akarshan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/akarshan"/>
    <language>en</language>
    <item>
      <title>Part 10 — Lessons learned building a Kubernetes Auth Gateway</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:42:44 +0000</pubDate>
      <link>https://forem.com/akarshan/lessons-learned-building-a-kubernetes-auth-gateway-100m</link>
      <guid>https://forem.com/akarshan/lessons-learned-building-a-kubernetes-auth-gateway-100m</guid>
      <description>&lt;p&gt;We're at the end of the series. Nine chapters of mechanism. One chapter of opinion.&lt;/p&gt;

&lt;p&gt;Building the Auth Gateway took roughly two years from "what if NGINX did the auth?" to "this thing handles every authenticated request in production." A lot of what's in the previous chapters wasn't obvious to us at the start. This is the post-mortem on our own architecture: what worked, what hurt, what we'd build earlier, and what we'd warn the next team about.&lt;/p&gt;

&lt;h2&gt;
  
  
  What worked
&lt;/h2&gt;

&lt;p&gt;A few decisions held up cleanly. We'd make all of them again.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;auth_request&lt;/code&gt; as the primitive
&lt;/h3&gt;

&lt;p&gt;NGINX's &lt;code&gt;auth_request&lt;/code&gt; directive is, with no exaggeration, the single most leveraged design choice in the platform. One directive, well-understood, supported across NGINX versions. We don't need a service mesh. We don't need a custom Envoy filter. We don't need a Lua module compiled into NGINX.&lt;/p&gt;

&lt;p&gt;If you can do your auth in HTTP-status terms (200/401/403), &lt;code&gt;auth_request&lt;/code&gt; is the right tool. If you can't, you probably want a sidecar or mesh-level enforcement and this whole architecture doesn't apply.&lt;/p&gt;
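
&lt;p&gt;To make "HTTP-status terms" concrete, here is a minimal sketch of the service side of an &lt;code&gt;auth_request&lt;/code&gt; setup. It is not our production handler, and the token and permission checks are stand-ins; the point is that the handler's only outputs are 200, 401, and 403:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "errors"
    "net/http"
    "strings"
)

type claims struct{ perms map[string]bool }

// validateToken is a stand-in: the real handler verifies a JWT signature here.
func validateToken(raw string) (claims, error) {
    if raw == "valid-token" {
        return claims{perms: map[string]bool{"user-service:read": true}}, nil
    }
    return claims{}, errors.New("invalid token")
}

// authHandler is what the auth_request subrequest hits. Its entire contract
// is the status code: 200 allow, 401 unauthenticated, 403 forbidden.
func authHandler(w http.ResponseWriter, r *http.Request) {
    raw := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
    if raw == "" {
        w.WriteHeader(http.StatusUnauthorized)
        return
    }
    c, err := validateToken(raw)
    if err != nil {
        w.WriteHeader(http.StatusUnauthorized)
        return
    }
    if !c.perms["user-service:read"] {
        w.WriteHeader(http.StatusForbidden)
        return
    }
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/auth", authHandler)
    http.ListenAndServe(":8080", nil)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;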

&lt;h3&gt;
  
  
  Endpoint metadata as data
&lt;/h3&gt;

&lt;p&gt;Storing endpoint type and required permissions in Postgres, refreshed via Pub/Sub, was the right call. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We can change auth without redeploying the gateway.&lt;/li&gt;
&lt;li&gt;We can audit auth ("what protects this URL?") with a SQL query.&lt;/li&gt;
&lt;li&gt;Admin tooling and the gateway share a single contract.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost — a small DB lookup at boot, an in-memory trie, a refresh mechanism — was tiny compared to the operational flexibility we got back.&lt;/p&gt;
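
&lt;p&gt;For concreteness, here is a sketch of the shape this takes on the read side. The table and column names are illustrative, not our actual schema; the loaded rules get compiled into the in-memory trie so the hot path never touches Postgres:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package endpoints

import (
    "context"
    "database/sql"
)

// EndpointRule mirrors one row of the endpoint-metadata table.
type EndpointRule struct {
    Slug         string // service slug, e.g. "user-service"
    Method       string // HTTP method
    PathPattern  string // e.g. "/api/v1/users/:id"
    EndpointType string // OPEN, AUTHENTICATED, ACCESS_CONTROLLED, ...
    Permissions  string // simplified to a single column for this sketch
}

// LoadRules pulls every rule at boot and on each refresh.
func LoadRules(ctx context.Context, db *sql.DB) ([]EndpointRule, error) {
    rows, err := db.QueryContext(ctx,
        `SELECT slug, method, path_pattern, endpoint_type, permissions FROM endpoint_rules`)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var out []EndpointRule
    for rows.Next() {
        var r EndpointRule
        if err := rows.Scan(&amp;amp;r.Slug, &amp;amp;r.Method, &amp;amp;r.PathPattern, &amp;amp;r.EndpointType, &amp;amp;r.Permissions); err != nil {
            return nil, err
        }
        out = append(out, r)
    }
    return out, rows.Err()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;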

&lt;h3&gt;
  
  
  One structured log line per decision
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AUTH_DECISION&lt;/code&gt; is the contract between the Auth Service and oncall. Every field, every time, every request. A year of operations later, this is the artifact we reference most often. Every alert we've built points at it. Every incident postmortem references it.&lt;/p&gt;

&lt;p&gt;Resist the temptation to add &lt;code&gt;INFO&lt;/code&gt;/&lt;code&gt;DEBUG&lt;/code&gt; lines around it. Resist the temptation to omit fields when they're "not relevant." One line. Same shape. Forever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fail-closed by default at the edge
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;error_page 502 503 504 = @auth_unavailable;&lt;/code&gt; was a one-line change that defines our security posture. When the Auth Service is unhealthy, NGINX returns 503 to the client &lt;em&gt;instead of&lt;/em&gt; letting the request through. We've had a few incidents where this caused brief platform-wide outages. We have never regretted the choice.&lt;/p&gt;

&lt;p&gt;The principle: the cost of a 5-minute outage on rare occasions is much, much less than the cost of one cross-tenant data leak ever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caches in the auth process, not in NGINX
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;auth_request&lt;/code&gt; is intentionally not cacheable, and we leaned into that. Every cache lives inside the Auth Service: JWT verify, RSA keys, route lookup, policy bitmap, revocation map, SA versions. Each is invalidated through its own channel. The gateway's hot path makes zero Redis calls in steady state.&lt;/p&gt;

&lt;p&gt;This kept the architecture honest. The auth pod is the unit of correctness. Scale it, monitor it, debug it as one thing.&lt;/p&gt;
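
&lt;p&gt;Structurally, "every cache lives inside the Auth Service" ends up looking roughly like this (a sketch with illustrative names, not our real types). One process-local struct owns them all; Redis and Postgres only ever feed refreshes, never a request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package authcache

import "sync"

type jwtEntry struct {
    identityID string
    expiresAt  int64 // unix seconds; an entry never outlives the token itself
}

type routeTrie struct{} // compiled endpoint metadata, swapped wholesale on reload

// caches groups every process-local cache the hot path touches.
// Each has its own invalidation channel; none of them call Redis per request.
type caches struct {
    mu         sync.RWMutex
    jwtVerify  map[string]jwtEntry // keyed by token hash, bounded
    trie       *routeTrie          // rebuilt on Pub/Sub reload plus periodic refresh
    revoked    map[string]int64    // token id, fed by the revocation stream consumer
    saVersions map[string]int64    // service-account id, last synced version
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;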

&lt;h3&gt;
  
  
  Pub/Sub-driven trie reload
&lt;/h3&gt;

&lt;p&gt;Push-based invalidation for the endpoint trie was the right shape. Periodic-only would have given us a ~30 minute window where new admin routes were unprotected. Pub/Sub-only would have been brittle (events get lost). Both, with periodic as the safety net, gives us seconds of staleness in the common case and bounded staleness even when the message is lost.&lt;/p&gt;

&lt;p&gt;Most caches we'd default to TTL. The trie was worth the special case.&lt;/p&gt;
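
&lt;p&gt;The loop itself is small. A sketch, where the &lt;code&gt;events&lt;/code&gt; channel stands in for the actual Pub/Sub subscription and &lt;code&gt;reload&lt;/code&gt; stands in for the Postgres rebuild:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package reload

import (
    "context"
    "log"
    "time"
)

// RunTrieReloader rebuilds the endpoint trie when a Pub/Sub message arrives,
// and also on a timer as the safety net for lost messages.
func RunTrieReloader(ctx context.Context, events &amp;lt;-chan struct{}, every time.Duration,
    reload func(context.Context) error) {
    ticker := time.NewTicker(every)
    defer ticker.Stop()
    for {
        select {
        case &amp;lt;-ctx.Done():
            return
        case &amp;lt;-events: // push path: seconds of staleness in the common case
        case &amp;lt;-ticker.C: // pull path: bounded staleness even when a message is lost
        }
        if err := reload(ctx); err != nil {
            log.Printf("trie reload failed, keeping previous trie: %v", err)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;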

&lt;h3&gt;
  
  
  The bitmap fast path
&lt;/h3&gt;

&lt;p&gt;Encoding permissions as bit indexes paid off. Smaller tokens, faster checks, cleaner metrics. The legacy path we kept around for safety has earned its keep — version skew is real, and fall-through is graceful.&lt;/p&gt;

&lt;p&gt;Shadow mode for two months before flipping the switch was the right rollout pattern. Catching three real bugs in shadow with zero impact on production is the gold standard for a sensitive change like authorization logic.&lt;/p&gt;
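
&lt;p&gt;The check itself is a couple of bit operations. A sketch, assuming the registry hands out stable bit indexes and the token carries the set as packed 64-bit words:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package bitmap

// HasPermission tests one required permission against the token's permission bitmap.
func HasPermission(words []uint64, bitIndex uint32) bool {
    word, bit := bitIndex/64, bitIndex%64
    if int(word) &amp;gt;= len(words) {
        // Token predates this permission's bit: fall through to the legacy string path.
        return false
    }
    return words[word]&amp;amp;(1&amp;lt;&amp;lt;bit) != 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;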

&lt;h2&gt;
  
  
  What hurt
&lt;/h2&gt;

&lt;p&gt;Now the harder list. Things that cost us time, sleep, or trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tenant resolution living in two places
&lt;/h3&gt;

&lt;p&gt;NGINX resolves the tenant. The Auth Service &lt;em&gt;also&lt;/em&gt; checks tenant binding (token tenant matches request tenant). The two places do &lt;em&gt;different&lt;/em&gt; checks for a reason — but the reason isn't obvious, and we've watched several engineers add tenant logic to a third place because they didn't realize it was already covered.&lt;/p&gt;

&lt;p&gt;What we'd do differently: write a single tenant-resolution doc that explicitly enumerates &lt;em&gt;which layer owns what&lt;/em&gt; and &lt;em&gt;what each layer assumes about the others&lt;/em&gt;. A "tenancy contract" page. We have it now (chapter 5 is a recovered version of it); we should have had it on day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "first segment is the slug" rule
&lt;/h3&gt;

&lt;p&gt;For a long time, the Auth Service split the URI on &lt;code&gt;/&lt;/code&gt; and treated the first segment as the service slug. This worked until services started nesting each other or grouping under shared prefixes. We had to retrofit &lt;code&gt;X-Service-Slug&lt;/code&gt; and &lt;code&gt;X-Request-Path&lt;/code&gt; headers — backward-compatibly, with fallback to the old rule. The retrofit is fine; it took longer than it should have because the old rule was buried in three places.&lt;/p&gt;

&lt;p&gt;What we'd do differently: explicit slug headers from day one. Don't infer slugs from the URI structure. The path inside a service is the service's business; the slug is a separate concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Migration to the bitmap took longer than expected
&lt;/h3&gt;

&lt;p&gt;The bitmap fast path was a six-week project that took five months. The math was straightforward. What ate the time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coordinating bit-index assignments with the token issuer team (different repo, different rollout cadence).&lt;/li&gt;
&lt;li&gt;Fixture data in our test suites was hardcoded with old permission strings; updating it for the bitmap registry was a long tail of small PRs.&lt;/li&gt;
&lt;li&gt;The shadow comparison logic exposed three subtle bugs (Chapter 6) that each required investigation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we'd do differently: assume cross-team auth changes are 3x what you estimate. Build the shadow harness &lt;em&gt;first&lt;/em&gt;, then the new path. The shadow harness paid for itself five times over.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache invalidation was an afterthought
&lt;/h3&gt;

&lt;p&gt;The first version of the JWT cache was a &lt;code&gt;map&lt;/code&gt; with &lt;code&gt;time.AfterFunc&lt;/code&gt; evictors. We covered it in Chapter 8. It seemed fine. It fell over in production within a week.&lt;/p&gt;

&lt;p&gt;The lesson generalizes: &lt;strong&gt;a cache without a written-down invalidation channel is a memory leak.&lt;/strong&gt; Every cache should have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A bounded size (entries or bytes).&lt;/li&gt;
&lt;li&gt;A clear invalidation event ("token expired", "trie reloaded", "revocation event consumed").&lt;/li&gt;
&lt;li&gt;A staleness window we can articulate ("up to 30 seconds late").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can't write those three down, don't add the cache.&lt;/p&gt;
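
&lt;p&gt;For illustration, here is a cache that passes all three tests (the names are made up for this sketch): bounded entries, eviction driven by an explicit event rather than guesswork, and a staleness story you can state in one sentence ("a revocation applies as soon as its event is consumed"):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package cache

import (
    "sync"
    "time"
)

type entry struct {
    identityID string
    expiresAt  time.Time // never outlives the token itself
}

// jwtCache: bounded size, explicit invalidation via Evict, a staleness window we can state.
type jwtCache struct {
    mu         sync.Mutex
    maxEntries int
    entries    map[string]entry // keyed by token hash
}

func (c *jwtCache) Get(tokenHash string) (entry, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    e, ok := c.entries[tokenHash]
    if !ok || time.Now().After(e.expiresAt) {
        return entry{}, false
    }
    return e, true
}

func (c *jwtCache) Put(tokenHash string, e entry) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if len(c.entries) &amp;gt;= c.maxEntries {
        return // bound enforced: better to re-verify than to grow without limit
    }
    c.entries[tokenHash] = e
}

// Evict is the explicit invalidation channel, called by the revocation-stream consumer.
func (c *jwtCache) Evict(tokenHash string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    delete(c.entries, tokenHash)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;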

&lt;h3&gt;
  
  
  The default tenant we shipped on day one
&lt;/h3&gt;

&lt;p&gt;For the first quarter we had a default tenant. "If no &lt;code&gt;X-Tenant-ID&lt;/code&gt; and no host match, fall through to &lt;code&gt;default-tenant&lt;/code&gt;." It was added because it made local dev easier.&lt;/p&gt;

&lt;p&gt;It cost us in two ways. First, removing it took longer than building it — every misconfigured client started 400'ing once we removed the fallback. Second, while it was live, it produced exactly one near-miss data leak (a service-account request without a tenant header writing into the wrong tenant). We caught it before it left staging.&lt;/p&gt;

&lt;p&gt;Shipping that default tenant was the worst single decision in the whole project. We'd remove it from every future system before it ever boots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-pod alert spam, twice
&lt;/h3&gt;

&lt;p&gt;Twice we shipped alert code that fired per-request rather than per-state. The first time was during a Redis outage in our second month (lit up Slack with ~10k messages in 90 seconds). The second was during an RSA misconfig rollout (a few hundred messages per minute per pod, fleet-wide).&lt;/p&gt;

&lt;p&gt;Both were the same bug: alerting from a request handler instead of from a state-transition observer. Both were "fixed" with &lt;code&gt;atomic.Bool&lt;/code&gt; swaps. Now we apply that pattern aggressively.&lt;/p&gt;

&lt;p&gt;What we'd do differently: write the alert dedup helper &lt;em&gt;first&lt;/em&gt;, before the first alert. Have it baked into the codebase before there's anything to alert on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd build earlier
&lt;/h2&gt;

&lt;p&gt;In the order we'd add them:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The structured &lt;code&gt;AUTH_DECISION&lt;/code&gt; log
&lt;/h3&gt;

&lt;p&gt;On day one. Even before fancy auth logic. The log structure outlives every other choice — every dashboard, every alert, every postmortem reads from it. Build the contract first.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Slack alert dedup helper
&lt;/h3&gt;

&lt;p&gt;Before the first alert. Five lines of code to wrap an &lt;code&gt;atomic.Bool&lt;/code&gt; around a Slack send. Ship it before you have anything to alert on.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Fail-closed posture
&lt;/h3&gt;

&lt;p&gt;Before the gateway sees a single request in production. Don't even try permissive defaults. The "we'll tighten it later" path becomes "we shipped a permissive-by-default thing for 18 months." Just ship it tight.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Endpoint metadata in DB
&lt;/h3&gt;

&lt;p&gt;Skip the YAML-of-routes phase. Skip the in-code decorator phase. Go straight to the database table with refresh mechanism. The transitional architectures cost more to migrate off than they cost to skip.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Gap probe on revocation streams
&lt;/h3&gt;

&lt;p&gt;The probe (Chapter 7) catches data loss between the stream and consumers. It costs almost nothing to run. Without it you don't &lt;em&gt;know&lt;/em&gt; if you're losing events; you just hope.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Shadow harness for sensitive changes
&lt;/h3&gt;

&lt;p&gt;Comparing old-vs-new in production with the new path muted is a powerful pattern. Build the harness as a reusable thing. We re-implemented variants of it for three different rollouts before realizing it should be a library.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Tenancy contract document
&lt;/h3&gt;

&lt;p&gt;One page that owns: which layer resolves the tenant, which layer validates token-tenant binding, which layer scopes queries, and what the failure modes are. Required reading before anyone touches request handling. Should have existed before the gateway shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The maturity progression
&lt;/h2&gt;

&lt;p&gt;Looking back, the gateway evolved through identifiable stages. They're worth naming because if you're starting fresh, knowing the destination shape lets you skip steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg5kduswix3k07ujlris.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg5kduswix3k07ujlris.png" alt=" " width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v1 — per-service auth libs.&lt;/strong&gt; Where most teams are. Each service has its own JWT decode, its own permission check. Inconsistent, drift-prone, slow to fix CVEs. Don't stay here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v2 — auth_request + minimal &lt;code&gt;/auth&lt;/code&gt;.&lt;/strong&gt; A simple gateway that decodes a token and returns 200/401. Static list of "open" routes. Enough to centralize the decision; not enough to scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3 — trie + classification + Pub/Sub.&lt;/strong&gt; Endpoint metadata in a DB. Trie in memory. Pub/Sub-driven refresh kicks in. Now adding a route doesn't require a redeploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v4 — revocation + caching.&lt;/strong&gt; Logout works. Admin disable works. Each cache layer in place. Hot path is sub-millisecond.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v5 — bitmap + structured logs + degraded mode.&lt;/strong&gt; The mature gateway. Fast, observable, alertable, recoverable.&lt;/p&gt;

&lt;p&gt;Most of the value lives between v2 and v3. If you're at v1, that's the migration to plan for. v3 to v5 is iteration; v1 to v3 is the &lt;em&gt;project&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five pieces of advice for teams building this
&lt;/h2&gt;

&lt;p&gt;If you're starting from scratch with a similar problem, here's what I'd hand off in five bullets:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start with &lt;code&gt;auth_request&lt;/code&gt;. Don't shop architectures.
&lt;/h3&gt;

&lt;p&gt;Service mesh, custom Envoy filter, Lua plugin, sidecar — they all promise more flexibility. They all cost more in operations. &lt;code&gt;auth_request&lt;/code&gt; is enough for the 90% case, and the 10% is rarely worth the complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Make the gateway HA before anything else.
&lt;/h3&gt;

&lt;p&gt;Two replicas minimum, HPA, graceful shutdown, retries to upstream auth pod, circuit-breaker semantics in NGINX, fail-closed posture. If any one of these is missing the gateway &lt;em&gt;will&lt;/em&gt; take down your platform during a normal degraded event. This isn't optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The log is the API.
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;AUTH_DECISION&lt;/code&gt; log is a public contract with everyone who ever debugs your gateway. Treat it like a schema. Don't change field names without a migration. Don't add free-form strings to enum fields. Have one version-controlled doc that defines every field and every value of every enum.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cache invalidation has to be explicit.
&lt;/h3&gt;

&lt;p&gt;Every cache: bounded size, explicit invalidation channel, articulable staleness window. If a cache doesn't have all three, it's a bug-in-waiting. We learned this twice.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Build observability before you build features.
&lt;/h3&gt;

&lt;p&gt;Dashboards, alerts, trace context, the structured log — all of these come &lt;em&gt;before&lt;/em&gt; you ship the cool feature you're excited about. A clever new permission model that you can't observe is worse than a boring permission model you can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd build next
&lt;/h2&gt;

&lt;p&gt;A few things on our list that didn't fit this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX otel module compiled in.&lt;/strong&gt; Right now NGINX traces are limited; the Auth Service has full spans, but the NGINX hop is a black box from the trace's point of view. Worth fixing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant rate limiting.&lt;/strong&gt; Currently we rely on upstream services. The gateway is the natural place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAF integration.&lt;/strong&gt; We have an external WAF. Closer integration so WAF events show up in &lt;code&gt;AUTH_DECISION&lt;/code&gt; would help triage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token introspection cache.&lt;/strong&gt; Some integrations issue opaque tokens that we have to introspect with the issuer. Caching that lookup is its own caching problem; we haven't tackled it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A formal "tenancy contract" page.&lt;/strong&gt; Yes, the same one I told you to build on day one. We're catching up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is a future series, probably.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final architecture
&lt;/h2&gt;

&lt;p&gt;For posterity, the picture of where we ended up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcoi62l7pj0x3ypscgj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcoi62l7pj0x3ypscgj8.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything in this picture has been earned by an outage, a postmortem, or a near-miss. None of it is decoration. If you're building something similar and one of the boxes seems extra to you, it's because you haven't had the incident that justifies it yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Centralizing auth at the edge is one of those decisions that looks obviously correct in hindsight and is genuinely hard to convince a team to invest in beforehand. The wins are diffuse — slightly less drift, slightly fewer CVEs, slightly faster security responses. The pain is concentrated and visible — one new service to operate, one extra hop, one more place that has to be HA.&lt;/p&gt;

&lt;p&gt;But every six months we look back and the gateway has paid for itself again. A library upgrade we did once instead of thirty times. A revocation feature that shipped in a week instead of being negotiated across teams. A multi-tenant isolation guarantee we can actually defend in audits.&lt;/p&gt;

&lt;p&gt;If you take one thing from this series, take this: &lt;strong&gt;&lt;code&gt;auth&lt;/code&gt; is not a problem you solve once and ignore. It's a problem you solve &lt;em&gt;somewhere&lt;/em&gt;, well, and operate with care.&lt;/strong&gt; Pick that &lt;em&gt;somewhere&lt;/em&gt; to be the edge, build it small and observable, and the rest of your platform gets to focus on actual product work.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you build one of these — or are stuck somewhere mid-build — drop a comment. The hardest part of operating an Auth Gateway is realizing that other people have built the same thing and hit the same rocks. There's no reason for each team to find them independently.&lt;/p&gt;

</description>
      <category>lessonslearned</category>
      <category>architecture</category>
      <category>platformengineering</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Part 9 — Operating the gateway: logs, traces, health, and degraded mode</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:42:27 +0000</pubDate>
      <link>https://forem.com/akarshan/operating-the-gateway-logs-traces-health-and-degraded-mode-2209</link>
      <guid>https://forem.com/akarshan/operating-the-gateway-logs-traces-health-and-degraded-mode-2209</guid>
      <description>&lt;p&gt;The first eight chapters of this series have been about &lt;em&gt;building&lt;/em&gt; an Auth Gateway. This one is about &lt;em&gt;living&lt;/em&gt; with one.&lt;/p&gt;

&lt;p&gt;A gateway in front of every authenticated request is a force multiplier — for both your platform and your oncall pager. If something is broken, it's broken everywhere at once. So observability isn't a Chapter 9 thing. It's a Chapter 0 thing. We just describe it last because there's enough mechanism to talk about that you need the rest of the series first.&lt;/p&gt;

&lt;p&gt;This chapter covers the four things you need to be able to do at 3 AM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read a single log line and understand what happened.&lt;/li&gt;
&lt;li&gt;Trace a slow request from edge to upstream.&lt;/li&gt;
&lt;li&gt;Tell whether a pod is alive, ready, or in deep trouble.&lt;/li&gt;
&lt;li&gt;Get an alert &lt;em&gt;once&lt;/em&gt; — not once per pod per second — when something degrades.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The log line
&lt;/h2&gt;

&lt;p&gt;There are exactly two structured log lines per protected request: one from NGINX, one from the Auth Service. They share &lt;code&gt;request_id&lt;/code&gt;, so you can join them.&lt;/p&gt;

&lt;h3&gt;
  
  
  The NGINX line
&lt;/h3&gt;

&lt;p&gt;Every request is logged as JSON to stdout via NGINX's &lt;code&gt;log_format main&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;log_format&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt; &lt;span class="s"&gt;escape=json&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;"logType":"NGINX_LOGS",&lt;/span&gt;
  &lt;span class="s"&gt;"request_id":"&lt;/span&gt;&lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"time_local":"&lt;/span&gt;&lt;span class="nv"&gt;$time_iso8601&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"remote_addr":"&lt;/span&gt;&lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_method":"&lt;/span&gt;&lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_uri":"&lt;/span&gt;&lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_path":"&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"slug":"&lt;/span&gt;&lt;span class="nv"&gt;$location_path&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"product":"&lt;/span&gt;&lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"microservice":"&lt;/span&gt;&lt;span class="nv"&gt;$microservice&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"status":"&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"status_class":"&lt;/span&gt;&lt;span class="nv"&gt;$status_class&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"request_time_ms":"&lt;/span&gt;&lt;span class="nv"&gt;$request_time_millis&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"service_request_time_ms":"&lt;/span&gt;&lt;span class="nv"&gt;$upstream_response_time_millis&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"service_connect_time":"&lt;/span&gt;&lt;span class="nv"&gt;$upstream_connect_time&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_request_time_ms":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_time_millis&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_connect_time":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_connect_time&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"body_bytes_sent":"&lt;/span&gt;&lt;span class="nv"&gt;$body_bytes_sent&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_referer":"&lt;/span&gt;&lt;span class="nv"&gt;$http_referer&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_user_agent":"&lt;/span&gt;&lt;span class="nv"&gt;$http_user_agent&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_x_forwarded_for":"&lt;/span&gt;&lt;span class="nv"&gt;$http_x_forwarded_for&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"http_host":"&lt;/span&gt;&lt;span class="nv"&gt;$http_host&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"tenant_id":"&lt;/span&gt;&lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"tenant_namespace":"&lt;/span&gt;&lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"identity_id":"&lt;/span&gt;&lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"identity_type":"&lt;/span&gt;&lt;span class="nv"&gt;$identity_type&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_error_message":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;
  &lt;span class="s"&gt;"auth_error_code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few fields that matter more than they look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auth_request_time_ms&lt;/code&gt;&lt;/strong&gt; — how long the auth subrequest took. We graph this. We page on the p99 going above 50 ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;service_request_time_ms&lt;/code&gt;&lt;/strong&gt; — how long the upstream took, &lt;em&gt;excluding&lt;/em&gt; the auth subrequest. Sequential, not overlapping (Chapter 2).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;status_class&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;1xx&lt;/code&gt;/&lt;code&gt;2xx&lt;/code&gt;/&lt;code&gt;3xx&lt;/code&gt;/&lt;code&gt;4xx&lt;/code&gt;/&lt;code&gt;5xx&lt;/code&gt;. Faster than parsing &lt;code&gt;status&lt;/code&gt; for dashboard breakdowns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tenant_id&lt;/code&gt;&lt;/strong&gt; — the resolved tenant. Always grep by tenant first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auth_error_code&lt;/code&gt;&lt;/strong&gt; + &lt;strong&gt;&lt;code&gt;auth_error_message&lt;/code&gt;&lt;/strong&gt; — populated on deny. Empty on allow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Two extra log formats for the fail paths
&lt;/h3&gt;

&lt;p&gt;When NGINX hits &lt;code&gt;@auth_unavailable&lt;/code&gt; or &lt;code&gt;@upstream_unavailable&lt;/code&gt;, we log to a different format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;log_format&lt;/span&gt; &lt;span class="s"&gt;auth_unavailable&lt;/span&gt;     &lt;span class="s"&gt;'...auth-specific&lt;/span&gt; &lt;span class="s"&gt;fields...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;log_format&lt;/span&gt; &lt;span class="s"&gt;upstream_unavailable&lt;/span&gt; &lt;span class="s"&gt;'...upstream-specific&lt;/span&gt; &lt;span class="s"&gt;fields...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="n"&gt;/dev/stdout&lt;/span&gt; &lt;span class="s"&gt;auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason: when something is on fire, you want it isolated in its own log stream. Dashboards built off &lt;code&gt;main&lt;/code&gt; get drowned by 200s; a dashboard against &lt;code&gt;auth_unavailable&lt;/code&gt; shows you exactly the broken bucket without filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Auth Service line
&lt;/h3&gt;

&lt;p&gt;The Auth Service emits exactly one &lt;code&gt;AUTH_DECISION&lt;/code&gt; log per request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-05-01T12:34:56.789Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"logger"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"AUTH_DECISION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"4bf92f3577b34da6a3ce929d0e0e4736"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"span_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"00f067aa0ba902b7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"/api/v1/users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"mt_prod"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"endpoint_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ACCESS_CONTROLLED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"identity_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"USER"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"identity_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"user@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auth_method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"bearer_token"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ACCESS_LEVEL_MATCH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authn_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authz_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jwt_cache_hit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bitmap_authz_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"granted_access_levels"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"product:admin"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"token_revoked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the single most important artifact in the whole gateway. It is, in dry terms, our auth audit log. In practical terms it's the thing oncall greps when anything goes weird.&lt;/p&gt;

&lt;p&gt;Design rules we apply to it religiously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exactly one line per request.&lt;/strong&gt; No "starting auth", "authenticated", "authorizing", "decided" — those make the storyline split across N entries that you have to stitch back together. One line, one decision, every field.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every field, every time.&lt;/strong&gt; If a field doesn't apply (e.g., &lt;code&gt;bitmap_authz_used&lt;/code&gt; for an OPEN endpoint), it's &lt;code&gt;false&lt;/code&gt; or empty, not omitted. Optional fields make ad-hoc queries painful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;decision_reason&lt;/code&gt; is enum-only.&lt;/strong&gt; Free-form strings here would be the death of dashboards. New reasons require code review (Chapter 3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace IDs are present.&lt;/strong&gt; &lt;code&gt;trace_id&lt;/code&gt; and &lt;code&gt;span_id&lt;/code&gt; are pulled from the OpenTelemetry context, so the log line stitches to traces in our backend without join logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Joining NGINX and Auth Service
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;request_id&lt;/code&gt; is generated by NGINX (&lt;code&gt;$request_id&lt;/code&gt;) and forwarded to the Auth Service via the subrequest header &lt;code&gt;X-Request-ID&lt;/code&gt;. Both log lines carry it. A typical investigation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. user reports 401 at 12:34:56 UTC
2. grep tenant_id="mt_prod" identity_id="..." in NGINX logs around the time
3. capture request_id
4. grep that request_id in Auth Service logs
5. read decision_reason — full story
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole graph of "client → ingress → NGINX → Auth Service → upstream" stitches back together by &lt;code&gt;request_id&lt;/code&gt;.&lt;/p&gt;
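
&lt;p&gt;On the Auth Service side, carrying the id is one header read. A trivial sketch (the handlers use Gin, as in the probe examples later in this chapter):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package handlers

import "github.com/gin-gonic/gin"

// requestID returns the id NGINX generated ($request_id) and forwarded on the
// auth subrequest, so AUTH_DECISION carries the same value as the NGINX line.
func requestID(c *gin.Context) string {
    if id := c.GetHeader("X-Request-ID"); id != "" {
        return id
    }
    return "missing" // should not happen behind our NGINX config; kept greppable if it does
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;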

&lt;h2&gt;
  
  
  Tracing
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry instrumentation runs across both NGINX (via the otel module if compiled in; we're tracking that as a future improvement) and the Auth Service. The Auth Service span looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SPAN: pth-auth-service POST /auth
├── ATTRIBUTES
│   ├── http.route = "/auth"
│   ├── auth.tenant_id = "mt_prod"
│   ├── auth.endpoint_type = "ACCESS_CONTROLLED"
│   ├── auth.outcome = "allow"
│   └── auth.decision_reason = "ACCESS_LEVEL_MATCH"
├── EVENTS
│   ├── jwt.cache.hit
│   ├── route.cache.hit
│   └── bitmap.match
└── DURATION 2.5ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trace context propagates to upstream services via standard W3C trace headers. So a single trace shows: client → ingress → NGINX (eventually, via otel module) → Auth Service span → upstream service span(s) → DB calls inside the upstream. The whole story.&lt;/p&gt;

&lt;p&gt;We &lt;em&gt;don't&lt;/em&gt; turn on full sampling. 1% sampling at edges, 100% sampling for spans tagged &lt;code&gt;auth.outcome="deny"&lt;/code&gt;. The deny path is where the interesting investigations happen; sampling it fully gives us forensic detail without exploding storage.&lt;/p&gt;
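
&lt;p&gt;The deny-sampling policy keys off span attributes. The tagging side looks roughly like this (a sketch; the sampler or collector rule itself isn't shown):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package tracing

import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

// TagDecision attaches the auth outcome to the active span so sampling rules
// and the AUTH_DECISION log can key off the same attributes.
func TagDecision(ctx context.Context, outcome, reason string) {
    span := trace.SpanFromContext(ctx)
    span.SetAttributes(
        attribute.String("auth.outcome", outcome),
        attribute.String("auth.decision_reason", reason),
    )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;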

&lt;h2&gt;
  
  
  Health probes
&lt;/h2&gt;

&lt;p&gt;Three K8s probe endpoints, each with a different purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/livez&lt;/code&gt; — process is alive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Liveness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returns 200, always. The contract: as long as this handler runs, the process isn't deadlocked. K8s only kills the pod if the request times out (handler doesn't run at all).&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;/livez&lt;/code&gt; does &lt;em&gt;not&lt;/em&gt; check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It does not check the trie.&lt;/li&gt;
&lt;li&gt;It does not check Redis.&lt;/li&gt;
&lt;li&gt;It does not check Postgres.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is intentional. A pod whose Redis connection died can still serve cache-hot requests correctly. Killing it on a Redis outage is exactly the wrong thing to do — you turn a cache-hit-100%-but-Redis-down state into a cluster-wide rolling restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/readyz&lt;/code&gt; — pod can take traffic
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Readiness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;trieLoaded&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"trie not loaded"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;revocationExpected&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;revocationServiceReady&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"revocation cache not ready"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two gates:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Trie loaded.&lt;/strong&gt; Without it, we can't classify any endpoint. The pod is useless until the trie is in memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Revocation cache ready&lt;/strong&gt; (if revocation is enabled). Without it, fail-closed designs would deny everything; fail-open designs would miss every revocation. Either way, not ready.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notably &lt;em&gt;not&lt;/em&gt; gated on Redis health: a pod that loses Redis after readiness has gone green stays ready. Refreshes fail loudly via Slack, but live traffic isn't disrupted.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/healthz&lt;/code&gt; — deep health
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Health&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pgErr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pingPostgres&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;rdErr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pingRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pgErr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;rdErr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"degraded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"postgres"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;errString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pgErr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s"&gt;"redis"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;errString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rdErr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one &lt;em&gt;does&lt;/em&gt; depend on Redis and Postgres. It's not used by Kubernetes — it's used by external monitoring (Pingdom, status pages). The distinction matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;K8s probes determine traffic routing.&lt;/strong&gt; They should be tolerant — every false-positive failure pulls a pod out of service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring probes determine alerting.&lt;/strong&gt; They should be strict — they tell humans something is wrong, not the load balancer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common mistake is wiring &lt;code&gt;/healthz&lt;/code&gt; to &lt;code&gt;readinessProbe&lt;/code&gt;. Don't. You will pull pods out of the rotation on a transient Redis blip and convert a degraded state into an outage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stateDiagram-v2
    [*] --&amp;gt; Booting
    Booting --&amp;gt; NotReady: trie empty
    NotReady --&amp;gt; Ready: trie loaded AND&amp;lt;br/&amp;gt;(revocation disabled OR revocation cache warm)
    Ready --&amp;gt; Degraded: Redis stream XREAD failure
    Degraded --&amp;gt; Ready: reconnect
    Ready --&amp;gt; Live: /livez always 200
    Ready --&amp;gt; DeepCheck: /healthz pings PG + Redis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Degraded mode
&lt;/h2&gt;

&lt;p&gt;The gateway is built to tolerate three specific kinds of trouble:&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis is slow or down
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Revocation stream consumer fails. &lt;code&gt;streamDegraded&lt;/code&gt; atomic flips true. Slack alert fires &lt;em&gt;once&lt;/em&gt;. Hot path keeps working — local cache is in memory.&lt;/li&gt;
&lt;li&gt;The Pub/Sub subscriber reconnects in the background and re-subscribes to its channels; the periodic cleanup goroutine resyncs the ZSET when it next runs.&lt;/li&gt;
&lt;li&gt;SA version sync fails. Local SA version map is unchanged. Tokens continue to validate against the last known versions until the next successful sync. Slack alert fires.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Postgres is slow or down
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Trie reload fails. Existing trie remains in memory. Slack alert. Hot path is unaffected.&lt;/li&gt;
&lt;li&gt;New pods cannot start (initial trie load blocks readiness). Existing pods serve.&lt;/li&gt;
&lt;li&gt;This is a &lt;em&gt;partial&lt;/em&gt; outage: scaling up is broken, current capacity still works.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Slack is slow or down
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Alerts are fire-and-forget goroutines. We don't block the hot path on Slack.&lt;/li&gt;
&lt;li&gt;If Slack is down, alerts are queued in goroutines for a configured timeout (5s) and then dropped. We don't retry forever and OOM the pod. A sketch of this pattern follows the list.&lt;/li&gt;
&lt;/ul&gt;
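
&lt;p&gt;The shape of that fire-and-forget send, sketched below; the &lt;code&gt;send&lt;/code&gt; argument stands in for the actual Slack webhook call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package alerts

import (
    "context"
    "log"
    "time"
)

// notify never blocks the caller: the send runs in its own goroutine with a
// hard 5-second budget, and a failed send is logged and dropped, not retried.
func notify(send func(ctx context.Context, msg string) error, msg string) {
    go func() {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if err := send(ctx, msg); err != nil {
            log.Printf("slack alert dropped: %v", err)
        }
    }()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;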

&lt;h3&gt;
  
  
  The alert tree
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    Boot[startup] --&amp;gt; A1{revocation Redis OK?}
    A1 --&amp;gt;|no| Alert1[Slack: TokenRevocationService Redis client not initialized&amp;lt;br/&amp;gt;readyz=503]
    Run[runtime] --&amp;gt; A2{XREAD error?}
    A2 --&amp;gt;|yes| Alert2[Slack: stream degraded once]
    A2 --&amp;gt;|recover| Alert3[Slack: stream recovered]
    Run --&amp;gt; A3{localCache size &amp;gt; MAX?}
    A3 --&amp;gt;|yes| Alert4[Slack: cache overflow once&amp;lt;br/&amp;gt;fall back to ZSCORE]
    Run --&amp;gt; A4{RSA key missing for tenant?}
    A4 --&amp;gt;|yes| Alert5[Slack: per-tenant dedup]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Slack-alerter pattern
&lt;/h2&gt;

&lt;p&gt;Three rules we apply rigorously to alert code, because we learned them the hard way:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Atomic-bool dedup per state transition
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;degradeFlag&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bool&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;degradeFlag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MarkDegraded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Swap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;slack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"auth.degraded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;degradeFlag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MarkRecovered&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Swap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;slack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"auth.recovered"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Swap&lt;/code&gt; returns the &lt;em&gt;previous&lt;/em&gt; value. If it was already &lt;code&gt;true&lt;/code&gt;, we don't re-alert. We alert once on the transition, and once on the recovery.&lt;/p&gt;

&lt;p&gt;Without this, a Redis outage produces &lt;em&gt;thousands of Slack messages per pod per minute&lt;/em&gt;. The first time it happened, oncall threw their phone across the room.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 2: Per-(pod × cause) dedup, not per-cause
&lt;/h3&gt;

&lt;p&gt;A 100-pod deployment hitting the same RSA key misconfiguration alerts 100 times. That's correct: each pod is a separate runtime, each could have its own state, each is a separate alert source. We tag every alert with the pod's hostname so you can see in Slack whether it's a single-pod or fleet-wide problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 3: Deployment-tag prefix
&lt;/h3&gt;

&lt;p&gt;Every alert is prefixed with &lt;code&gt;[customer-env]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[acme-prod] auth.degraded TokenRevocationService Redis stream degraded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes a single Slack channel viable for many environments. Without the prefix, you can't tell at a glance whether the alert is from staging or prod.&lt;/p&gt;
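
&lt;p&gt;Rules 2 and 3 together are only a few lines. A sketch (&lt;code&gt;DEPLOYMENT_TAG&lt;/code&gt; is an assumed variable name, not necessarily ours):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package alerts

import (
    "fmt"
    "os"
)

// format builds the "[customer-env] kind detail (pod=hostname)" shape above, so one
// Slack channel can serve many environments and a fleet-wide problem is visually
// distinct from a single-pod one.
func format(kind, detail string) string {
    env := os.Getenv("DEPLOYMENT_TAG") // e.g. "acme-prod"
    host, _ := os.Hostname()
    return fmt.Sprintf("[%s] %s %s (pod=%s)", env, kind, detail, host)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;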

&lt;h3&gt;
  
  
  What we &lt;em&gt;don't&lt;/em&gt; alert on
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Individual auth failures. Login throttling, expired tokens, denied requests — those are normal and high-volume. They're in dashboards, not Slack.&lt;/li&gt;
&lt;li&gt;High latency on a single request. Latency alerts go on rolling p99, not on individual outliers.&lt;/li&gt;
&lt;li&gt;Anything below "the gateway behaves correctly but degraded." If the system self-heals quietly, we don't page humans.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  NGINX-specific operations
&lt;/h2&gt;

&lt;p&gt;A few NGINX-isms worth calling out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Graceful shutdown
&lt;/h3&gt;

&lt;p&gt;The chart's deployment does this on &lt;code&gt;preStop&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;preStop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/bin/sh&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;echo "[preStop] draining 15 seconds..."&lt;/span&gt;
          &lt;span class="s"&gt;sleep 15&lt;/span&gt;
          &lt;span class="s"&gt;echo "[preStop] nginx -s quit..."&lt;/span&gt;
          &lt;span class="s"&gt;nginx -s quit&lt;/span&gt;
          &lt;span class="s"&gt;while pgrep -x nginx &amp;gt; /dev/null; do sleep 1; done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;15 seconds of drain, then &lt;code&gt;nginx -s quit&lt;/code&gt; (graceful — drain in-flight requests, &lt;em&gt;then&lt;/em&gt; exit), then wait for all worker processes to finish. Combined with &lt;code&gt;terminationGracePeriodSeconds: 60&lt;/code&gt;, we have ~60 seconds total budget for clean shutdown. Without this, rolling deploys produced visible spikes of 502s in client logs.&lt;/p&gt;
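
&lt;p&gt;For orientation, a condensed sketch (not our full chart) of where the hook and the grace period sit relative to each other in the Deployment spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch: the preStop hook above lives alongside terminationGracePeriodSeconds.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # total budget for drain + quit + worker exit
      containers:
        - name: nginx
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15; nginx -s quit; while pgrep -x nginx &gt; /dev/null; do sleep 1; done"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;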

&lt;h3&gt;
  
  
  Worker tuning
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;worker_processes&lt;/span&gt;  &lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;worker_rlimit_nofile&lt;/span&gt; &lt;span class="mi"&gt;65535&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;worker_shutdown_timeout&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;events&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;worker_connections&lt;/span&gt; &lt;span class="mi"&gt;10240&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="s"&gt;epoll&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;multi_accept&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;accept_mutex&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;worker_processes auto&lt;/code&gt; scales to CPU count. &lt;code&gt;accept_mutex off&lt;/code&gt; is the modern default — let kernel &lt;code&gt;epoll&lt;/code&gt; handle accept distribution. &lt;code&gt;worker_connections 10240&lt;/code&gt; is per worker, so a 4-core pod handles ~40k concurrent connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Upstream keepalive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;upstream&lt;/span&gt; &lt;span class="s"&gt;auth_service&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="s"&gt;auth-service-golang.&amp;lt;ns&amp;gt;.svc.cluster.local:80&lt;/span&gt;
         &lt;span class="s"&gt;max_fails=3&lt;/span&gt; &lt;span class="s"&gt;fail_timeout=30s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;keepalive&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;keepalive_requests&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;keepalive_timeout&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;keepalive 64&lt;/code&gt; is &lt;em&gt;per worker&lt;/em&gt;: it caps the idle connections each worker keeps open to the upstream, so a 4-worker NGINX can hold up to 256 idle connections to the Auth Service. Without keepalive, every subrequest opens a fresh TCP connection — fatal at any real RPS. The first time we deployed without it, p99 auth time was 80 ms. With it: 3 ms.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;max_fails=3 fail_timeout=30s&lt;/code&gt; marks the upstream "down" after 3 failed attempts within a 30-second window and stops sending it traffic for the next 30 seconds. Combined with the retry config in &lt;code&gt;auth.conf&lt;/code&gt;, this gives smooth failover when one auth pod is sick.&lt;/p&gt;
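
&lt;p&gt;We haven't reproduced &lt;code&gt;auth.conf&lt;/code&gt; in this chapter, so treat the following as a sketch rather than the real file; the path and timeout values are placeholders. Two details matter: upstream keepalive only gets used if the subrequest speaks HTTP/1.1 with an empty &lt;code&gt;Connection&lt;/code&gt; header, and the retry behaviour referred to above would typically be &lt;code&gt;proxy_next_upstream&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;# Sketch of the auth subrequest location; illustrative values, not our exact file.
location = /auth {
  internal;
  proxy_pass http://auth_service/v1/authorize;

  # Without these two lines, NGINX speaks HTTP/1.0 to the upstream and the
  # keepalive pool above is never used.
  proxy_http_version 1.1;
  proxy_set_header Connection "";

  # Retry another auth pod when one is marked down, errors, or times out.
  proxy_next_upstream error timeout http_502 http_503;
  proxy_connect_timeout 100ms;
  proxy_read_timeout 500ms;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;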

&lt;h2&gt;
  
  
  A picture of where the time goes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gantt
    title Single request as seen by NGINX
    dateFormat  X
    axisFormat %s ms
    section NGINX
    receive + route match : 0, 1
    auth subrequest       : 1, 5
    proxy upstream        : 6, 18
    log JSON              : 24, 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The auth subrequest is the smallest thing on the timeline. That's not by accident. Chapter 8's caches are the reason it stays small. Chapter 9 is what tells you when it stops staying small.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 10 is the retrospective: what we'd build differently if we did it from scratch, what tools we wish we'd added on day one, and the maturity progression from "auth library in every service" to "production-grade gateway." It's the chapter you write &lt;em&gt;after&lt;/em&gt; operating the thing for two years, and it's the one I'd want to read if I were starting from scratch.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>nginx</category>
      <category>kubernetes</category>
      <category>sre</category>
    </item>
    <item>
      <title>Part 8 — Making It Fast: Caching, Hot Paths, and Avoiding DB Calls</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:42:14 +0000</pubDate>
      <link>https://forem.com/akarshan/making-it-fast-caching-hot-paths-and-avoiding-db-calls-4bbh</link>
      <guid>https://forem.com/akarshan/making-it-fast-caching-hot-paths-and-avoiding-db-calls-4bbh</guid>
      <description>&lt;p&gt;The Auth Gateway sits in front of every authenticated request in the platform. Its latency isn't just its own latency — it's the floor for every service behind it. If auth takes 50ms, every request to every upstream service starts 50ms in the hole.&lt;/p&gt;

&lt;p&gt;Our internal target is sub-millisecond on cache-hot paths. The way we hit it isn't clever algorithms — it's a stack of small caches, each one handling a different kind of state, each invalidated through a different channel. This post walks through all of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The principle that shapes everything
&lt;/h2&gt;

&lt;p&gt;Before the individual layers: a rule we hold as policy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Redis is allowed to &lt;em&gt;influence&lt;/em&gt; the hot path. Redis is not allowed to &lt;em&gt;block&lt;/em&gt; it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every cache in the system is in-process. Redis feeds them asynchronously — pushing revocation events, triggering trie reloads, syncing SA versions. But a pod whose Redis connection is dead can still answer requests correctly, for the duration of its staleness window.&lt;/p&gt;

&lt;p&gt;That's the difference between "Redis is down, the platform is down" and "Redis is down, the platform is slightly stale." One is a severity-1 incident. The other is a degraded mode we can tolerate for minutes while someone fixes it.&lt;/p&gt;

&lt;p&gt;With that framing, here's how a warm request flows through the cache stack:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpphw3brffith73ndmjb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpphw3brffith73ndmjb.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Six layers. Five are pure in-process memory. The sixth — the revocation and SA-version maps — is in-process too, but fed asynchronously from Redis. No layer blocks on a network call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: JWT verify cache
&lt;/h2&gt;

&lt;p&gt;The single biggest win in the stack. RSA signature verification is expensive — a few hundred microseconds per call — and at 50,000 RPS that cost is real.&lt;/p&gt;

&lt;p&gt;We wrap the entire decode-and-verify path in a Ristretto cache. The key is a 64-bit FNV hash of the raw token string; the value is the decoded JWT claims. On a cache hit, we skip RSA verification entirely.&lt;/p&gt;

&lt;p&gt;A few choices worth explaining:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Ristretto over a plain LRU.&lt;/strong&gt; Ristretto uses TinyLFU — it tracks access frequency and uses it to decide what to evict. Under burst traffic, a pure LRU can evict frequently-used tokens just because they weren't the &lt;em&gt;most recent&lt;/em&gt;. TinyLFU keeps the hot tokens and evicts the cold ones. The behavior under load is meaningfully better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why hash the token string.&lt;/strong&gt; Two reasons. Memory: a JWT is 500–2000 bytes; a uint64 is 8. And defense-in-depth: if the cache state ever ends up in a log or heap dump, the tokens themselves aren't exposed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why cap TTL at 30 seconds.&lt;/strong&gt; The cache stores the &lt;em&gt;decoded token&lt;/em&gt;, not the auth decision. Revocation is checked separately on every request. But capping TTL at 30 seconds keeps the staleness window honest — a token that's been revoked won't ride a warm cache entry for an hour.&lt;/p&gt;
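
&lt;p&gt;A stripped-down sketch of the pattern using &lt;code&gt;github.com/dgraph-io/ristretto&lt;/code&gt;. It's not the production code; the claims type and the sizing constants are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package authcache

import (
    "hash/fnv"
    "time"

    "github.com/dgraph-io/ristretto"
)

// Claims stands in for the decoded JWT claims.
type Claims struct {
    Subject   string
    ExpiresAt time.Time
}

type verifyCache struct{ c *ristretto.Cache }

func newVerifyCache() (*verifyCache, error) {
    c, err := ristretto.NewCache(&amp;ristretto.Config{
        NumCounters: 1_000_000, // TinyLFU frequency counters
        MaxCost:     100_000,   // max entries, with cost 1 per entry
        BufferItems: 64,
    })
    if err != nil {
        return nil, err
    }
    return &amp;verifyCache{c: c}, nil
}

// tokenKey hashes the raw token so the cache never holds the token itself.
func tokenKey(raw string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(raw))
    return h.Sum64()
}

func (v *verifyCache) lookup(raw string) (*Claims, bool) {
    if val, ok := v.c.Get(tokenKey(raw)); ok {
        return val.(*Claims), true
    }
    return nil, false
}

func (v *verifyCache) store(raw string, claims *Claims) {
    ttl := time.Until(claims.ExpiresAt)
    if ttl &gt; 30*time.Second {
        ttl = 30 * time.Second // cap staleness regardless of token lifetime
    }
    v.c.SetWithTTL(tokenKey(raw), claims, 1, ttl)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;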




&lt;h2&gt;
  
  
  Layer 2: RSA public key cache
&lt;/h2&gt;

&lt;p&gt;Per-tenant RSA public keys are loaded from environment config at boot. Parsing PEM is not free — a few hundred microseconds — and we don't want to pay it on every cache miss.&lt;/p&gt;

&lt;p&gt;We cache the parsed key per tenant using &lt;code&gt;sync.Once&lt;/code&gt;. The first request for a given tenant parses the key; every request after that gets the cached result, and if that first parse failed, the error is cached too.&lt;/p&gt;

&lt;p&gt;Two operational details that matter:&lt;/p&gt;

&lt;p&gt;A misconfiguration fires a Slack alert &lt;em&gt;once per tenant per pod&lt;/em&gt;, not once per request. Without this guard, a single bad key config generates a Slack message for every request that hits that tenant, which during a deploy is thousands of messages in seconds.&lt;/p&gt;

&lt;p&gt;Key rotation requires a pod restart. We considered hot-reloading. We chose deploy-to-rotate — the operational simplicity of a predictable restart beats the complexity of a file watcher and the failure modes it introduces.&lt;/p&gt;
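
&lt;p&gt;The shape of that per-tenant &lt;code&gt;sync.Once&lt;/code&gt;, as a sketch; the names and the alert hook are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package keycache

import (
    "crypto/rsa"
    "crypto/x509"
    "encoding/pem"
    "errors"
    "sync"
)

type keyEntry struct {
    once sync.Once
    key  *rsa.PublicKey
    err  error
}

type keyCache struct {
    mu      sync.Mutex
    entries map[string]*keyEntry
    pemFor  func(tenant string) []byte // tenant keys loaded from env config at boot
}

// publicKey parses the tenant's PEM exactly once per pod. A failed parse is
// cached too: we alert once, then keep returning the same error cheaply.
func (kc *keyCache) publicKey(tenant string) (*rsa.PublicKey, error) {
    kc.mu.Lock()
    e, ok := kc.entries[tenant]
    if !ok {
        e = &amp;keyEntry{}
        kc.entries[tenant] = e
    }
    kc.mu.Unlock()

    e.once.Do(func() {
        block, _ := pem.Decode(kc.pemFor(tenant))
        if block == nil {
            e.err = errors.New("no PEM block in tenant key config")
            alertOnce(tenant, e.err) // fires once per tenant per pod
            return
        }
        pub, err := x509.ParsePKIXPublicKey(block.Bytes)
        if err != nil {
            e.err = err
            alertOnce(tenant, err)
            return
        }
        rsaKey, isRSA := pub.(*rsa.PublicKey)
        if !isRSA {
            e.err = errors.New("tenant key is not RSA")
            alertOnce(tenant, e.err)
            return
        }
        e.key = rsaKey
    })
    return e.key, e.err
}

func alertOnce(tenant string, err error) { /* Slack alert, deduped per (tenant, pod) */ }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;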




&lt;h2&gt;
  
  
  Layer 3: route cache
&lt;/h2&gt;

&lt;p&gt;The trie lookup is already fast — O(depth), with depth typically 3–5 segments. But re-walking the same paths 50,000 times a second is wasteful. A TinyLFU cache sits in front of the trie, keyed by slug, HTTP method, and path.&lt;/p&gt;

&lt;p&gt;The platform has around 3,000 distinct route tuples in production. Sized at 10,000 entries, the cache fits the entire steady-state working set with room to spare. Misses are new endpoints, cold starts, and post-reload warm-up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invalidation is bulk.&lt;/strong&gt; On any trie reload — whether triggered by a periodic interval or a Redis Pub/Sub kick — we drop the entire route cache. We considered partial invalidation (only drop entries for changed slugs) and rejected it. Trie reloads are rare. The cache refills in milliseconds. The bookkeeping complexity of partial invalidation isn't worth the seconds of warm-up time it would save.&lt;/p&gt;
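
&lt;p&gt;Roughly what the cache-in-front-of-the-trie looks like, again sketched with Ristretto; &lt;code&gt;Endpoint&lt;/code&gt; and &lt;code&gt;Trie&lt;/code&gt; are placeholders for the real route types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package routing

import "github.com/dgraph-io/ristretto"

// Endpoint and Trie stand in for the real route-trie types described earlier
// in the series.
type Endpoint struct{ RequiredAccess []string }

type Trie interface {
    Walk(method, path string) *Endpoint
}

type router struct {
    cache *ristretto.Cache // ~10,000 entries, TinyLFU eviction
    trie  Trie
}

func routeKey(slug, method, path string) string {
    return slug + "|" + method + "|" + path
}

func (r *router) match(slug, method, path string) *Endpoint {
    k := routeKey(slug, method, path)
    if v, ok := r.cache.Get(k); ok {
        return v.(*Endpoint)
    }
    ep := r.trie.Walk(method, path) // O(depth) trie walk on a miss
    r.cache.Set(k, ep, 1)
    return ep
}

// reload installs a new trie (atomically swapped in the real code) and drops
// the whole route cache; it refills in milliseconds.
func (r *router) reload(newTrie Trie) {
    r.trie = newTrie
    r.cache.Clear()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;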




&lt;h2&gt;
  
  
  Layer 4: the trie
&lt;/h2&gt;

&lt;p&gt;The trie is a cache too, just an unusual one. It's an in-memory mirror of the endpoint table from Postgres. No request ever touches Postgres on the hot path.&lt;/p&gt;

&lt;p&gt;Invalidation has two channels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Periodic&lt;/strong&gt;: every hour by default. A safety net.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push&lt;/strong&gt;: via Redis Pub/Sub on &lt;code&gt;auth:trie:refresh&lt;/code&gt;. Admin tooling publishes this after any write to the endpoint table. Pods reload within milliseconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The push channel exists because endpoint changes are operationally significant. A new admin route that's meant to be protected shouldn't have a one-hour window where it's open because the trie hasn't refreshed. The push channel closes that window.&lt;/p&gt;
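
&lt;p&gt;The push listener is only a few lines with go-redis. A sketch, assuming a &lt;code&gt;reload&lt;/code&gt; function owned by the trie loader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package routing

import (
    "context"
    "log"

    "github.com/redis/go-redis/v9"
)

// watchTrieRefresh reloads the trie whenever admin tooling publishes to
// auth:trie:refresh. The hourly periodic reload still runs as the safety net.
func watchTrieRefresh(ctx context.Context, rdb *redis.Client, reload func(context.Context) error) {
    sub := rdb.Subscribe(ctx, "auth:trie:refresh")
    defer sub.Close()

    for msg := range sub.Channel() {
        log.Printf("trie refresh requested: %s", msg.Payload)
        if err := reload(ctx); err != nil {
            log.Printf("trie reload failed, keeping previous snapshot: %v", err)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;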




&lt;h2&gt;
  
  
  Layer 5: policy bitmap snapshot
&lt;/h2&gt;

&lt;p&gt;The permission bitmap (covered in the previous chapter) is loaded alongside the trie. It's an in-memory structure mapping permission names to bit indexes, with a version number.&lt;/p&gt;

&lt;p&gt;The snapshot is never partially updated. It's swapped atomically — a background process builds a new snapshot when the registry changes, then stores it via an atomic pointer swap. Readers grab the pointer at the start of a request and work with that exact snapshot throughout. No locks, no torn reads.&lt;/p&gt;

&lt;p&gt;This pattern shows up repeatedly in the codebase: when state changes as a whole unit, an atomic pointer is simpler and faster than a read-write mutex around a map. It's worth internalizing.&lt;/p&gt;
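
&lt;p&gt;On Go 1.19+ that's &lt;code&gt;atomic.Pointer&lt;/code&gt;. A minimal sketch of the pattern; the &lt;code&gt;Snapshot&lt;/code&gt; fields are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package policy

import "sync/atomic"

// Snapshot is an immutable view of the permission registry at one version.
type Snapshot struct {
    Version  int
    BitIndex map[string]int // access level name -&gt; bit position
}

var current atomic.Pointer[Snapshot]

// Install publishes a fully-built snapshot in one atomic step. Writers never
// mutate a published snapshot; they build a new one and swap it in.
func Install(s *Snapshot) { current.Store(s) }

// Current returns the snapshot a request should use for its whole lifetime.
// A later swap doesn't affect callers already holding this pointer.
func Current() *Snapshot { return current.Load() }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;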




&lt;h2&gt;
  
  
  Layer 6: revocation map and SA version map
&lt;/h2&gt;

&lt;p&gt;These were covered in depth in the previous chapter. In the context of the cache stack:&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;revocation map&lt;/strong&gt; is bounded at 50,000 JTIs, fed by a Redis Stream, and fails open — if a JTI isn't in the map, we treat it as not revoked. The staleness window is low single-digit seconds in steady state.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;SA version map&lt;/strong&gt; has the opposite posture: fail closed. If the map isn't ready, the pod doesn't pass readiness. If a service account token's version is behind the current version in the map, it's denied.&lt;/p&gt;

&lt;p&gt;Same underlying shape — in-memory map fed asynchronously from Redis — but different risk tolerance based on what's being protected.&lt;/p&gt;




&lt;h2&gt;
  
  
  How all the invalidation channels fit together
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfd02hlh7ch4yhpkeegs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfd02hlh7ch4yhpkeegs.png" alt=" " width="800" height="879"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three patterns across the stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TTL-based&lt;/strong&gt; (JWT verify cache). Simple, no coordination. Best when the cached value has a natural expiry built into it — which JWTs do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push-based&lt;/strong&gt; (trie, revocation stream, SA version). Required when a staleness window has real cost. Needs a degraded-mode plan for when the push channel is unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity-based eviction&lt;/strong&gt; (route cache, JWT cache). Bounded memory by design. What gets evicted matters more than when — which is why TinyLFU beats LRU for this workload.&lt;/p&gt;

&lt;p&gt;When in doubt, start with TTL. Push-based caches are powerful but bring failure modes — lost events, stalled consumers, cursor races. Use them only when a TTL window is genuinely unacceptable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cache we got wrong
&lt;/h2&gt;

&lt;p&gt;Our first JWT cache used a plain Go map with a mutex and a &lt;code&gt;time.AfterFunc&lt;/code&gt; per entry to handle expiry.&lt;/p&gt;

&lt;p&gt;It worked in tests. It fell over in production within a week. Two problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timer pressure.&lt;/strong&gt; Every cached token registered a &lt;code&gt;time.AfterFunc&lt;/code&gt; timer, each of which fires its callback in its own goroutine. At a million live tokens the runtime coped, but GC pauses got ugly and unpredictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No cap.&lt;/strong&gt; There was no size limit. Memory grew until pods OOM-killed.&lt;/p&gt;

&lt;p&gt;Switching to Ristretto solved both: timers are amortized into a small internal worker, and &lt;code&gt;MaxCost&lt;/code&gt; enforces a hard ceiling.&lt;/p&gt;

&lt;p&gt;The lesson: a cache is a copy of state. If there's no mechanism to bound or invalidate it — TTL, push, or capacity — it's not a cache. It's a memory leak.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cold start vs. warm
&lt;/h2&gt;

&lt;p&gt;A pod's first requests are slower. The trie loads from Postgres before readiness flips — that's the only DB call in the pod's lifetime. After that, every lookup is in-memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn26nomvqff80sqslasvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn26nomvqff80sqslasvd.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The JWT cache starts empty on a fresh deploy and fills up within seconds as real tokens come through. We don't pre-warm it — the cost of cold RSA verifications for a few seconds after a deploy is acceptable.&lt;/p&gt;

&lt;p&gt;The revocation cache we &lt;em&gt;do&lt;/em&gt; pre-warm, synchronously, before readiness. A pod that's marked ready must have the current revocation set. Otherwise it would fail-open on every request until its first Redis sync — meaning any logouts from the past hour would be invisible to it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to actually graph
&lt;/h2&gt;

&lt;p&gt;For each cache, the metrics that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hit rate&lt;/strong&gt; — the most important number. A cache with a stable size but falling hit rate is broken.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eviction rate&lt;/strong&gt; — meaningful only if the cache is bounded. High eviction with high hit rate is fine; it means the cache is doing its job under pressure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Size&lt;/strong&gt; — useful for capacity planning, not for alerting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The JWT verify cache runs at 95%+ hit rate in steady state. A fresh deploy drops it to zero and it climbs back within seconds. Anything else warrants investigation.&lt;/p&gt;

&lt;p&gt;Don't alert on cache size. Alert on hit rate.&lt;/p&gt;
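
&lt;p&gt;If a cache is built on Ristretto with &lt;code&gt;Metrics&lt;/code&gt; enabled, the hit ratio comes for free. A sketch of exposing it as a Prometheus gauge; the metric name is ours, and &lt;code&gt;Ratio()&lt;/code&gt; is a lifetime ratio, so for alerting you'd still want a windowed view on top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package metrics

import (
    "github.com/dgraph-io/ristretto"
    "github.com/prometheus/client_golang/prometheus"
)

// registerHitRatio exposes a cache's lifetime hit ratio as a gauge.
// The cache must be built with ristretto.Config{Metrics: true}.
func registerHitRatio(name string, c *ristretto.Cache) {
    prometheus.MustRegister(prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name:        "auth_cache_hit_ratio",
            Help:        "Lifetime hit ratio of an in-process auth cache.",
            ConstLabels: prometheus.Labels{"cache": name},
        },
        func() float64 { return c.Metrics.Ratio() },
    ))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;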




&lt;p&gt;&lt;em&gt;Next up: Chapter 9 — operating the gateway. The structured auth decision log, OpenTelemetry tracing, the three Kubernetes probes, degraded-mode behavior, and the Slack alert pattern that keeps on-call sane during a Redis outage.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>caching</category>
      <category>go</category>
      <category>redis</category>
    </item>
    <item>
      <title>Part 7 — Token Revocation Without Killing Performance</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:57 +0000</pubDate>
      <link>https://forem.com/akarshan/token-revocation-without-killing-performance-389d</link>
      <guid>https://forem.com/akarshan/token-revocation-without-killing-performance-389d</guid>
      <description>&lt;p&gt;JWTs have a hard problem hiding inside them: they're stateless. The whole point of a JWT is that the verifier can check a signature and make a decision — no database, no round-trip. That's what makes them fast. It's also what makes "log this user out &lt;em&gt;right now&lt;/em&gt;" not work out of the box.&lt;/p&gt;

&lt;p&gt;We had to solve this. Users log out. Admins disable accounts. Service accounts rotate. Each one of those events has to invalidate live tokens &lt;em&gt;immediately&lt;/em&gt;, not at the next expiry tick.&lt;/p&gt;

&lt;p&gt;This post is about how we did it without giving up the performance properties that made JWTs worth using in the first place.&lt;/p&gt;




&lt;h2&gt;
  
  
  The constraints that ruled out the obvious answers
&lt;/h2&gt;

&lt;p&gt;Three numbers shape the design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50,000 RPS&lt;/strong&gt; of authenticated requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-millisecond&lt;/strong&gt; auth budget on the hot path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-digit-second&lt;/strong&gt; propagation — when a user logs out, every pod must know within a few seconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The obvious approaches each break one of these:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query Redis on every request.&lt;/strong&gt; Adds a network round-trip to every auth decision. Median latency explodes. Redis also becomes a hard single point of failure — if it's slow or down, every request fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push revocation events via websockets or long-poll to every pod.&lt;/strong&gt; Works at low scale. Gets fragile when pods churn, restart, or drop events during a network blip.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short-lived tokens with fast refresh.&lt;/strong&gt; A 5-minute expiry reduces the window, but doesn't close it — and 5 minutes is too long when an account is disabled for a security reason.&lt;/p&gt;

&lt;p&gt;What worked: a two-layer design.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; is the &lt;em&gt;propagation&lt;/em&gt; layer. It holds the authoritative revocation state and a live event feed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local memory&lt;/strong&gt; is the &lt;em&gt;decision&lt;/em&gt; layer. Each pod keeps an in-memory map of revoked JTIs. The hot-path check is a single map lookup — no I/O.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcomxz3ztl0aaevfjh6e2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcomxz3ztl0aaevfjh6e2.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Redis structures, one job each
&lt;/h2&gt;

&lt;p&gt;Two Redis keys do the heavy lifting, and they serve different purposes — which is why both are necessary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;revoked_access_tokens&lt;/code&gt; is a sorted set. Each member is a JTI; the score is the token's expiry timestamp. This is the &lt;strong&gt;source of truth at any point in time&lt;/strong&gt; — you can ask it "give me everything currently revoked" with a single range query.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;revoked_access_token_events&lt;/code&gt; is a stream. Each entry carries the JTI, expiry, and metadata about the revocation. This is the &lt;strong&gt;live feed&lt;/strong&gt; — pods subscribe to it and learn about new revocations as they happen.&lt;/p&gt;

&lt;p&gt;The ZSET answers "what is the state right now?" The Stream answers "what has changed since I last checked?" You need both because they're good at different things: the ZSET is for bulk reads at startup, the Stream is for incremental updates during steady state.&lt;/p&gt;
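
&lt;p&gt;For concreteness, roughly what the issuer-side publish looks like with go-redis; the stream field names and the trim length are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: record a revocation in both structures in one round-trip.
// The ZSET carries point-in-time state; the Stream carries the live feed.
func publishRevocation(ctx context.Context, rdb *redis.Client, jti string, expiresAt int64) error {
    pipe := rdb.TxPipeline()
    pipe.ZAdd(ctx, "revoked_access_tokens", redis.Z{
        Score:  float64(expiresAt), // score = token expiry, so expired members can be pruned
        Member: jti,
    })
    pipe.XAdd(ctx, &amp;redis.XAddArgs{
        Stream: "revoked_access_token_events",
        MaxLen: 100_000, // keep the stream bounded
        Approx: true,
        Values: map[string]interface{}{"jti": jti, "exp": expiresAt},
    })
    _, err := pipe.Exec(ctx)
    return err
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;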




&lt;h2&gt;
  
  
  The startup problem — and two races hiding in the obvious solution
&lt;/h2&gt;

&lt;p&gt;When a pod boots, it needs to populate its local map before it serves traffic. The tempting approach: read the ZSET to get current revocations, then subscribe to the Stream for updates.&lt;/p&gt;

&lt;p&gt;Two races hide here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Race 1:&lt;/strong&gt; What if a revocation arrives between the ZSET read and the Stream subscription? The event is in the Stream, but the pod's cursor is positioned &lt;em&gt;after&lt;/em&gt; it. The JTI never makes it into the local cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Race 2:&lt;/strong&gt; What if you start the Stream consumer from the very beginning (&lt;code&gt;0-0&lt;/code&gt;) to avoid missing anything? Now you replay every event ever emitted — potentially thousands. Worse: if the stream has been trimmed, you'll silently miss events older than the trim window.&lt;/p&gt;

&lt;p&gt;The fix is to reverse the order: capture the Stream tip &lt;em&gt;before&lt;/em&gt; reading the ZSET, then start the consumer from that captured tip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TokenRevocationService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;WarmCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// 1. Capture the stream tip first.&lt;/span&gt;
    &lt;span class="n"&gt;tipID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;captureStreamTip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// 2. Read the current ZSET snapshot.&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;members&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZRangeByScoreWithScores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"revoked_access_tokens"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZRangeBy&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Min&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;strconv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FormatInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"+inf"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localCache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Member&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// 3. Start the consumer at the tip captured before the ZSET read.&lt;/span&gt;
    &lt;span class="c"&gt;// Anything that arrived between tipID and now replays through the consumer.&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consumeStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tipID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The order matters precisely. By capturing the tip first, anything that arrives while we're reading the ZSET will replay through the consumer. Anything already in the ZSET when we read it is loaded directly. If the same JTI appears in both — a revocation that landed right on the boundary — setting the same map entry twice is harmless.&lt;/p&gt;
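
&lt;p&gt;&lt;code&gt;captureStreamTip&lt;/code&gt; isn't shown above. One way to implement it, as a sketch rather than the production code, is to read the newest entry's ID and fall back to &lt;code&gt;0-0&lt;/code&gt; for an empty stream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// captureStreamTip returns the ID of the newest event currently in the stream,
// or "0-0" if the stream is empty, so the consumer starts from the beginning
// and replays anything that lands after this call.
func (s *TokenRevocationService) captureStreamTip(ctx context.Context) (string, error) {
    msgs, err := s.redis.XRevRangeN(ctx, "revoked_access_token_events", "+", "-", 1).Result()
    if err != nil {
        return "", err
    }
    if len(msgs) == 0 {
        return "0-0", nil
    }
    return msgs[0].ID, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;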

&lt;p&gt;The pod's lifecycle from boot to steady state:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1nlfcs66n8h4hrbsqmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1nlfcs66n8h4hrbsqmh.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The hot path: deliberately boring
&lt;/h2&gt;

&lt;p&gt;The actual check on every request is about as simple as it gets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TokenRevocationService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;IsJTIRevoked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jti&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;expiresAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localCache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jti&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;expiresAt&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A read lock, a map lookup, a comparison. No Redis, no network. Hundreds of nanoseconds.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;!found → false&lt;/code&gt; branch is a deliberate &lt;em&gt;fail-open&lt;/em&gt; choice: if a JTI isn't in the local cache, we treat it as not revoked. The risk is that a freshly revoked token might be accepted for the few seconds between the revocation being published and the local cache being updated. We accept that window. The alternative — failing closed — would mean denying every request whose JTI we haven't explicitly loaded, which at startup means denying all traffic until the cache is fully warm. That's worse.&lt;/p&gt;




&lt;h2&gt;
  
  
  The gap probe: catching what the Stream misses
&lt;/h2&gt;

&lt;p&gt;The Stream consumer keeps a cursor — the ID of the last event it processed. Periodically, the stream gets trimmed to bound its size. If the consumer's cursor falls behind the trim window (because of a slow handler, a GC pause, or a network blip), the next &lt;code&gt;XREAD&lt;/code&gt; will silently skip the trimmed events.&lt;/p&gt;

&lt;p&gt;We detect this with a gap probe that runs every 5 minutes:&lt;/p&gt;

&lt;p&gt;If the oldest event currently in the Stream is &lt;em&gt;newer&lt;/em&gt; than the consumer's cursor, we missed something. When that happens, we resync from the ZSET (which is the authoritative source of truth and doesn't get trimmed the same way) and snap the cursor to the stream tip.&lt;/p&gt;
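
&lt;p&gt;A sketch of that probe. The &lt;code&gt;cursor()&lt;/code&gt; and &lt;code&gt;resyncFromZSET()&lt;/code&gt; helpers are assumed here, and the ID comparison is simplified to the millisecond part of the stream ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// checkGap runs on a 5-minute ticker. If the oldest event still in the stream
// is newer than our cursor, the stream was trimmed past us: resync from the ZSET.
func (s *TokenRevocationService) checkGap(ctx context.Context) error {
    oldest, err := s.redis.XRangeN(ctx, "revoked_access_token_events", "-", "+", 1).Result()
    if err != nil || len(oldest) == 0 {
        return err // an empty stream means there is nothing we could have missed
    }
    if streamMillis(oldest[0].ID) &gt; streamMillis(s.cursor()) {
        return s.resyncFromZSET(ctx) // bulk reload, then snap the cursor to the tip
    }
    return nil
}

// streamMillis extracts the millisecond timestamp from a stream ID like
// "1714000000000-3".
func streamMillis(id string) int64 {
    ms, _ := strconv.ParseInt(strings.SplitN(id, "-", 2)[0], 10, 64)
    return ms
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;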

&lt;p&gt;This probe has fired exactly twice in production since we added it — both times during planned Redis maintenance — and both times the recovery was automatic. The value isn't that it fires often. It's that without it, you'd never know you missed events at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Service accounts: the same idea, different risk tolerance
&lt;/h2&gt;

&lt;p&gt;User token revocation fails open — a freshly revoked token might slip through for a few seconds. That's acceptable: the window is small, bounded, and observable.&lt;/p&gt;

&lt;p&gt;Service-account rotation fails &lt;em&gt;closed&lt;/em&gt;. When a service account is rotated, the old credentials must be denied immediately, even if that means a slightly degraded startup path.&lt;/p&gt;

&lt;p&gt;The mechanism is different too: instead of JTI revocation, service accounts carry a version number. The gateway keeps a local map of current SA versions loaded from Redis. If the token's version is less than the current version for that service account, it's denied.&lt;/p&gt;
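
&lt;p&gt;The check itself is a version comparison. A sketch, where denying a service account that isn't in the map at all is our assumption, consistent with the fail-closed posture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// saVersionMap maps service-account ID -&gt; current credential version. It is
// synced from Redis on an interval and must be loaded before readiness.
type saVersionMap struct {
    mu       sync.RWMutex
    versions map[string]int64
}

// Allowed is false for stale versions and, by assumption (fail closed), for
// service accounts the map doesn't know about.
func (m *saVersionMap) Allowed(saID string, tokenVersion int64) bool {
    m.mu.RLock()
    current, known := m.versions[saID]
    m.mu.RUnlock()
    if !known {
        return false
    }
    return tokenVersion &gt;= current
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;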

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6v73iq3eigxkbggpy4q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6v73iq3eigxkbggpy4q.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pod won't pass readiness until this cache is loaded. If Redis is unavailable at startup, the pod doesn't serve traffic. That's intentional — we'd rather have fewer pods than pods that can't correctly enforce SA rotation.&lt;/p&gt;

&lt;p&gt;The SA version map syncs from Redis on a 60-second interval, and that window is our exposure. We reduce the effective risk by having the rotating system hold the old version live for a grace period, only promoting the new version once enough gateways have synced.&lt;/p&gt;




&lt;h2&gt;
  
  
  What revocation doesn't do
&lt;/h2&gt;

&lt;p&gt;A few things that seem natural but aren't in scope:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revoke by user ID.&lt;/strong&gt; The cache is JTI-indexed. To revoke all of a user's tokens, the issuer enumerates their live JTIs and revokes each one. The Auth Service sees only individual JTIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-region propagation.&lt;/strong&gt; We run regional auth services with regional Redis instances. Revocations published in one region don't automatically appear in another. Most revocations are tenant-bound, and tenants are region-bound, so this rarely matters in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared Redis.&lt;/strong&gt; This Redis instance is auth-only. The corner cases in revocation are complex enough that sharing infrastructure with rate limiters or session stores would make debugging much harder.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we'd do differently on day one
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Add the gap probe immediately.&lt;/strong&gt; It's a small amount of code and it's the difference between "we silently lose a logout event occasionally" and "we always know when propagation breaks."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test the warm path with a slow or unavailable Redis.&lt;/strong&gt; Most bugs we found were in error handling during startup, not steady-state operation. The warm path runs once per pod lifetime; staging rarely exercises it unless you deliberately inject failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bound everything from the start.&lt;/strong&gt; The local cache, the stream length, the sync interval. Unbounded growth in any of them becomes an incident.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: Chapter 8 — every cache in the hot path, together. JWT verify cache, RSA key cache, route cache, policy bitmap, revocation map, SA version map. Each one is fast individually; together they're how the gateway fits inside its latency budget. We'll cover TTL strategy, invalidation, and the one cache where we got eviction wrong.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>redis</category>
      <category>jwt</category>
      <category>performance</category>
      <category>go</category>
    </item>
    <item>
      <title>Part 6 — Authorization at Scale: Access Levels, Roles, and Compact Decisions</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:42 +0000</pubDate>
      <link>https://forem.com/akarshan/authorization-at-scale-access-levels-roles-and-compact-decisions-57ga</link>
      <guid>https://forem.com/akarshan/authorization-at-scale-access-levels-roles-and-compact-decisions-57ga</guid>
      <description>&lt;p&gt;Authentication answers "who are you?" Authorization answers the harder question: "are you allowed to do this?"&lt;/p&gt;

&lt;p&gt;By the time a request reaches this stage, we've already validated the token and confirmed the tenant. Now we need to decide — before the request touches any upstream service — whether this specific identity has permission to call this specific endpoint. That decision runs hundreds of millions of times a day. It needs to be fast, correct, and cheap to reason about when something goes wrong.&lt;/p&gt;

&lt;p&gt;This post is about the model we use, the simpler approach that served us for a year, and the optimization we eventually built — and why we kept the old path around anyway.&lt;/p&gt;




&lt;h2&gt;
  
  
  The model: three layers, one question
&lt;/h2&gt;

&lt;p&gt;Our authorization model has three layers:&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;role&lt;/strong&gt; is what a user is granted — something like &lt;code&gt;clinic_admin&lt;/code&gt; or &lt;code&gt;billing_specialist&lt;/code&gt;. An &lt;strong&gt;access level&lt;/strong&gt; is a coarse permission — something like &lt;code&gt;user:admin&lt;/code&gt; or &lt;code&gt;schedule:write&lt;/code&gt;. An &lt;strong&gt;endpoint&lt;/strong&gt; declares which access levels are sufficient to reach it.&lt;/p&gt;

&lt;p&gt;Roles are bags of access levels. Endpoints are protected by lists of access levels. A user can call an endpoint if any of their roles' access levels appear in the endpoint's required list. That's the whole model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hm0cla1frifq7ocjye0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hm0cla1frifq7ocjye0.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We deliberately keep it coarse. There are dozens of access levels in the system, not thousands. Questions like "can this user delete &lt;em&gt;this specific patient record&lt;/em&gt;?" belong to the upstream service that owns that data — it has the context the gateway doesn't. The gateway's job is the coarse filter: "is this user even an admin at all?" — a check that catches the vast majority of misuse and runs at edge speed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who defines access levels and endpoints?
&lt;/h2&gt;

&lt;p&gt;Here's something that might seem surprising: the gateway doesn't own the access level definitions. &lt;strong&gt;Individual product services do.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each service ships an &lt;code&gt;access_levels.json&lt;/code&gt; alongside its code. This file declares what access levels it recognizes and which endpoints require which levels. A scheduling service owns &lt;code&gt;schedule:write&lt;/code&gt;. A billing service owns &lt;code&gt;billing:read&lt;/code&gt;. The gateway is a consumer — it doesn't make editorial decisions about what permissions mean.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;access_levels.json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;owned&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;maintained&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;upstream&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;service&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"access_levels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule:write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Create and modify appointments"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"View appointments"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"endpoints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/appointments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"requires"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"schedule:write"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/appointments/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"requires"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"schedule:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule:write"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The publish flow runs through CI/CD. When a service merges a change to its access level definitions, a pipeline step pushes the updated file to a well-known S3 path. The gateway picks up the change on its next refresh cycle — no gateway deploy required, no manual registry edits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;service-repo/
  access_levels.json   ← owned by the service team
  .github/workflows/publish.yml

# publish.yml (simplified)
- name: Publish access levels
  run: |
    aws s3 cp access_levels.json \
      s3://registry/services/${{ env.SERVICE_NAME }}/access_levels.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps ownership aligned: the team that builds the feature decides what permission protects it. The gateway team owns the &lt;em&gt;mechanism&lt;/em&gt;; product teams own the &lt;em&gt;policy&lt;/em&gt;. Changes are auditable through git history, reviewable via pull request, and can be rolled back the same way code is.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the token carries — and what it doesn't
&lt;/h2&gt;

&lt;p&gt;The JWT includes the user's granted access levels as a bitmap — a compact byte slice — along with the version of the registry used when the token was issued. It does &lt;em&gt;not&lt;/em&gt; contain the full permission graph, and it does not contain endpoint requirements. Those live in the database, loaded into memory at boot.&lt;/p&gt;

&lt;p&gt;A decoded JWT payload looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-health"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"__________________8H"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;114&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1714000000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;policy_bitmap&lt;/code&gt; is a base64url-encoded byte slice — each bit position corresponds to one access level in the registry at version &lt;code&gt;114&lt;/code&gt;. &lt;code&gt;policy_bitmap_version&lt;/code&gt; tells the gateway exactly which registry snapshot to use when interpreting the bits. If the gateway's current registry is at version 114, it uses the fast bitmap path. If the versions differ, it falls back to string matching (more on that below).&lt;/p&gt;
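
&lt;p&gt;For illustration, roughly how the gateway can decode that claim and pick a path. The function and field names are ours, and unpadded base64url is an assumption about the encoding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package policy

import "encoding/base64"

// chooseAuthzPath decodes the policy_bitmap claim and decides which check runs.
// Unpadded base64url (RawURLEncoding) is an assumption about the encoding.
func chooseAuthzPath(bitmapClaim string, tokenVersion, registryVersion int) ([]byte, bool, error) {
    bits, err := base64.RawURLEncoding.DecodeString(bitmapClaim)
    if err != nil {
        return nil, false, err
    }
    // Use the fast bitmap path only when the token was minted against the same
    // registry snapshot this pod holds; otherwise fall back to string matching.
    return bits, tokenVersion == registryVersion, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;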

&lt;p&gt;This is a deliberate tradeoff. The stateless alternative — put everything in the token, make every decision without a database — sounds clean until users accumulate permissions. Tokens balloon to 4–8 KB. Cookies start failing at network edges. Mobile clients cache tokens aggressively and get stuck with stale permission sets. Every role change requires re-issuing every affected token immediately.&lt;/p&gt;

&lt;p&gt;The compromise: the JWT carries &lt;em&gt;coarse access levels&lt;/em&gt; (a small, stable set encoded as a bitmap), and the database carries &lt;em&gt;endpoint requirements&lt;/em&gt; (queried once at startup, refreshed on demand). Per-request authorization is a fast in-memory lookup on both sides.&lt;/p&gt;

&lt;p&gt;The payoff on token size is significant. Before the bitmap rework, heavily-permissioned admins had tokens approaching 3 KB. After:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fytx9d69ueqgcg5jl3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fytx9d69ueqgcg5jl3d.png" alt=" " width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The original approach: string matching
&lt;/h2&gt;

&lt;p&gt;The first implementation is what you'd sketch on a whiteboard. Take the user's access levels, take the endpoint's required access levels, check if they overlap.&lt;/p&gt;

&lt;p&gt;It's &lt;code&gt;O(n + m)&lt;/code&gt; — linear in the number of user permissions and required permissions. With typical values (a user might have 20–80 access levels, an endpoint usually requires 1–3), this runs in nanoseconds. It's correct, it's readable, and it worked fine in production for over a year.&lt;/p&gt;
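
&lt;p&gt;The original check, roughly, as a sketch; the real function had logging and metrics around it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// hasAnyAccess reports whether any of the user's access levels appears in the
// endpoint's required list. O(n + m) with a set over the user's levels.
func hasAnyAccess(userLevels, required []string) bool {
    have := make(map[string]struct{}, len(userLevels))
    for _, lvl := range userLevels {
        have[lvl] = struct{}{}
    }
    for _, req := range required {
        if _, ok := have[req]; ok {
            return true
        }
    }
    return false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;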

&lt;p&gt;The reason we eventually replaced it had nothing to do with speed.&lt;/p&gt;

&lt;p&gt;The first reason was &lt;strong&gt;token size&lt;/strong&gt;. As the platform grew and senior users accumulated more access levels, tokens stretched. We had admins with tokens approaching 3 KB. That's uncomfortable but manageable — until it isn't.&lt;/p&gt;

&lt;p&gt;The second reason was &lt;strong&gt;density of signal&lt;/strong&gt;. String matching tells you &lt;em&gt;that&lt;/em&gt; the user was authorized, but the log entry just says &lt;code&gt;granted: ["user:admin"]&lt;/code&gt;. We wanted richer per-permission metrics — which access levels are actually being exercised, which ones are granted but never hit anything — without adding another pass over the data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bitmap approach: compress the representation, keep the logic
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jburp4a0hiw10dpseif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jburp4a0hiw10dpseif.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The idea is simple: assign every access level a stable integer index. Represent a user's granted permissions as a bit vector — one bit per access level. Represent each endpoint's requirements the same way. Authorization becomes a bitwise AND.&lt;/p&gt;

&lt;p&gt;If the result is nonzero, the user has at least one of the required permissions. Allow. If the result is zero, deny. That's the entire hot path.&lt;/p&gt;

&lt;p&gt;The anchoring snippet — the intersection check at the core of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;intersects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a typical 32-byte bitmap (covering 256 possible access levels), this is a handful of CPU instructions. Decision time dropped from ~3 microseconds in the worst legacy case to under 200 nanoseconds. Not visible to end users. Very visible in CPU costs at 50,000 requests per second.&lt;/p&gt;

&lt;p&gt;Token size dropped too — from ~3 KB for a heavily-permissioned admin to under 1 KB. The access levels that used to be a long string array became the &lt;code&gt;policy_bitmap&lt;/code&gt; field: a base64url-encoded byte slice.&lt;/p&gt;
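
&lt;p&gt;To make the claim concrete (the helper below is illustrative, not our production code): set one bit per granted access-level index, then base64url-encode the bytes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "encoding/base64"

// encodePolicyBitmap sets one bit per granted access-level index and returns
// the base64url string carried in the policy_bitmap claim. Padding-free
// encoding is used here; the exact variant is an implementation detail.
func encodePolicyBitmap(grantedIndexes []int, totalLevels int) string {
    bm := make([]byte, (totalLevels+7)/8)
    for _, idx := range grantedIndexes {
        bm[idx/8] |= 1 &amp;lt;&amp;lt; uint(idx%8)
    }
    return base64.RawURLEncoding.EncodeToString(bm)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
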

&lt;p&gt;The two paths side by side:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqupwdqmu5xzwgxivnwua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqupwdqmu5xzwgxivnwua.png" alt=" " width="800" height="843"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The version problem — and why we kept the old path
&lt;/h2&gt;

&lt;p&gt;Here's the catch: bit indexes have to be stable. If access level &lt;code&gt;user:admin&lt;/code&gt; is bit 0 today, it must still be bit 0 when old tokens are being validated. This is managed through a versioned registry — each snapshot of the bit assignments carries a version number, and the JWT records, in &lt;code&gt;policy_bitmap_version&lt;/code&gt;, which registry version was in effect when the token was issued.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"__________________8H"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_bitmap_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;114&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the gateway boots, it loads the current registry — say, version 114 — and builds an in-memory lookup from version number to bit-index map. When a token arrives, the gateway reads &lt;code&gt;policy_bitmap_version&lt;/code&gt; and checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Version matches current registry (114 == 114):&lt;/strong&gt; decode the bitmap, run &lt;code&gt;intersects()&lt;/code&gt;, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version is older (e.g., 112):&lt;/strong&gt; fall back to string matching against the access level names.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No &lt;code&gt;policy_bitmap_version&lt;/code&gt; field:&lt;/strong&gt; legacy token predating the bitmap feature — fall back to string matching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fallback uses the access level names embedded in the token (carried as a separate claim for exactly this purpose) and checks them against the endpoint's required list. Same outcome, no bitmap needed.&lt;/p&gt;
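
&lt;p&gt;A condensed sketch of that dispatch, reusing &lt;code&gt;intersects()&lt;/code&gt; from above and a set-based string check for the fallback; the types and field names are illustrative, not our production code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;type userToken struct {
    PolicyBitmap        []byte
    PolicyBitmapVersion int // zero when the claim is absent (legacy token)
    AccessLevels        []string
}

type endpointMeta struct {
    RequiredBitmap       []byte
    RequiredAccessLevels []string
}

// authorize picks the evaluation path from the token's registry version.
// currentVersion is the registry snapshot loaded at boot (e.g. 114).
func authorize(tok userToken, ep endpointMeta, currentVersion int) bool {
    if tok.PolicyBitmapVersion == currentVersion {
        // Fast path: same registry snapshot, compare bitmaps directly.
        return intersects(tok.PolicyBitmap, ep.RequiredBitmap)
    }
    // Older or missing version: fall back to the access-level names carried
    // in the token. String matching (hasAny) is version-agnostic.
    return hasAny(tok.AccessLevels, ep.RequiredAccessLevels)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
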

&lt;p&gt;This fallback isn't a temporary measure. It's load-bearing. Long-lived service account tokens might be weeks old. We can't deny them just because they predate a registry update. The string-based check is version-agnostic: it doesn't care about bit indexes at all. As long as both sides agree on what the access level &lt;em&gt;strings&lt;/em&gt; mean, it works.&lt;/p&gt;

&lt;p&gt;New registry versions are created whenever a service publishes new access level definitions through the CI/CD pipeline. The version number increments, new bit positions are assigned to new access levels, and existing assignments are preserved verbatim. Old tokens stay valid — they just take the slightly slower path until they expire naturally.&lt;/p&gt;

&lt;p&gt;We track the fallback rate with a metric. When it's near zero, things are healthy. A spike tells us something is wrong — maybe a token issuer is behind on registry versions, maybe a test fixture has stale data, or maybe a new service published access levels without updating the issuer to match.&lt;/p&gt;
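
&lt;p&gt;For illustration only, since our dashboards live in Datadog and the metric name below is made up: the instrumentation is a labeled counter, incremented once per decision with the evaluation path that was taken.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "github.com/prometheus/client_golang/prometheus"

// One increment per decision, labeled by evaluation path:
// "bitmap", "string_fallback", or "legacy_no_version".
var evaluationPathTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "auth_evaluation_path_total",
        Help: "Authorization decisions by evaluation path.",
    },
    []string{"path"},
)

func init() {
    prometheus.MustRegister(evaluationPathTotal)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
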




&lt;h2&gt;
  
  
  A few things we deliberately didn't do
&lt;/h2&gt;

&lt;p&gt;Some approaches we considered and rejected:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching the authorization decision.&lt;/strong&gt; "Same token plus same endpoint equals same answer" feels right. It's wrong — role changes, revocation, and tenant changes all invalidate it. We cache the &lt;em&gt;token decode result&lt;/em&gt; (the identity and access levels), not the &lt;em&gt;decision&lt;/em&gt;.&lt;/p&gt;
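
&lt;p&gt;A sketch of what caching the decode rather than the decision looks like; the key derivation and TTL handling are illustrative.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "crypto/sha256"
    "sync"
    "time"
)

// decodedToken is the part we cache: identity and access levels, never a verdict.
type decodedToken struct {
    IdentityID   string
    TenantID     string
    AccessLevels []string
}

type cacheEntry struct {
    tok       decodedToken
    expiresAt time.Time
}

type decodeCache struct {
    mu      sync.RWMutex
    entries map[[32]byte]cacheEntry
}

// get returns a cached decode result for a raw token while it's still fresh.
// The authorization decision itself is recomputed on every request.
func (c *decodeCache) get(rawToken string) (decodedToken, bool) {
    key := sha256.Sum256([]byte(rawToken))
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.entries[key]
    if !ok || time.Now().After(e.expiresAt) {
        return decodedToken{}, false
    }
    return e.tok, true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
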

&lt;p&gt;&lt;strong&gt;Per-tenant access level definitions.&lt;/strong&gt; Letting each tenant define what &lt;code&gt;user:admin&lt;/code&gt; means sounds flexible. In practice, it means the registry forks and all cross-tenant reasoning breaks. Access levels are platform-wide; role assignments are per-tenant. That's the line. Individual services define access levels globally — they don't get per-tenant variants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hierarchical permissions.&lt;/strong&gt; "user:admin implies user:read" is elegant on paper. It complicates bitmap encoding and makes rollback harder. We grant both explicitly. A few extra access levels per role is not a real cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A central registry team as the bottleneck.&lt;/strong&gt; Early on, a single team owned all access level definitions. This created a queue — every new feature needed a registry PR to land before it could ship. Moving ownership to service teams via the CI/CD publish flow eliminated that queue entirely. The gateway team reviews the mechanism; service teams review each other's access level semantics in their own PRs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern underneath the optimization
&lt;/h2&gt;

&lt;p&gt;The bitmap is a performance and density win. But the deeper idea is the same one from the last chapter: make the implicit explicit and keep the decision structure visible.&lt;/p&gt;

&lt;p&gt;String matching and bitmap intersection both produce the same outcome — allow or deny. What the bitmap adds isn't correctness, it's &lt;em&gt;compactness&lt;/em&gt;: a cheaper wire representation, a faster runtime check, and a version-aware fallback that degrades gracefully instead of breaking.&lt;/p&gt;

&lt;p&gt;The CI/CD publish flow adds a different kind of compactness: it removes the coordination overhead of centralized registry management. Services declare what they need. The pipeline handles the distribution. The gateway consumes whatever's in the registry. No tickets, no handoffs.&lt;/p&gt;

&lt;p&gt;The fallback is worth lingering on. Most optimizations in auth systems are irreversible — once you commit to a new token format, old tokens become a problem. Keeping the legacy path as a first-class citizen, with its own metrics and log fields, meant we could ship the optimization without a flag day. Old tokens kept working. New tokens got faster. The two paths converged over time on their own.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the deployment actually looked like
&lt;/h2&gt;

&lt;p&gt;Theory is one thing. Here's the Datadog dashboard from the bitmap deployment on Apr 27 at 17:30.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favcvw6wzy5cusyvnblku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favcvw6wzy5cusyvnblku.png" alt="Datadog dashboard showing request hits, error rate, p99 latency, and execution time breakdown across the bitmap deployment boundary" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The real win shows up in p99 latency: it drops from a spiky 5–14 ms pattern to a stable ~4–5 ms, eliminating GC-induced variance from string allocations.&lt;/p&gt;

&lt;p&gt;Execution time stayed in the 100–400 µs range, with a one-time spike during the in-memory bitmap rebuild. Fallback usage decayed naturally as tokens rotated, confirming the migration was seamless.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: Chapter 7 — token revocation. JWTs are stateless by design, which makes "log this user out right now" genuinely hard. We solved it with a Redis-backed revocation list, an in-process cache, and two startup races we had to fix the painful way.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>authorization</category>
      <category>performance</category>
      <category>go</category>
      <category>auth</category>
    </item>
    <item>
      <title>Part 5 — Multi-tenant auth and routing in Kubernetes</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:26 +0000</pubDate>
      <link>https://forem.com/akarshan/multi-tenant-auth-and-routing-in-kubernetes-o85</link>
      <guid>https://forem.com/akarshan/multi-tenant-auth-and-routing-in-kubernetes-o85</guid>
      <description>&lt;p&gt;In the first four chapters of this series I've talked about &lt;em&gt;what&lt;/em&gt; the Auth Gateway decides. This chapter is about &lt;em&gt;who&lt;/em&gt; it decides for.&lt;/p&gt;

&lt;p&gt;We run a multi-tenant platform. Every request, on every endpoint, belongs to one tenant. Get tenant resolution wrong and you don't have a security incident — you have a &lt;em&gt;cross-tenant data leak&lt;/em&gt; incident, which is a category of bad you don't recover from.&lt;/p&gt;

&lt;p&gt;This chapter is the boring, careful, paranoid story of how NGINX and the Auth Service cooperate to never let a request through without a clear tenant identity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two questions
&lt;/h2&gt;

&lt;p&gt;Every multi-tenant request raises two questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Which tenant is this for?&lt;/strong&gt; (resolution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where does the request go for that tenant?&lt;/strong&gt; (routing)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We answer #1 at the NGINX layer, before auth. We answer #2 partly at NGINX (path-based routing) and partly inside the upstream service (tenant-scoped queries). The Auth Service sits between them: it makes sure the &lt;em&gt;token's&lt;/em&gt; tenant matches the &lt;em&gt;request's&lt;/em&gt; tenant before either service sees the request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resolution: two valid inputs, one explicit failure mode
&lt;/h2&gt;

&lt;p&gt;We accept two ways to identify a tenant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Tenant-ID&lt;/code&gt; header.&lt;/strong&gt; Explicit. Used by service-to-service calls and SDKs that know who they're for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host header (mapped via &lt;code&gt;X-Tenant-Host&lt;/code&gt;).&lt;/strong&gt; Implicit. Used by per-tenant DNS like &lt;code&gt;tenant1.example.com&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We do &lt;em&gt;not&lt;/em&gt; accept a third way: a default tenant. There is no fallback. If both inputs are missing or unknown, NGINX returns 400 &lt;em&gt;before&lt;/em&gt; the Auth Service is even called.&lt;/p&gt;

&lt;p&gt;Why so strict? Because a default tenant is the most expensive bug you can ship. Every "wait why is data showing up in tenant X?" post-mortem starts the same way: somebody added a fallback "for convenience" and somebody else's request hit it without a tenant header.&lt;/p&gt;

&lt;p&gt;We removed our default tenant on day 90 and have never looked back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7d2g7jz5npr8b94rhri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7d2g7jz5npr8b94rhri.png" alt=" " width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Per-tenant SNI server blocks
&lt;/h2&gt;

&lt;p&gt;For tenants with their own DNS, NGINX uses &lt;em&gt;server blocks&lt;/em&gt; to short-circuit resolution. The Helm chart templates one server per tenant from &lt;code&gt;global.tenants&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &lt;span class="s"&gt;range&lt;/span&gt; &lt;span class="s"&gt;.Values.global.tenants&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
&lt;span class="s"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; $&lt;span class="kn"&gt;.Values.containers.containerPort&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_dns&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_id&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_namespace&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/auth.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/custom_error_locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="kn"&gt;-&lt;/span&gt; &lt;span class="s"&gt;end&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A request to &lt;code&gt;tenant1.example.com&lt;/code&gt; matches &lt;code&gt;server_name tenant1.example.com&lt;/code&gt;, lands in this block, and &lt;code&gt;$tenant_id&lt;/code&gt; is &lt;em&gt;already&lt;/em&gt; set before any other directive runs. There is no header parsing, no map lookup, no opportunity for ambiguity. Tenant identity is pinned at SNI time.&lt;/p&gt;

&lt;p&gt;This is also nice for TLS: the per-tenant ingress can attach per-tenant certificates if you want them, and the SNI selection happens before any HTTP processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The default server block: header-based fallback
&lt;/h2&gt;

&lt;p&gt;Not every tenant has its own DNS. Many service-to-service calls hit a shared in-cluster ingress with &lt;code&gt;X-Tenant-ID&lt;/code&gt; set explicitly. For those, the default server block handles resolution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.Values.containers.containerPort&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                            &lt;span class="c1"&gt;# match anything not matched above&lt;/span&gt;

  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$http_x_tenant_id&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="nv"&gt;$http_x_tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;# priority 1&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id_from_host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;# priority 2: map of X-Tenant-Host&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tenant_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="s"&gt;"Tenant&lt;/span&gt; &lt;span class="s"&gt;not&lt;/span&gt; &lt;span class="s"&gt;specified"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;# priority 3: hard fail&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/auth.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/custom_error_locations.conf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;$tenant_id_from_host&lt;/code&gt; is a &lt;code&gt;map&lt;/code&gt; populated from the same &lt;code&gt;global.tenants&lt;/code&gt; list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;map&lt;/span&gt; &lt;span class="nv"&gt;$http_x_tenant_host&lt;/span&gt; &lt;span class="nv"&gt;$tenant_id_from_host&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="kn"&gt;-&lt;/span&gt; &lt;span class="s"&gt;range&lt;/span&gt; &lt;span class="s"&gt;.Values.global.tenants&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
  &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_dns&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;.tenant_id&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="err"&gt;{{&lt;/span&gt;&lt;span class="kn"&gt;-&lt;/span&gt; &lt;span class="s"&gt;end&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few subtleties worth highlighting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The order matters. Header beats host. We picked header priority because programmatic clients should be explicit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$tenant_id_from_host&lt;/code&gt; defaults to empty string if the host isn't in the map. We then 400 — same as if the header was missing entirely.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if&lt;/code&gt; directives in NGINX are deeply weird. We confined them to this block and resisted the temptation to put &lt;code&gt;if&lt;/code&gt;s anywhere else in the config.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tenant binding inside the JWT
&lt;/h2&gt;

&lt;p&gt;Once NGINX has set &lt;code&gt;$tenant_id&lt;/code&gt;, it forwards it as &lt;code&gt;X-Tenant-ID&lt;/code&gt; to the Auth Service. But the &lt;em&gt;token&lt;/em&gt; also carries a tenant claim. The Auth Service must check they match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;userToken&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantID&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantID&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReasonTenantMismatch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"tenant mismatch"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the line that saves you when a malicious actor copies a token from tenant A and replays it against tenant B's hostname. The token signature is valid. The token isn't expired. The token isn't revoked. But the &lt;em&gt;tenant&lt;/em&gt; in the token is &lt;code&gt;tenantA&lt;/code&gt; and the request is for &lt;code&gt;tenantB&lt;/code&gt;. We 401.&lt;/p&gt;

&lt;p&gt;Three things make this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The token is bound to a tenant at issuance.&lt;/strong&gt; Our token issuer puts &lt;code&gt;tid: "tenantA"&lt;/code&gt; in the JWT claims when it mints the token. We sign with the per-tenant RSA key (Chapter 3), so a token from tenant A &lt;em&gt;can't&lt;/em&gt; be re-signed for tenant B without the private key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The gateway picks the verification key by &lt;code&gt;X-Tenant-ID&lt;/code&gt;.&lt;/strong&gt; If the request says it's for tenant B, we verify the token's signature with tenant B's public key. A tenant A token signed with tenant A's key fails signature validation, not tenant binding — but either way it's denied.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The tenant claim is &lt;em&gt;also&lt;/em&gt; checked.&lt;/strong&gt; Even if the keys were the same, the explicit &lt;code&gt;userToken.TenantID != log.TenantID&lt;/code&gt; check would catch reuse.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Belt and suspenders. We've never regretted having both.&lt;/p&gt;
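
&lt;p&gt;A sketch of both layers together: pick the verification key from the request's tenant, then re-check the claim. &lt;code&gt;parseAndVerify&lt;/code&gt; and &lt;code&gt;decodedToken&lt;/code&gt; are placeholders for the real verifier, not our actual code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "crypto/rsa"
    "errors"
    "fmt"
)

// verifyForTenant validates a token against the tenant the *request* claims to
// be for, then re-checks the tenant claim inside the token itself.
func verifyForTenant(raw, requestTenantID string, keys map[string]*rsa.PublicKey) (*decodedToken, error) {
    pub, ok := keys[requestTenantID]
    if !ok {
        return nil, fmt.Errorf("no verification key for tenant %q", requestTenantID)
    }
    tok, err := parseAndVerify(raw, pub) // signature check with the request tenant's key
    if err != nil {
        return nil, err // a tenant A token replayed at tenant B already fails here
    }
    if tok.TenantID != requestTenantID {
        return nil, errors.New("tenant mismatch") // belt and suspenders
    }
    return tok, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
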

&lt;h2&gt;
  
  
  Sequence: tenant flow end-to-end
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81h3msmwoogdtm0a9u7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81h3msmwoogdtm0a9u7g.png" alt=" " width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing: MT vs ST upstreams
&lt;/h2&gt;

&lt;p&gt;Once we know the tenant, we have to route the request. We have two upstream models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MT (multi-tenant)&lt;/strong&gt; services. One deployment, serves all tenants. Tenant comes in as &lt;code&gt;X-Tenant-ID&lt;/code&gt;, the service queries data with &lt;code&gt;WHERE tenant_id = ?&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ST (single-tenant)&lt;/strong&gt; services. One deployment &lt;em&gt;per tenant&lt;/em&gt;, in the tenant's own Kubernetes namespace. The service doesn't even need to know about other tenants — it can't see them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is purely a per-service architectural choice. Some products are happy with MT; some have stricter isolation requirements (or run heavy per-tenant data) and want ST.&lt;/p&gt;

&lt;p&gt;The location loop in &lt;code&gt;locations.conf&lt;/code&gt; handles both with one branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;eq&lt;/span&gt; &lt;span class="nv"&gt;$type&lt;/span&gt; &lt;span class="s"&gt;"ST"&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$serviceDict&lt;/span&gt;&lt;span class="kn"&gt;.SERVICE_HOST&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt;&lt;span class="s"&gt;.svc.cluster.local&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;else&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$serviceDict&lt;/span&gt;&lt;span class="kn"&gt;.SERVICE_HOST&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="kn"&gt;end&lt;/span&gt; &lt;span class="err"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unrolled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MT:&lt;/strong&gt; &lt;code&gt;proxy_pass http://user-service&lt;/code&gt; — the bare service name. CoreDNS resolves it to the service's ClusterIP in whatever namespace the gateway lives in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ST:&lt;/strong&gt; &lt;code&gt;proxy_pass http://api-service.tenant1-ns.svc.cluster.local&lt;/code&gt; — the FQDN includes the tenant namespace. Each tenant has its own copy of the service in their own namespace.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzq8hrbmm5qnhv541cgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzq8hrbmm5qnhv541cgx.png" alt=" " width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few subtleties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For ST, the tenant namespace is &lt;em&gt;part of the DNS name&lt;/em&gt;. NGINX's resolver kicks in at request time, not at config-load time. Adding a new tenant means deploying its services in &lt;code&gt;tenantN-ns&lt;/code&gt;, then adding it to &lt;code&gt;global.tenants&lt;/code&gt;. NGINX picks it up on the next config reload.&lt;/li&gt;
&lt;li&gt;For MT, all tenants hit the same upstream IP. The upstream service is responsible for tenant scoping. We trust it because we forward &lt;code&gt;X-Tenant-ID&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; the upstream service's auth library re-checks the header against the token's tenant. (Yes, double-checking. After the first cross-tenant near-miss, we added it.)&lt;/li&gt;
&lt;li&gt;ST is more expensive operationally — N deployments of every service — but radically simpler to reason about. Two models, two trade-offs; pick the one your compliance team can defend.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Headers we propagate to upstream
&lt;/h2&gt;

&lt;p&gt;After auth passes, NGINX sends a defined set of headers to the upstream. From &lt;code&gt;locations.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-ID&lt;/span&gt;       &lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Type&lt;/span&gt;     &lt;span class="nv"&gt;$identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Name&lt;/span&gt;     &lt;span class="nv"&gt;$identity_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Session-ID&lt;/span&gt;        &lt;span class="nv"&gt;$session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;         &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-Namespace&lt;/span&gt;  &lt;span class="nv"&gt;$tenant_namespace&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;        &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The upstream contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Identity-ID&lt;/code&gt;&lt;/strong&gt; is the principal. Treat it as the user's primary key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Identity-Type&lt;/code&gt;&lt;/strong&gt; is &lt;code&gt;USER&lt;/code&gt; or &lt;code&gt;SERVICE_ACCOUNT&lt;/code&gt;. Some endpoints reject service accounts, some require them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Tenant-ID&lt;/code&gt;&lt;/strong&gt; is the tenant. &lt;em&gt;Always&lt;/em&gt; scope queries by it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Tenant-Namespace&lt;/code&gt;&lt;/strong&gt; is the Kubernetes namespace, useful for diagnostics and per-tenant Kafka topic naming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Session-ID&lt;/code&gt;&lt;/strong&gt; is an opaque session correlation ID. Useful for logging, never for auth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;X-Request-ID&lt;/code&gt;&lt;/strong&gt; is the trace correlation ID. Forward it to your downstream calls so the whole graph stitches together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We do &lt;em&gt;not&lt;/em&gt; forward &lt;code&gt;Authorization&lt;/code&gt;. The upstream service has no business looking at the JWT. If it needs to know who's calling, it uses &lt;code&gt;X-Identity-ID&lt;/code&gt;. If it needs to make a downstream call, it gets a fresh service-account token — it does not replay the user's token.&lt;/p&gt;
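
&lt;p&gt;A sketch of an upstream handler honoring that contract; the handler and query are hypothetical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "net/http"

// listWidgets is a hypothetical upstream handler. It trusts the gateway-injected
// headers and never parses a JWT itself.
func listWidgets(w http.ResponseWriter, r *http.Request) {
    tenantID := r.Header.Get("X-Tenant-ID")
    if tenantID == "" {
        // Behind the gateway this header is always present; a missing value
        // means the request somehow bypassed the gateway.
        http.Error(w, "missing tenant header", http.StatusForbidden)
        return
    }
    // The principal is X-Identity-ID, and every query is scoped by tenant first:
    //   SELECT ... FROM widgets WHERE tenant_id = $1
    w.WriteHeader(http.StatusOK)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
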

&lt;p&gt;Stripping &lt;code&gt;Authorization&lt;/code&gt; was one of those changes that everyone agrees is a great idea in principle and fights tooth-and-nail when their service breaks during the rollout. Worth the fight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skipping tenant for a few endpoints
&lt;/h2&gt;

&lt;p&gt;A handful of endpoints genuinely don't have a tenant: NGINX &lt;code&gt;/healthz&lt;/code&gt;, public OAuth callbacks, JWKS endpoints, version probes. For these, tenant resolution must &lt;em&gt;not&lt;/em&gt; run.&lt;/p&gt;

&lt;p&gt;In NGINX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/healthz&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;204&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This location is matched &lt;em&gt;before&lt;/em&gt; the tenant-resolving &lt;code&gt;if&lt;/code&gt; block in the default server, because NGINX processes more-specific locations first. &lt;code&gt;/healthz&lt;/code&gt; returns 204 without ever evaluating &lt;code&gt;$tenant_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Inside the Auth Service, the equivalent pattern shows up: the trie has rows with &lt;code&gt;endpoint_type=OPEN&lt;/code&gt; and &lt;em&gt;no&lt;/em&gt; tenant requirement. Even if NGINX did pass through, the Auth Service would allow without checking the tenant. Belt and suspenders again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ingress regex: another layer of opt-in
&lt;/h2&gt;

&lt;p&gt;Our cluster's Ingress controller routes some paths through the Auth Gateway and some paths &lt;em&gt;around&lt;/em&gt; it. The chart's multi-tenant Ingress uses a negative-lookahead regex to express "send everything to NGINX &lt;em&gt;except&lt;/em&gt; these specific exempt paths":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/(?!{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$exemptedPattern&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}).*"&lt;/span&gt;
  &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;$shortName&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;$exemptedPattern&lt;/code&gt; is a long alternation built from &lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;exemptedPaths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;api/&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;login/&lt;/span&gt;
  &lt;span class="na"&gt;exemptedPrefixes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ui&lt;/span&gt;
  &lt;span class="na"&gt;exemptedExtensions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;js&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;css&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ico&lt;/span&gt;
    &lt;span class="c1"&gt;# ... static assets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anything matching the regex bypasses the gateway and goes to a legacy ingress. This is how we rolled the gateway out &lt;em&gt;one service at a time&lt;/em&gt; — and how we keep it manageable today as services migrate at different speeds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd warn future-us about
&lt;/h2&gt;

&lt;p&gt;A few real lessons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default tenants are forever.&lt;/strong&gt; If you ship one, every subsequent design decision will assume it exists, and removing it later is a multi-quarter project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant-aware logging is non-negotiable.&lt;/strong&gt; Every log line must carry &lt;code&gt;tenant_id&lt;/code&gt;. We don't grep by user — we grep by tenant first, then narrow. Chapter 9 has the log format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the tenant model boring.&lt;/strong&gt; "What is a tenant?" should have a 1-sentence answer. The moment "tenant" starts meaning different things in different services, your isolation guarantees evaporate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MT and ST are different operating models, not different security models.&lt;/strong&gt; The same auth contract should hold. If your ST services can be looser because "they're isolated anyway," you have a problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never derive tenant from the user.&lt;/strong&gt; "User belongs to tenant X, so I'll use tenant X" sounds reasonable until you have users in multiple tenants. The tenant comes from the &lt;em&gt;request&lt;/em&gt;, not from the &lt;em&gt;user&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 6 stays on tenant boundaries but zooms in on the authorization side: roles, access levels, the role → access-level → endpoint mapping, and the bitmap fast path that replaced our original string-set matching. We'll see why a JWT &lt;em&gt;should not&lt;/em&gt; be the source of truth for a user's full permission set, and how to encode permissions densely enough that the gateway can decide with a single O(1) bitwise check.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>multitenancy</category>
      <category>nginx</category>
      <category>auth</category>
    </item>
    <item>
      <title>Part 4 — Endpoint classification: OPEN, AUTHENTICATED, ACCESS_CONTROLLED</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:41:07 +0000</pubDate>
      <link>https://forem.com/akarshan/endpoint-classification-open-authenticated-accesscontrolled-561</link>
      <guid>https://forem.com/akarshan/endpoint-classification-open-authenticated-accesscontrolled-561</guid>
      <description>&lt;p&gt;In Chapter 3 the controller branched on something called the "endpoint type":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;endpointType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"OPEN"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;            &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"AUTHENTICATED"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"ACCESS_CONTROLLED"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That branch is the most important conditional in the entire gateway. It decides whether a request even gets a token check, and whether to run authorization. This chapter is about how that decision is &lt;em&gt;data&lt;/em&gt;, not code, and the trie that powers it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three kinds of endpoint
&lt;/h2&gt;

&lt;p&gt;Every endpoint in our platform falls into one of three buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OPEN&lt;/strong&gt; — no auth required at all. Health checks, public OAuth callbacks, JWKS, version, docs. The request is allowed without a token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AUTHENTICATED&lt;/strong&gt; — token required, no specific permission. "Get my own profile," logout, list-my-stuff endpoints. Anyone with a valid token can call it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACCESS_CONTROLLED&lt;/strong&gt; — token required &lt;em&gt;and&lt;/em&gt; a specific permission. Admin operations, deletes, anything that crosses a user boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Auth Service runs different pipelines for each:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsyx5cxwqlu0wvj33e67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsyx5cxwqlu0wvj33e67.png" alt=" " width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The brilliance — and we say this honestly, because we did &lt;em&gt;not&lt;/em&gt; design it this way the first time — is that an endpoint's classification is a column in a database row, not a hardcoded route. Adding a new admin route means inserting a row, not deploying the gateway. We rebuild the in-memory data structure on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data structure: a trie
&lt;/h2&gt;

&lt;p&gt;When NGINX hits &lt;code&gt;/auth&lt;/code&gt;, it forwards &lt;code&gt;X-Original-URI: /user-management/v1/users/abc123&lt;/code&gt; and &lt;code&gt;X-Original-Method: GET&lt;/code&gt;. We need to turn that into an &lt;em&gt;endpoint metadata record&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The naïve approach is a big map with &lt;code&gt;(method, full_path)&lt;/code&gt; keys. That works until you have wildcards: &lt;code&gt;/users/{id}&lt;/code&gt; should match both &lt;code&gt;/users/abc&lt;/code&gt; and &lt;code&gt;/users/xyz&lt;/code&gt;. Once you have wildcards you want a trie.&lt;/p&gt;

&lt;p&gt;Our trie node looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;TrieNode&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;children&lt;/span&gt;   &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TrieNode&lt;/span&gt;  &lt;span class="c"&gt;// exact-match segments&lt;/span&gt;
    &lt;span class="n"&gt;wildcard&lt;/span&gt;   &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TrieNode&lt;/span&gt;             &lt;span class="c"&gt;// catch-all child for {id}-style segments&lt;/span&gt;
    &lt;span class="n"&gt;Permissions&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;][]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="c"&gt;// method -&amp;gt; required permissions&lt;/span&gt;
    &lt;span class="n"&gt;EndpointType&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;               &lt;span class="c"&gt;// OPEN | AUTHENTICATED | ACCESS_CONTROLLED&lt;/span&gt;
    &lt;span class="n"&gt;BitmapMask&lt;/span&gt;   &lt;span class="kt"&gt;uint64&lt;/span&gt;               &lt;span class="c"&gt;// pre-computed for bitmap fast-path (Chapter 6)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A trie key is a slash-segmented path. &lt;code&gt;/users/{id}/roles&lt;/code&gt; becomes the path &lt;code&gt;["users", "{id}", "roles"]&lt;/code&gt;. Walking the trie is one segment at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TrieNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seg&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Permissions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;O(depth)&lt;/code&gt; worst case, where depth is the number of segments. In practice a typical endpoint is 3–5 segments. We're talking nanoseconds.&lt;/p&gt;

&lt;p&gt;A toy view of what one slug's trie looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiltljlcz1nvbxdib3hq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiltljlcz1nvbxdib3hq.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice &lt;code&gt;me&lt;/code&gt; and &lt;code&gt;{id}&lt;/code&gt; are both children of &lt;code&gt;users&lt;/code&gt;. &lt;code&gt;users/me&lt;/code&gt; resolves first (exact match) and gets &lt;code&gt;AUTHENTICATED&lt;/code&gt;. &lt;code&gt;users/abc123&lt;/code&gt; falls through to the &lt;code&gt;{id}&lt;/code&gt; branch and gets &lt;code&gt;ACCESS_CONTROLLED + user:read&lt;/code&gt;. Order matters: exact wins over wildcard.&lt;/p&gt;
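
&lt;p&gt;&lt;code&gt;Lookup&lt;/code&gt;'s counterpart, shown as a sketch rather than our production code: brace-wrapped segments become the wildcard child, everything else an exact child.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Insert registers one (method, pattern) pair, e.g. ("GET", ["users", "{id}"]).
// Brace-wrapped segments go to the wildcard child; everything else is exact.
func (t *Trie) Insert(method string, pattern []string, endpointType string, perms []string) {
    node := t.root
    for _, seg := range pattern {
        if len(seg) &amp;gt; 0 &amp;amp;&amp;amp; seg[0] == '{' {
            if node.wildcard == nil {
                node.wildcard = &amp;amp;TrieNode{children: map[string]*TrieNode{}}
            }
            node = node.wildcard
            continue
        }
        next, ok := node.children[seg]
        if !ok {
            next = &amp;amp;TrieNode{children: map[string]*TrieNode{}}
            node.children[seg] = next
        }
        node = next
    }
    if node.Permissions == nil {
        node.Permissions = map[string][]string{}
    }
    node.Permissions[method] = perms
    node.EndpointType = endpointType
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
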

&lt;h2&gt;
  
  
  Service slug: scoping the trie
&lt;/h2&gt;

&lt;p&gt;We don't have one giant trie. We have &lt;em&gt;one trie per service slug&lt;/em&gt;. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different services own different paths.&lt;/li&gt;
&lt;li&gt;A single global trie collides on ambiguous paths (multiple services exposing &lt;code&gt;/v1/users&lt;/code&gt; for different reasons).&lt;/li&gt;
&lt;li&gt;Cache locality and refresh granularity are better when each service's routes are isolated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a long time the slug came from "the first segment of the URI." That was fine until services nested each other (&lt;code&gt;/api/v2/user-management/users&lt;/code&gt;). So we added two optional headers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;X-Service-Slug&lt;/code&gt; — explicit slug.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Request-Path&lt;/code&gt; — the path &lt;em&gt;inside&lt;/em&gt; that slug, with the prefix already stripped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The route resolver uses them when present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RouteResolver&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ResolveEndpointWithMetrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;AuthDecisionLog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieKey&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perms&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieExists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Service-Slug"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Request-Path"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Legacy: split URI on first segment&lt;/span&gt;
        &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitFirstSegment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;trie&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;globalTrieRegistry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;trie&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Permissions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two return flags, not one: &lt;code&gt;trieExists&lt;/code&gt; distinguishes "the service slug isn't registered" from "the service slug is registered but this path doesn't match." The first is a server problem (deploy mismatch). The second is a client problem (404). Different decision reasons, different alerts.&lt;/p&gt;
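
&lt;p&gt;For illustration, here is roughly how a caller can turn those two flags into separate reasons and status codes. The reason names are placeholders rather than our real constants; the 503/404 split is the behaviour described above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Illustrative reason names; only the split itself is the contract.
const (
    reasonServiceNotRegistered = "SERVICE_NOT_REGISTERED" // deploy or config mismatch: alert the platform team
    reasonRouteNotFound        = "ROUTE_NOT_FOUND"        // registered service, unknown path: plain 404
)

func classifyMiss(trieExists, found bool) (reason string, status int) {
    switch {
    case !trieExists:
        return reasonServiceNotRegistered, 503
    case !found:
        return reasonRouteNotFound, 404
    }
    return "", 200
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

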

&lt;h2&gt;
  
  
  Loading the trie from Postgres
&lt;/h2&gt;

&lt;p&gt;The source of truth is three tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Each row is one (service, method, pattern) combination&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;            &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;service_slug&lt;/span&gt;  &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;method&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt;       &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_type&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="c1"&gt;-- OPEN | AUTHENTICATED | ACCESS_CONTROLLED&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Permissions an endpoint requires (only for ACCESS_CONTROLLED)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;endpoint_policy&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_id&lt;/span&gt;      &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;general_policy_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;general_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- The actual policy table — name + bit_index for the bitmap path (Chapter 6)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;general_policy&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;        &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;      &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bit_index&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="c1"&gt;-- nullable; assigned for bitmap-eligible permissions&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At startup, &lt;code&gt;LoadTrieAndRegistry&lt;/code&gt; runs a single query that joins the three tables and builds a &lt;code&gt;Trie&lt;/code&gt; per slug:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;LoadTrieAndRegistry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;policybitmap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Snapshot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT e.service_slug, e.method, e.pattern, e.endpoint_type,
               coalesce(array_agg(gp.name) FILTER (WHERE gp.id IS NOT NULL), '{}'),
               coalesce(array_agg(gp.bit_index) FILTER (WHERE gp.bit_index IS NOT NULL), '{}')
          FROM endpoint e
          LEFT JOIN endpoint_policy ep ON ep.endpoint_id = e.id
          LEFT JOIN general_policy  gp ON gp.id = ep.general_policy_id
         GROUP BY e.id
    `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tries&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;snap&lt;/span&gt;  &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;policybitmap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSnapshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
            &lt;span class="n"&gt;permNames&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
            &lt;span class="n"&gt;bitIdxs&lt;/span&gt;   &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;etype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;permNames&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;pq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bitIdxs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTrie&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EndpointType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Permissions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;][]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;permNames&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BitmapMask&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;computeMask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bitIdxs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c"&gt;// Chapter 6&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single SQL query, one pass to build N tries. We measure load time and emit it as a startup log: &lt;code&gt;trie_load_duration_ms&lt;/code&gt;. On a healthy database, hundreds of tries with thousands of routes load in well under a second.&lt;/p&gt;
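
&lt;p&gt;The wrapper around the loader is a few lines. A sketch, with &lt;code&gt;globalTrieRegistry.Replace&lt;/code&gt; as a hypothetical setter for however you swap the map in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "context"
    "database/sql"
    "log/slog"
    "time"
)

// loadAndSwap is the single code path used at boot and on every refresh.
func loadAndSwap(ctx context.Context, db *sql.DB) error {
    start := time.Now()
    tries, _, err := LoadTrieAndRegistry(ctx, db)
    if err != nil {
        return err // at boot this leaves the registry empty, so /readyz stays 503
    }
    globalTrieRegistry.Replace(tries) // hypothetical setter: swap the whole map at once
    slog.Info("trie loaded",
        "trie_load_duration_ms", time.Since(start).Milliseconds(),
        "services", len(tries))
    return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

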

&lt;h2&gt;
  
  
  Refreshing without a restart
&lt;/h2&gt;

&lt;p&gt;Endpoint metadata changes — we onboard a new service, add a new permission, deprecate a route. We don't want to roll the gateway every time.&lt;/p&gt;

&lt;p&gt;There are two refresh mechanisms:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Periodic.&lt;/strong&gt; Every &lt;code&gt;TRIE_REFRESH_INTERVAL_SECS&lt;/code&gt; seconds (default: 3600, i.e. one hour) we re-run the loader. This is a safety net. If the live channel ever misses an event, the periodic refresh catches it within an hour.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Live.&lt;/strong&gt; A Redis Pub/Sub channel called &lt;code&gt;auth:trie:refresh&lt;/code&gt;. When admin tooling changes endpoint metadata, it &lt;code&gt;PUBLISH&lt;/code&gt;es to that channel. Every Auth Service pod is subscribed, and refreshes within milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw89ibdgwobgeyaiixlq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuw89ibdgwobgeyaiixlq.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Pub/Sub message itself carries no payload. It's purely a &lt;em&gt;kick&lt;/em&gt;. Each pod queries the database itself. Two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don't have to keep the message in sync with the schema. New columns, no message change.&lt;/li&gt;
&lt;li&gt;A pod that just booted does the same load as a pod that received a refresh kick. One code path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The downside: every refresh kick = one query per pod. If you have 100 pods and someone bulk-edits 1000 endpoints with 1000 publishes, that's 100,000 queries. We added a debounce (coalesce events within a 200 ms window). Onboarding a service still hammers the DB once, briefly, but it doesn't spiral.&lt;/p&gt;
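
&lt;p&gt;Here is a sketch of the subscriber side, using the go-redis client for the example. The channel name and the 200 ms debounce window are the real values from above; the function names around them are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// Blocks for the life of the pod: coalesced kicks and the periodic
// safety-net interval (TRIE_REFRESH_INTERVAL_SECS) both call reload().
func listenForRefresh(ctx context.Context, rdb *redis.Client, refreshEvery time.Duration, reload func(context.Context)) {
    sub := rdb.Subscribe(ctx, "auth:trie:refresh")
    defer sub.Close()

    kicks := sub.Channel()
    fire := make(chan struct{}, 1)
    var pending *time.Timer // non-nil while a debounce window is open

    ticker := time.NewTicker(refreshEvery)
    defer ticker.Stop()

    for {
        select {
        case &amp;lt;-ctx.Done():
            return
        case &amp;lt;-kicks:
            // The payload is ignored on purpose: the message is only a kick.
            if pending == nil {
                pending = time.AfterFunc(200*time.Millisecond, func() { fire &amp;lt;- struct{}{} })
            }
            // else: a window is already open; this kick is coalesced into it.
        case &amp;lt;-fire:
            pending = nil
            reload(ctx) // same code path as the boot-time load
        case &amp;lt;-ticker.C:
            reload(ctx) // periodic safety net in case a kick was ever missed
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The publish side is symmetric: the admin tooling just runs &lt;code&gt;PUBLISH auth:trie:refresh&lt;/code&gt; with an empty payload and every subscribed pod picks it up.&lt;/p&gt;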

&lt;h2&gt;
  
  
  Caching the lookup
&lt;/h2&gt;

&lt;p&gt;Even with a fast trie, re-walking the same path on every request is wasteful. We layer a TinyLFU cache (W-TinyLFU, via a Go port) on top of the resolver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RouteCache&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;inner&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tinylfu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RouteResult&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RouteCache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RouteResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x00&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x00&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is &lt;code&gt;service_slug + NUL + method + NUL + path&lt;/code&gt;. The NUL byte is a delimiter — neither slugs nor paths can contain it, so a key built from one (slug, method, path) triple can never collide with a key built from a different one, even when one slug is a prefix of another.&lt;/p&gt;

&lt;p&gt;The cache is bounded by entry count, not by RAM. Default: 10,000 routes with 100,000 frequency counters. TinyLFU keeps the &lt;em&gt;frequently used&lt;/em&gt; routes hot and evicts cold ones — better than LRU when traffic has a long tail of rarely-hit paths.&lt;/p&gt;

&lt;p&gt;Cache invalidation: on any trie reload (periodic or kick), the cache is &lt;em&gt;fully cleared&lt;/em&gt;. We considered partial invalidation. We chose not to — the cache fills back up in milliseconds because the same handful of paths drives 99% of the traffic.&lt;/p&gt;
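
&lt;p&gt;Construction and invalidation are a handful of lines. The constructor signature below is assumed for the sketch; "fully cleared" in practice just means publishing a fresh, empty cache in place of the old one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import "sync/atomic"

const (
    maxRoutes   = 10_000  // bounded by entry count, not RAM
    lfuCounters = 100_000 // frequency-sketch counters behind the admission policy
)

// Swapped rather than mutated: clearing the cache is just publishing a new one.
var liveCache atomic.Pointer[RouteCache]

func newRouteCache() *RouteCache {
    c := new(RouteCache)
    c.inner = tinylfu.New[string, RouteResult](maxRoutes, lfuCounters) // assumed constructor
    return c
}

// Called right after the trie registry is swapped, whether the reload came
// from the hourly timer or from a Pub/Sub kick.
func clearRouteCache() {
    liveCache.Store(newRouteCache())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

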

&lt;h2&gt;
  
  
  Why this beats per-route decorators
&lt;/h2&gt;

&lt;p&gt;In a Python or Node service you might write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users/&amp;lt;id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nd"&gt;@requires_permission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three problems with that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Service-by-service drift.&lt;/strong&gt; Each service maintains its own decorators. Different services use different permission strings, different exception types, different log shapes. The contract is informal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragile to refactor.&lt;/strong&gt; Move the route, lose the decorator. Now anyone can call it. We've seen this happen at zero, two, and twelve months in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit is impossible at the service level.&lt;/strong&gt; "What permission protects this URL?" requires reading every service's source.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By making endpoint metadata a &lt;em&gt;database row&lt;/em&gt;, we get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One contract, in one schema.&lt;/li&gt;
&lt;li&gt;Decoupled from code — no rebuild to add a route.&lt;/li&gt;
&lt;li&gt;Auditable: a single &lt;code&gt;SELECT&lt;/code&gt; answers "what does this URL require?"&lt;/li&gt;
&lt;li&gt;Reusable: the same metadata drives the gateway &lt;em&gt;and&lt;/em&gt; the admin UI that lets product managers tweak it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost is that you can't read the auth requirements next to the handler in the upstream service's code. We mitigate that with code generation: a CI job dumps &lt;code&gt;endpoint&lt;/code&gt; rows for each service into a &lt;code&gt;routes.yaml&lt;/code&gt; checked into the service's repo for reference. The DB stays the source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Readiness depends on the trie
&lt;/h2&gt;

&lt;p&gt;There's a subtle interaction with Kubernetes probes worth calling out. Our &lt;code&gt;/readyz&lt;/code&gt; returns 503 if the trie isn't loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;CheckTrieReadiness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tries&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trie&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusServiceUnavailable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"trie not loaded"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A pod that boots before Postgres is reachable will fail readiness, which keeps it out of the Service load balancer until the trie is loaded. A pod that &lt;em&gt;was&lt;/em&gt; serving traffic and then loses access to Postgres keeps its existing trie in memory and stays ready — refreshes fail loudly via Slack, but live traffic isn't disrupted.&lt;/p&gt;

&lt;p&gt;This split — "load fails block readiness, refresh failures don't" — is on purpose. A booting pod with no trie can't make decisions; pull it out. A serving pod with a stale trie can still make decisions correctly for endpoints that haven't changed; keep it in service while we fix Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-patterns we avoided
&lt;/h2&gt;

&lt;p&gt;Worth listing what we considered and rejected, because they're tempting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storing the endpoint type inside the JWT.&lt;/strong&gt; Tempting because then you don't need a trie. Wrong because a token outlives configuration changes — we'd cache stale auth requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A single hard-coded list of public paths.&lt;/strong&gt; A previous iteration had a YAML file shipped with the gateway. Updating it required a deploy. The trie + DB replaced it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant route metadata.&lt;/strong&gt; We talked about it. We rejected it: the same service exposes the same routes for every tenant. Tenant-specific differences belong in the access-level model (Chapter 6), not the route model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letting the upstream service register its own routes via API at startup.&lt;/strong&gt; Looks elegant. Falls apart in chaos: a buggy service can register away its own auth.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 5 takes the same metadata-as-data philosophy and applies it to multi-tenancy. The trie tells us &lt;em&gt;what&lt;/em&gt; an endpoint requires; tenant resolution tells us &lt;em&gt;who&lt;/em&gt; it's for. We'll see how NGINX server blocks, headers, and tenant maps cooperate to never let a request through without a clear tenant identity.&lt;/p&gt;

</description>
      <category>go</category>
      <category>architecture</category>
      <category>auth</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Part 3 — Inside the Auth Service: From Token Validator to Policy Decision Point</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:40:48 +0000</pubDate>
      <link>https://forem.com/akarshan/inside-the-auth-service-from-token-validator-to-policy-decision-point-3kj8</link>
      <guid>https://forem.com/akarshan/inside-the-auth-service-from-token-validator-to-policy-decision-point-3kj8</guid>
      <description>&lt;p&gt;Most auth services start simple — verify the token, return 200 or 401. Then requirements accumulate. Tenant isolation. Service accounts. Token revocation. Access levels per endpoint. And suddenly what was a lightweight validator is carrying a lot of weight, without a clear structure to hold it.&lt;/p&gt;

&lt;p&gt;This post is about how we structured ours — the ideas that shaped it, and the ones we got wrong before landing here.&lt;/p&gt;




&lt;h2&gt;
  
  
  One job, lots of supporting infrastructure
&lt;/h2&gt;

&lt;p&gt;The Auth Service does exactly one thing from the outside: receive a subrequest from NGINX, inspect the headers, and return a decision. Under a millisecond, every time.&lt;/p&gt;

&lt;p&gt;But a single HTTP handler that does that reliably at scale has a lot underneath it — caching, revocation checks, routing logic, identity propagation. The structural challenge is keeping the handler &lt;em&gt;small&lt;/em&gt; while the infrastructure grows. We landed on a controller that reads like a flowchart:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract the request metadata (URI, method, tenant).&lt;/li&gt;
&lt;li&gt;Resolve the endpoint to find out what kind of auth it needs.&lt;/li&gt;
&lt;li&gt;Based on that: allow it openly, run authentication only, or run authentication &lt;em&gt;and&lt;/em&gt; authorization.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole thing. Everything else is a service the controller delegates to.&lt;/p&gt;




&lt;h2&gt;
  
  
  The insight that changed how we think about routing: endpoint classification is data, not code
&lt;/h2&gt;

&lt;p&gt;Early on, we made auth decisions in code. A route was open because someone wrote &lt;code&gt;if path == "/health" { return 200 }&lt;/code&gt;. Access control lived in conditionals scattered across handlers.&lt;/p&gt;

&lt;p&gt;This breaks the moment your product team adds a new endpoint, or you need to temporarily open a route for a partner integration, or you realize a route that was open should have been authenticated all along.&lt;/p&gt;

&lt;p&gt;We flipped it: every endpoint in the system has a classification stored in the database — &lt;code&gt;OPEN&lt;/code&gt;, &lt;code&gt;AUTHENTICATED&lt;/code&gt;, or &lt;code&gt;ACCESS_CONTROLLED&lt;/code&gt; — along with a permission list if it's access-controlled. The auth service resolves the incoming request to an endpoint record and reads that classification. The decision logic then becomes a simple switch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OPEN&lt;/strong&gt;: allow, log it, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AUTHENTICATED&lt;/strong&gt;: run token validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACCESS_CONTROLLED&lt;/strong&gt;: run token validation, then check permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The consequence is that we never recompile or redeploy the Auth Service to change how a route is protected. That's a database update. It also means non-engineers can reason about the access model without reading code.&lt;/p&gt;
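
&lt;p&gt;Boiled down, that switch looks roughly like the snippet below. The type and helper names (&lt;code&gt;AuthRequest&lt;/code&gt;, &lt;code&gt;validateToken&lt;/code&gt;, &lt;code&gt;checkPermissions&lt;/code&gt; and friends) are placeholders; the three-way classification and the order of the checks are the actual contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Placeholder types for the sketch; the real ones carry more fields.
type Endpoint struct {
    Type        string   // OPEN | AUTHENTICATED | ACCESS_CONTROLLED
    Permissions []string // only meaningful for ACCESS_CONTROLLED
}

type Decision struct {
    Allowed  bool
    Reason   string
    Identity string
}

func decide(ep Endpoint, req AuthRequest) Decision {
    switch ep.Type {
    case "OPEN":
        return allow("OPEN_ENDPOINT") // no token needed; still logged like everything else
    case "AUTHENTICATED":
        return validateToken(req) // signature, expiry, revocation, tenant
    case "ACCESS_CONTROLLED":
        d := validateToken(req)
        if !d.Allowed {
            return d
        }
        return checkPermissions(d.Identity, ep.Permissions)
    default:
        return deny("UNKNOWN_ENDPOINT_TYPE") // fail closed on malformed metadata
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

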

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexsssx65943hzt6aeqzs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexsssx65943hzt6aeqzs.png" alt=" " width="800" height="902"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Naming every failure: the decision-reason contract
&lt;/h2&gt;

&lt;p&gt;The second structural idea that shaped everything else: every outcome has an explicit name.&lt;/p&gt;

&lt;p&gt;We maintain an enumerated list of decision reasons — constants like &lt;code&gt;MISSING_TOKEN&lt;/code&gt;, &lt;code&gt;TENANT_MISMATCH&lt;/code&gt;, &lt;code&gt;TOKEN_REVOKED&lt;/code&gt;, &lt;code&gt;SA_VERSION_MISMATCH&lt;/code&gt;, &lt;code&gt;OPEN_ENDPOINT&lt;/code&gt;, &lt;code&gt;ACCESS_LEVEL_MATCH&lt;/code&gt;. Every code path in the service must set one before returning. There's no exit that doesn't produce a named reason.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ReasonOpenEndpoint&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OPEN_ENDPOINT"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonMissingToken&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"MISSING_TOKEN"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonTokenRevoked&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TOKEN_REVOKED"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonTenantMismatch&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TENANT_MISMATCH"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonSAVersionMismatch&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"SA_VERSION_MISMATCH"&lt;/span&gt;
    &lt;span class="n"&gt;ReasonTokenTypeMismatch&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TOKEN_TYPE_MISMATCH"&lt;/span&gt;
    &lt;span class="c"&gt;// ... and so on&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sounds like a minor logging detail. It isn't.&lt;/p&gt;

&lt;p&gt;When a token fails, &lt;em&gt;why&lt;/em&gt; it fails tells a completely different story depending on the reason. &lt;code&gt;TOKEN_REVOKED&lt;/code&gt; means the user logged out or was disabled. &lt;code&gt;SA_VERSION_MISMATCH&lt;/code&gt; means a service account was rotated and the calling service hasn't caught up. &lt;code&gt;TOKEN_TYPE_MISMATCH&lt;/code&gt; means something is trying to authenticate with a refresh token where it should use an access token — usually a buggy SDK, occasionally something worth investigating.&lt;/p&gt;

&lt;p&gt;If all of these collapsed into a generic &lt;code&gt;401 Unauthorized&lt;/code&gt;, you'd lose all of that signal. Dashboards would be useless. On-call would be guessing.&lt;/p&gt;

&lt;p&gt;The list itself is a contract with the log pipeline. New reasons go through code review. Old reasons can't be deleted without checking dashboards and alerts first. It's one of the few places in the codebase where "this is more rigid than it needs to be" is actually correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  One log line per request — and why that matters more than it sounds
&lt;/h2&gt;

&lt;p&gt;Our first approach was to emit log lines at each stage of the pipeline — one when we resolved the route, one when we validated the token, one when we made the authorization decision. We could stitch them together by request ID.&lt;/p&gt;

&lt;p&gt;We abandoned this. The stitching was always slightly wrong. Correlation IDs got dropped. Fields you needed were in a different log line than the one you found first. Debugging a production incident meant reconstructing a timeline from fragments.&lt;/p&gt;

&lt;p&gt;Now there's one structured log record per request. It's built up incrementally — every handler in the pipeline writes into the same struct. By the time the response goes out, the record has every field: URI, method, tenant, identity, cache hit status, decision reason, outcome. It emits once, at the end.&lt;/p&gt;
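
&lt;p&gt;In Gin terms, the pattern is a middleware that owns the record plus a deferred emit. A sketch with illustrative field names and simplified plumbing (the real record carries more fields):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "encoding/json"
    "log"

    "github.com/gin-gonic/gin"
)

// AuthDecisionLog is the one record per request.
type AuthDecisionLog struct {
    RequestID  string `json:"request_id"`
    URI        string `json:"uri"`
    Method     string `json:"method"`
    TenantID   string `json:"tenant_id"`
    IdentityID string `json:"identity_id"`
    CacheHit   bool   `json:"cache_hit"`
    Reason     string `json:"decision_reason"`
    Outcome    string `json:"outcome"` // ALLOW or DENY
}

func DecisionLogMiddleware() gin.HandlerFunc {
    return func(ctx *gin.Context) {
        rec := new(AuthDecisionLog)
        ctx.Set("auth_decision_log", rec) // every later handler mutates this same struct

        defer func() {
            // One emit, at the very end, whichever path returned.
            line, _ := json.Marshal(rec)
            log.Printf("AUTH_DECISION %s", line)
        }()
        ctx.Next()
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

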

&lt;p&gt;The operational improvement was immediate. Grepping for a user's identity ID gives you a complete picture of every request they made — what was allowed, what was denied, and exactly why. No joining, no reconstruction.&lt;/p&gt;

&lt;p&gt;If you're designing an auth service, this is the first structural decision we'd recommend getting right. Everything else can be refactored. The logging model tends to calcify early.&lt;/p&gt;




&lt;h2&gt;
  
  
  How we handle JWT verification at scale
&lt;/h2&gt;

&lt;p&gt;Validating a JWT sounds cheap. For HS256 with a shared secret, it mostly is. For RS256 with asymmetric keys — which is what we use for user-facing tokens — the RSA verification step sits in the hundreds of microseconds. At meaningful request volume, that becomes a real CPU cost.&lt;/p&gt;

&lt;p&gt;Our solution is a cache in front of the decode step. The cache key is a hash of the raw token string (not the string itself — the hash is 8 bytes versus potentially hundreds, which adds up at scale). The TTL matches the token's expiry. When a token comes in that we've already verified recently, we skip the RSA verification entirely.&lt;/p&gt;
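
&lt;p&gt;A sketch of that cache, with a 64-bit FNV hash standing in for the 8-byte key and a plain mutex-guarded map standing in for the real store; eviction of expired entries is left out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;import (
    "hash/fnv"
    "sync"
    "time"
)

type cachedClaims struct {
    claims    map[string]any
    expiresAt time.Time // mirrors the token's own exp claim
}

// TokenCache remembers already-verified tokens so the RSA step can be
// skipped on a hit. Revocation is checked elsewhere, on every request.
type TokenCache struct {
    mu sync.RWMutex
    m  map[uint64]cachedClaims
}

func NewTokenCache() *TokenCache {
    c := new(TokenCache)
    c.m = make(map[uint64]cachedClaims)
    return c
}

// tokenKey hashes the raw token down to 8 bytes so the key isn't the
// full token string.
func tokenKey(raw string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(raw))
    return h.Sum64()
}

func (c *TokenCache) Get(raw string) (map[string]any, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.m[tokenKey(raw)]
    if !ok || time.Now().After(e.expiresAt) {
        return nil, false // miss or expired: caller runs the full verification
    }
    return e.claims, true
}

// Put stores the decoded claims with a lifetime equal to the token's expiry.
func (c *TokenCache) Put(raw string, claims map[string]any, exp time.Time) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[tokenKey(raw)] = cachedClaims{claims: claims, expiresAt: exp}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

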

&lt;p&gt;A few things we were careful &lt;em&gt;not&lt;/em&gt; to cache:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Revocation state.&lt;/strong&gt; Whether a token has been revoked can change at any moment, independent of the token's validity. We cache the decode result — the claims, the identity — but we always check revocation live. These are different questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The auth decision itself.&lt;/strong&gt; The decision depends on the endpoint, the tenant, and the required access level, none of which the token cache knows about. Caching decisions would mean a user who got their access level changed mid-session would still see stale decisions until cache expiry. Unacceptable.&lt;/p&gt;

&lt;p&gt;The principle here generalizes: cache the &lt;em&gt;facts&lt;/em&gt; (what the token says), not the &lt;em&gt;decisions&lt;/em&gt; (what we're going to do about it).&lt;/p&gt;




&lt;h2&gt;
  
  
  The boundary the Auth Service deliberately doesn't cross
&lt;/h2&gt;

&lt;p&gt;The clearest sign a service is well-designed is what it refuses to do.&lt;/p&gt;

&lt;p&gt;Our Auth Service handles coarse-grained access: does this identity have the level of access required to reach this endpoint category? That's it. It does not answer questions like "can this user delete this specific record?" or "does this account have permission to access this tenant's billing history?"&lt;/p&gt;

&lt;p&gt;Those are business policy questions. They belong in the services that own that data, where the full context exists.&lt;/p&gt;

&lt;p&gt;Every time we've been tempted to push business logic into the Auth Service — usually because it would be &lt;em&gt;convenient&lt;/em&gt;, or because a product requirement seemed auth-adjacent — we've regretted it. Business policy changes frequently. Auth infrastructure should be boring and stable. Keeping them separate means changes to one don't put the other at risk.&lt;/p&gt;

&lt;p&gt;The Auth Service also doesn't store sessions, doesn't issue tokens, and doesn't look up users. Tokens carry enough identity for upstream services to do that themselves. The Auth Service only validates.&lt;/p&gt;




&lt;h2&gt;
  
  
  The pattern underneath all of this
&lt;/h2&gt;

&lt;p&gt;Looking back, the decisions that held up over time share a common shape: make the implicit explicit.&lt;/p&gt;

&lt;p&gt;Endpoint classification pulled auth rules out of code and into data. Decision reasons named every outcome instead of letting them collapse into status codes. The single log line made the request lifecycle visible as a single artifact instead of scattered fragments. The cache/decision boundary separated "what the token says" from "what we're going to do about it."&lt;/p&gt;

&lt;p&gt;None of these are particularly novel ideas. But they compound. A service where every decision is named, every outcome is logged atomically, and every boundary is deliberate is a service you can actually operate.&lt;/p&gt;

&lt;p&gt;That's the goal.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: Chapter 4 — the path trie that resolves incoming URIs to endpoint records in O(path length), without a database call on the hot path.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>jwt</category>
      <category>auth</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Part 1 — Why we built an Auth Gateway instead of putting auth in every service</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:39:30 +0000</pubDate>
      <link>https://forem.com/akarshan/why-we-built-an-auth-gateway-instead-of-putting-auth-in-every-service-36ca</link>
      <guid>https://forem.com/akarshan/why-we-built-an-auth-gateway-instead-of-putting-auth-in-every-service-36ca</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbklcavwvc1saex9e0fcy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbklcavwvc1saex9e0fcy.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've been on a platform team long enough, you've probably watched this slow-motion failure:&lt;/p&gt;

&lt;p&gt;You ship an auth library. Three services adopt it. Six months later, two of them are still on &lt;code&gt;v1.0&lt;/code&gt;, one forked it to add a custom claim, and a fourth service rolled its own because the library "didn't fit their use case." A CVE drops. Now you're hunting through repos to find every place that decodes a JWT.&lt;/p&gt;

&lt;p&gt;We've been running a multi-tenant platform on Kubernetes for a while, and we kept ending up there. So a couple of years ago we made a call: stop trying to &lt;em&gt;protect every service&lt;/em&gt; and start &lt;em&gt;making the decision once&lt;/em&gt; — at the edge.&lt;/p&gt;

&lt;p&gt;This is the first post in a 10-part series about that gateway. The actual gateway is two pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX&lt;/strong&gt;, packaged as a Helm chart, that fronts every authenticated route.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth Service&lt;/strong&gt;, a small Go service that exposes a single &lt;code&gt;POST /auth&lt;/code&gt; endpoint. NGINX hits it as a subrequest on every protected request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll skip the marketing in this series. I'll show real code, real config, and the parts that hurt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision: three things, three different homes
&lt;/h2&gt;

&lt;p&gt;The mistake we kept making was treating "auth" as one thing. It's three:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt; — who is this caller? (&lt;code&gt;Authorization: Bearer ...&lt;/code&gt;, signature, expiry, revocation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization&lt;/strong&gt; — are they allowed to call &lt;em&gt;this&lt;/em&gt; endpoint? (role, access level, tenant)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt; — where does this request go? (multi-tenant DNS, single-tenant vs multi-tenant upstream, header propagation)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you put all three inside every service, every service ends up with its own opinion. So we split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX&lt;/strong&gt; owns routing and fail-closed posture. It already sits in the request path. It's the cheapest place on earth to say "no."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth Service&lt;/strong&gt; owns the &lt;em&gt;decision&lt;/em&gt; — token validity, endpoint classification, access level check.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream services&lt;/strong&gt; own the business logic and trust the identity headers NGINX injects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this looks like on a single request
&lt;/h2&gt;

&lt;p&gt;Here's the happy path, as it actually runs in production. A user calls &lt;code&gt;GET /user-management/users/me&lt;/code&gt; with a bearer token and a tenant header.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgo93zjktkprbe31r0lj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgo93zjktkprbe31r0lj.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few things worth noticing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The subrequest is a &lt;code&gt;POST&lt;/code&gt;, not a &lt;code&gt;GET&lt;/code&gt;. NGINX's &lt;code&gt;auth_request&lt;/code&gt; always uses &lt;code&gt;proxy_method POST&lt;/code&gt; in our chart. The Auth Service doesn't need a body — it decides from &lt;code&gt;X-Original-URI&lt;/code&gt;, &lt;code&gt;X-Original-Method&lt;/code&gt;, &lt;code&gt;X-Tenant-ID&lt;/code&gt;, and the bearer token.&lt;/li&gt;
&lt;li&gt;The Auth Service responds with &lt;strong&gt;identity headers&lt;/strong&gt;. NGINX pulls them out with &lt;code&gt;auth_request_set&lt;/code&gt; and re-injects them into the upstream proxy call. Upstream services never look at the JWT — they trust &lt;code&gt;X-Identity-ID&lt;/code&gt; because they trust NGINX.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Request-ID&lt;/code&gt; is propagated end-to-end. Every log line on every hop carries the same id. (More on that in Chapter 9.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The deny path is where centralization actually pays off
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza6egwoxttyan8ikhbi6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza6egwoxttyan8ikhbi6.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two things that we got &lt;em&gt;for free&lt;/em&gt; the moment we centralized:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identical error envelopes&lt;/strong&gt;. Whether the failure is "no tenant header," "expired token," "wrong access level," or "the Auth Service itself is down," the client sees the same shape: &lt;code&gt;{"source":"auth","message":"...","code":"...","error":"..."}&lt;/code&gt;. We didn't have to coordinate this across 30 services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream services never run on a bad token&lt;/strong&gt;. They aren't even invoked. That alone fixed a long tail of "service X returned 200 with weird data because the token didn't validate but the framework didn't care."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The corresponding NGINX config is small and worth showing, trimmed to the parts that matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# auth.conf — the subrequest endpoint&lt;/span&gt;
&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                                &lt;span class="c1"&gt;# only callable from auth_request&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass_request_body&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;# auth doesn't need the body&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_method&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Content-Length&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-URI&lt;/span&gt;    &lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Method&lt;/span&gt; &lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Host&lt;/span&gt;   &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;      &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;       &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass_request_headers&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;# forward Authorization etc.&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream&lt;/span&gt;   &lt;span class="s"&gt;error&lt;/span&gt; &lt;span class="s"&gt;timeout&lt;/span&gt; &lt;span class="s"&gt;http_500&lt;/span&gt; &lt;span class="s"&gt;http_502&lt;/span&gt; &lt;span class="s"&gt;http_503&lt;/span&gt; &lt;span class="s"&gt;http_504&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_tries&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_timeout&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;# fail-closed: 503 to the client&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://auth_service/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's how a regular service location &lt;em&gt;uses&lt;/em&gt; it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/user-management/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt; &lt;span class="s"&gt;"incore"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;$microservice&lt;/span&gt; &lt;span class="s"&gt;"user-service"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;auth_request&lt;/span&gt;     &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt;        &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_type&lt;/span&gt;      &lt;span class="nv"&gt;$upstream_http_x_identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$session_id&lt;/span&gt;         &lt;span class="nv"&gt;$upstream_http_x_session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_message&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_auth_error_message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;    &lt;span class="nv"&gt;$upstream_http_x_auth_error_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@unauthorized&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@forbidden&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@upstream_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-ID&lt;/span&gt;  &lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;    &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;   &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;rewrite&lt;/span&gt; &lt;span class="s"&gt;^/user-management/(.*)&lt;/span&gt;$ &lt;span class="n"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt; &lt;span class="s"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://user-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's roughly 25 lines per route, generated by a Helm &lt;code&gt;range&lt;/code&gt; over the services dictionary. Adding a new service is a values change — no NGINX expert required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why NGINX, specifically
&lt;/h2&gt;

&lt;p&gt;We didn't pick NGINX because of opinions. We picked it because of one directive: &lt;strong&gt;&lt;code&gt;auth_request&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;auth_request&lt;/code&gt; lets you tell NGINX: &lt;em&gt;before you proxy the main request, fire a subrequest to this internal location. If the subrequest returns 200, continue. If it returns 401 or 403, stop and run my error handler. If it returns 5xx, run my "auth is down" error handler.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That sounds boring. It's not. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your upstream services don't see unauthenticated traffic at all.&lt;/li&gt;
&lt;li&gt;You can change auth logic by deploying one service. No client SDK update, no library bump in 30 repos.&lt;/li&gt;
&lt;li&gt;You get a single observable choke-point. &lt;code&gt;auth_request_time_ms&lt;/code&gt; is one log field. We graph it. We page on it.&lt;/li&gt;
&lt;li&gt;You can implement &lt;em&gt;fail-closed by default&lt;/em&gt; with one line: &lt;code&gt;error_page 502 503 504 = @auth_unavailable;&lt;/code&gt;. If the Auth Service is unhealthy, NGINX returns 503 to the client &lt;em&gt;instead of&lt;/em&gt; letting the request through. We pay this cost on purpose. Allowing traffic through a broken auth check is how data leaks happen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll dissect &lt;code&gt;auth_request&lt;/code&gt; in Chapter 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Auth Service fits
&lt;/h2&gt;

&lt;p&gt;The Auth Service is intentionally small. It does five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads a few request headers.&lt;/li&gt;
&lt;li&gt;Resolves the tenant.&lt;/li&gt;
&lt;li&gt;Matches the request path to an &lt;em&gt;endpoint metadata record&lt;/em&gt; (we call this the &lt;strong&gt;trie&lt;/strong&gt; — Chapter 4).&lt;/li&gt;
&lt;li&gt;Classifies the endpoint as &lt;code&gt;OPEN&lt;/code&gt;, &lt;code&gt;AUTHENTICATED&lt;/code&gt;, or &lt;code&gt;ACCESS_CONTROLLED&lt;/code&gt; and runs the right validation pipeline.&lt;/li&gt;
&lt;li&gt;Emits exactly one structured &lt;code&gt;AUTH_DECISION&lt;/code&gt; log line with the timing, identity, decision reason, and outcome.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; store sessions. It does &lt;strong&gt;not&lt;/strong&gt; mint tokens (a separate service does). It does &lt;strong&gt;not&lt;/strong&gt; know about your business logic. It's a &lt;em&gt;policy decision point&lt;/em&gt; — the thing whose only job is to answer "yes or no, and why."&lt;/p&gt;

&lt;p&gt;Here's the controller, paraphrased:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;AuthController&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetAuthDecisionLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URI&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Original-URI"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Original-Method"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TenantID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Tenant-ID"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trieExists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;routeResolver&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;ResolveEndpointWithMetrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;trieExists&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* 503: trie not initialized */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;found&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c"&gt;/* 404: no such API found  */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;endpointType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;OPEN&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DecisionReason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReasonOpenEndpoint&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"allow"&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;AUTHENTICATED&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runAuthN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;ACCESS_CONTROLLED&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runAuthN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runAuthZ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;perms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five branches. That's the entire shape of the gateway. The next nine posts are about what's &lt;em&gt;inside&lt;/em&gt; each branch and what we learned operating it at ~50k RPS.&lt;/p&gt;

&lt;h2&gt;
  
  
  What centralizing actually costs
&lt;/h2&gt;

&lt;p&gt;I'd be lying if I said this was free. Three real costs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One extra hop on the hot path.&lt;/strong&gt; Every authenticated request now does an in-cluster RPC to the Auth Service before it goes anywhere. We make this cheap with caching (Chapter 8) and with Ristretto-backed JWT verification, but the hop is still there. Median &lt;code&gt;auth_request_time_ms&lt;/code&gt; is in the low single digits in our environment, but it's a budget you have to keep.&lt;/p&gt;
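
&lt;p&gt;To make "Ristretto-backed" concrete, here's a minimal sketch of the idea: cache verified claims keyed by a hash of the raw token, with a short TTL. The &lt;code&gt;verifyRS256&lt;/code&gt; hook and the numbers are illustrative stand-ins, not our production code (that's Chapter 8):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: cache verified JWT claims keyed by a hash of the raw token.
// verifyRS256 stands in for the real signature and claims validation.
package authcache

import (
    "crypto/sha256"
    "encoding/hex"
    "time"

    "github.com/dgraph-io/ristretto"
)

type Claims map[string]interface{}

var cache, _ = ristretto.NewCache(&amp;amp;ristretto.Config{
    NumCounters: 1_000_000, // keys tracked for admission decisions
    MaxCost:     64 &amp;lt;&amp;lt; 20,  // rough memory budget for cached claims
    BufferItems: 64,
})

func VerifyCached(token string, verifyRS256 func(string) (Claims, error)) (Claims, error) {
    sum := sha256.Sum256([]byte(token))
    key := hex.EncodeToString(sum[:])

    if v, ok := cache.Get(key); ok {
        return v.(Claims), nil
    }
    claims, err := verifyRS256(token)
    if err != nil {
        return nil, err // never cache a failed verification
    }
    // The TTL must never outlive the token itself; 30s is an illustrative cap.
    cache.SetWithTTL(key, claims, 1, 30*time.Second)
    return claims, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
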

&lt;p&gt;&lt;strong&gt;The Auth Service has to be HA.&lt;/strong&gt; When it's down, &lt;em&gt;everything&lt;/em&gt; is 503. We chose fail-closed on purpose — a permissive default would mean unauthenticated traffic could hit business services during an outage — but it raises the bar on availability. We run it with an HPA (2–10 replicas, 75% CPU/mem), keep 64 keepalive connections per worker, and gate readiness on the trie being loaded. Even with that, the Auth Service is the single most carefully operated thing on the platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;auth_request&lt;/code&gt; is not cached.&lt;/strong&gt; This surprises people. NGINX does &lt;em&gt;not&lt;/em&gt; cache auth subrequest responses by default, and the obvious caching workarounds (caching by &lt;code&gt;Authorization&lt;/code&gt; header) are dangerous in a multi-tenant world. Every protected request hits the auth pod. So everything inside the auth pod has to be fast. That constraint is what shaped the entire internal design — and is why Chapters 7 and 8 spend a lot of time on caches that live &lt;em&gt;inside&lt;/em&gt; the auth process, not in NGINX.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before vs after, at a glance
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    subgraph Before["Before — auth in every service"]
      C1[Client] --&amp;gt; S1[Service A&amp;lt;br/&amp;gt;auth lib v1.2] &amp;amp; S2[Service B&amp;lt;br/&amp;gt;auth lib v1.0] &amp;amp; S3[Service C&amp;lt;br/&amp;gt;custom auth]
    end
    subgraph After["After — Auth Gateway"]
      C2[Client] --&amp;gt; NX[NGINX&amp;lt;br/&amp;gt;auth_request] --&amp;gt; AU[Auth Service]
      NX --&amp;gt; SA[Service A] &amp;amp; SB[Service B] &amp;amp; SC[Service C]
    end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's coming
&lt;/h2&gt;

&lt;p&gt;This series moves from primitive to production-grade:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 2&lt;/strong&gt; — &lt;code&gt;auth_request&lt;/code&gt; in depth. Subrequest lifecycle, &lt;code&gt;auth_request_set&lt;/code&gt;, named &lt;code&gt;error_page&lt;/code&gt; locations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 3&lt;/strong&gt; — inside the Auth Service. JWT validation, the per-tenant RSA key cache, the decision-reason model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 4&lt;/strong&gt; — endpoint classification (OPEN / AUTHENTICATED / ACCESS_CONTROLLED) and the trie that drives it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 5&lt;/strong&gt; — multi-tenant routing. SNI server blocks, &lt;code&gt;X-Tenant-ID&lt;/code&gt; vs &lt;code&gt;X-Tenant-Host&lt;/code&gt;, MT vs ST upstreams, and why we 400 on no tenant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 6&lt;/strong&gt; — authorization at scale. Role → access level → endpoint, and the bitmap fast path that replaced string-set matching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 7&lt;/strong&gt; — token revocation without killing performance. Redis ZSET + Stream + local cache, the race-condition fixes, and service-account rotation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 8&lt;/strong&gt; — every cache in the hot path and how each one is invalidated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 9&lt;/strong&gt; — operating the gateway. The &lt;code&gt;AUTH_DECISION&lt;/code&gt; log, OpenTelemetry, health probes, and degraded-mode alerts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 10&lt;/strong&gt; — what we'd build differently on day one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chapter 2 is up next: &lt;code&gt;auth_request&lt;/code&gt; is a 12-character directive that quietly does most of the work in this post. I want to show you exactly &lt;em&gt;why&lt;/em&gt; it's the right primitive — and what its sharp edges are.&lt;/p&gt;

&lt;p&gt;If you're working on something similar and want to compare notes, drop a comment. We made plenty of mistakes; happy to share which ones bit hardest.&lt;/p&gt;

</description>
      <category>nginx</category>
      <category>kubernetes</category>
      <category>auth</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Part 2 — NGINX auth_request: the small primitive that changed everything</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Mon, 04 May 2026 18:38:31 +0000</pubDate>
      <link>https://forem.com/akarshan/nginx-authrequest-the-small-primitive-that-changed-everything-2pfa</link>
      <guid>https://forem.com/akarshan/nginx-authrequest-the-small-primitive-that-changed-everything-2pfa</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxmi1liyw7oodhb99lu7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxmi1liyw7oodhb99lu7.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Chapter 1 I claimed our entire Auth Gateway is built on top of one NGINX directive: &lt;code&gt;auth_request&lt;/code&gt;. This chapter is a deep dive into how that directive actually works, and the four or five sharp edges that bit us before we got the config right.&lt;/p&gt;

&lt;p&gt;If you already know &lt;code&gt;auth_request&lt;/code&gt; cold, skim to "Sharp edge 1" near the bottom — that's where the real war stories are.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;auth_request&lt;/code&gt; actually does
&lt;/h2&gt;

&lt;p&gt;Drop this in a &lt;code&gt;location&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/user-management/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://user-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a request matches &lt;code&gt;/user-management/&lt;/code&gt;, NGINX:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pauses the main request &lt;em&gt;before&lt;/em&gt; doing anything to the upstream.&lt;/li&gt;
&lt;li&gt;Fires an internal &lt;strong&gt;subrequest&lt;/strong&gt; to &lt;code&gt;/auth&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Looks at the subrequest's HTTP status:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;2xx&lt;/code&gt; → continue with the main request.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;401&lt;/code&gt; or &lt;code&gt;403&lt;/code&gt; → abort the main request and return that status to the client.&lt;/li&gt;
&lt;li&gt;Anything else → fall through to your &lt;code&gt;error_page&lt;/code&gt; directives, or return 500.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the entire surface area. Two things to internalize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The subrequest is never visible to the client. The client only sees the &lt;em&gt;result&lt;/em&gt; — usually a 200 from your upstream, or a 401 that NGINX synthesized.&lt;/li&gt;
&lt;li&gt;The subrequest target is just a normal &lt;code&gt;location&lt;/code&gt; block. Any NGINX feature works there: &lt;code&gt;proxy_pass&lt;/code&gt;, timeouts, retries, keepalive pools, even another &lt;code&gt;auth_request&lt;/code&gt; (don't do this).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The subrequest lifecycle, as a timeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pg4tp1xl573j3qp6jrt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pg4tp1xl573j3qp6jrt.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice the order: the subrequest is fully &lt;em&gt;finished&lt;/em&gt; before NGINX touches the upstream. There is no streaming, no overlap. That's why latency adds up — your auth time is purely sequential to your upstream time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the subrequest actually looks like
&lt;/h2&gt;

&lt;p&gt;Here is the full &lt;code&gt;auth.conf&lt;/code&gt; we ship in our Helm chart, trimmed of comments and noise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                          &lt;span class="c1"&gt;# not callable from outside&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass_request_body&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;# Critical: don't ship the body&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Content-Length&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_method&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-URI&lt;/span&gt;    &lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Method&lt;/span&gt; &lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-Host&lt;/span&gt;   &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Original-URL&lt;/span&gt;    &lt;span class="nv"&gt;$scheme&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;//&lt;/span&gt;&lt;span class="nv"&gt;$http_host$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;      &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;       &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass_request_headers&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;# forward Authorization, cookies, etc.&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_buffering&lt;/span&gt;        &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_http_version&lt;/span&gt;     &lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt;       &lt;span class="s"&gt;Connection&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_send_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream&lt;/span&gt;   &lt;span class="s"&gt;error&lt;/span&gt; &lt;span class="s"&gt;timeout&lt;/span&gt; &lt;span class="s"&gt;invalid_header&lt;/span&gt; &lt;span class="s"&gt;http_500&lt;/span&gt; &lt;span class="s"&gt;http_502&lt;/span&gt; &lt;span class="s"&gt;http_503&lt;/span&gt; &lt;span class="s"&gt;http_504&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_tries&lt;/span&gt;   &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_next_upstream_timeout&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://auth_service/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every line in there is the result of an outage post-mortem. Worth walking through the non-obvious ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;internal;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Without this, &lt;code&gt;/auth&lt;/code&gt; would be a public endpoint anyone could hit. With it, NGINX only allows the location to be called from a subrequest. Try &lt;code&gt;curl https://your-host/auth&lt;/code&gt; and you get 404. This is the same pattern NGINX uses for its own named locations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_method POST&lt;/code&gt; + &lt;code&gt;proxy_pass_request_body off&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The Auth Service doesn't care about the request body. It cares about the URI, the method, the tenant, and the bearer token. So we strip the body and force &lt;code&gt;POST&lt;/code&gt;. Two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance.&lt;/strong&gt; A 50 MB upload would otherwise be buffered to the auth subrequest before it could be streamed to the upstream. That's a non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security.&lt;/strong&gt; The Auth Service shouldn't be a side-channel exfiltration target for upstream payloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But we're forcing &lt;code&gt;POST&lt;/code&gt; even though we drop the body. Why? Because some load balancers and observability tools treat &lt;code&gt;POST /auth&lt;/code&gt; differently from &lt;code&gt;GET /auth&lt;/code&gt;, and we wanted the subrequest to be obviously a &lt;em&gt;write&lt;/em&gt; of a decision request, not a read.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_pass_request_headers on&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The Auth Service needs &lt;code&gt;Authorization&lt;/code&gt;, &lt;code&gt;Cookie&lt;/code&gt;, &lt;code&gt;X-Forwarded-For&lt;/code&gt;, etc. We pass them all. The subrequest is in-cluster — there's no trust boundary between NGINX and the Auth Service.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_set_header X-Original-*&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;NGINX rewrites the URI of a subrequest to the subrequest target (&lt;code&gt;/auth&lt;/code&gt;). The Auth Service has no idea what URL the client originally hit. So we explicitly forward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;X-Original-URI&lt;/code&gt; — the path with query string, used for endpoint matching and audit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Original-Method&lt;/code&gt; — the original HTTP verb.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Original-Host&lt;/code&gt; — the host header, useful for tenant resolution by hostname.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-Original-URL&lt;/code&gt; — full URL, for logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These headers are the contract between NGINX and the Auth Service. Change them carelessly and you break every auth decision in the platform.&lt;/p&gt;
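
&lt;p&gt;On the Auth Service side, honoring that contract is nothing more than a few header reads. A minimal sketch (the struct and helper names are illustrative; only the header names are load-bearing):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: the NGINX → Auth Service request contract, read off the subrequest.
// The struct and helper names are illustrative; only the header names matter.
package authsvc

import "github.com/gin-gonic/gin"

type OriginalRequest struct {
    URI    string // path plus query string, used for endpoint matching and audit
    Method string // the original HTTP verb
    Host   string // host header, used for tenant resolution by hostname
    URL    string // full URL, for logs
}

func originalRequestFrom(ctx *gin.Context) OriginalRequest {
    return OriginalRequest{
        URI:    ctx.GetHeader("X-Original-URI"),
        Method: ctx.GetHeader("X-Original-Method"),
        Host:   ctx.GetHeader("X-Original-Host"),
        URL:    ctx.GetHeader("X-Original-URL"),
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
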

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_buffering off&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;For a response this tiny (a status code and a handful of headers), buffering hurts more than it helps. We get a few hundred microseconds back per request by turning it off.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;proxy_http_version 1.1&lt;/code&gt; + &lt;code&gt;Connection ""&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Combined with the upstream's &lt;code&gt;keepalive 64&lt;/code&gt;, this enables connection reuse between NGINX and the Auth Service. Without it, every subrequest opens a fresh TCP connection — disastrous at any real RPS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeouts and retries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_send_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_read_timeout&lt;/span&gt;    &lt;span class="s"&gt;10s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_next_upstream_tries&lt;/span&gt;   &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_next_upstream_timeout&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Translation: try to connect in 5s, send in 10s, read in 10s. If we get a connection error or 5xx, retry once. The whole thing is bounded at 15s.&lt;/p&gt;

&lt;p&gt;These are huge ceilings — a healthy auth pod responds in single-digit milliseconds. They exist for the &lt;em&gt;worst&lt;/em&gt; case: a partition, a failing pod, a slow JWKS fetch on first request. We'd rather wait 15 seconds and serve a clean 503 than time out at 1 second and have flaky behavior under load.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;error_page 502 503 504 = @auth_unavailable&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the fail-closed line. If the Auth Service is unreachable after retries, NGINX runs the &lt;code&gt;@auth_unavailable&lt;/code&gt; named location instead of just 502'ing the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pulling identity out: &lt;code&gt;auth_request_set&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;A subrequest succeeding (200) tells NGINX to continue, but on its own it doesn't tell the upstream service &lt;em&gt;who&lt;/em&gt; is calling. That's where &lt;code&gt;auth_request_set&lt;/code&gt; comes in. It pulls headers off the subrequest's response and binds them to NGINX variables, which we then forward.&lt;/p&gt;

&lt;p&gt;From &lt;code&gt;locations.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_time&lt;/span&gt;     &lt;span class="nv"&gt;$upstream_response_time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_status&lt;/span&gt;   &lt;span class="nv"&gt;$upstream_status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt;   &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_type&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_name&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$session_id&lt;/span&gt;    &lt;span class="nv"&gt;$upstream_http_x_session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_message&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_auth_error_message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;    &lt;span class="nv"&gt;$upstream_http_x_auth_error_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two patterns at play:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;$upstream_response_time&lt;/code&gt; and &lt;code&gt;$upstream_status&lt;/code&gt; are the auth subrequest's &lt;em&gt;transport&lt;/em&gt; metadata. We capture them so they end up in our log line.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;$upstream_http_x_identity_id&lt;/code&gt; is NGINX's way of saying "the value of the &lt;code&gt;X-Identity-ID&lt;/code&gt; response header on the most recent upstream call." We freeze that into &lt;code&gt;$identity_id&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; we touch the actual upstream service — otherwise the upstream's response headers would clobber it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, in the same &lt;code&gt;location&lt;/code&gt;, we pass those variables forward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-ID&lt;/span&gt;   &lt;span class="nv"&gt;$identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Type&lt;/span&gt; &lt;span class="nv"&gt;$identity_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Identity-Name&lt;/span&gt; &lt;span class="nv"&gt;$identity_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Session-ID&lt;/span&gt;    &lt;span class="nv"&gt;$session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Tenant-ID&lt;/span&gt;     &lt;span class="nv"&gt;$tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt;    &lt;span class="nv"&gt;$request_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The upstream service trusts these. It doesn't see the JWT. It doesn't validate the token. It loads the user by &lt;code&gt;X-Identity-ID&lt;/code&gt; and gets on with its life.&lt;/p&gt;
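
&lt;p&gt;To make the trust model concrete, here's a hypothetical upstream handler. It never parses a JWT; it just reads the headers NGINX injected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch of a hypothetical upstream handler: identity arrives as trusted
// headers injected by NGINX after a successful auth subrequest.
package userservice

import (
    "fmt"
    "net/http"
)

func getProfile(w http.ResponseWriter, r *http.Request) {
    identityID := r.Header.Get("X-Identity-ID")
    tenantID := r.Header.Get("X-Tenant-ID")
    if identityID == "" || tenantID == "" {
        // Behind the gateway this should never happen; treat it as misrouted traffic.
        http.Error(w, "missing identity headers", http.StatusUnauthorized)
        return
    }
    // No JWT parsing, no signature check: the gateway already decided.
    fmt.Fprintf(w, "profile for %s in tenant %s\n", identityID, tenantID)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
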

&lt;h2&gt;
  
  
  Named error_page locations: clean envelopes for every failure
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;auth_request&lt;/code&gt; returning 401 doesn't automatically send a clean 401 to the client — it just tells NGINX the request was unauthorized. By default the response body is empty, which makes for sad logs and worse client behavior.&lt;/p&gt;

&lt;p&gt;We use named &lt;code&gt;error_page&lt;/code&gt; locations to attach JSON envelopes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/user-management/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@unauthorized&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@forbidden&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@internal_server_error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@upstream_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://user-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the named locations live in &lt;code&gt;custom_error_locations.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@unauthorized&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;add_header&lt;/span&gt; &lt;span class="s"&gt;X-Request-ID&lt;/span&gt; &lt;span class="nv"&gt;$request_id&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Unauthorized","code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;","error":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@forbidden&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;if&lt;/span&gt; &lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"No&lt;/span&gt; &lt;span class="s"&gt;such&lt;/span&gt; &lt;span class="s"&gt;API&lt;/span&gt; &lt;span class="s"&gt;found")&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"NotFound","code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;","error":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Forbidden","code":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_code&lt;/span&gt;&lt;span class="s"&gt;","error":"&lt;/span&gt;&lt;span class="nv"&gt;$auth_error_message&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="n"&gt;/dev/stdout&lt;/span&gt; &lt;span class="s"&gt;auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Auth&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;Unavailable","error":"Auth&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;unreachable"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="s"&gt;@upstream_unavailable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;default_type&lt;/span&gt; &lt;span class="nc"&gt;application/json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;access_log&lt;/span&gt; &lt;span class="n"&gt;/dev/stdout&lt;/span&gt; &lt;span class="s"&gt;upstream_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="kn"&gt;"source":"auth","message":"Upstream&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;Unavailable","error":"Upstream&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt; &lt;span class="s"&gt;unreachable"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Couple of things worth highlighting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;$auth_error_code&lt;/code&gt; and &lt;code&gt;$auth_error_message&lt;/code&gt; were captured from the subrequest's &lt;code&gt;X-Auth-Error-Code&lt;/code&gt; and &lt;code&gt;X-Auth-Error-Message&lt;/code&gt; response headers. The Auth Service writes these on every deny, and NGINX surfaces them verbatim to the client.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;if&lt;/code&gt; inside &lt;code&gt;@forbidden&lt;/code&gt; is how we handle the "endpoint not registered in the trie" case. The Auth Service signals it with a 403 + a specific message, and NGINX rewrites that to 404. The wire-level shape stays consistent, but the status reflects what the client should actually see.&lt;/li&gt;
&lt;li&gt;Both unavailability branches use a &lt;em&gt;separate&lt;/em&gt; log format (&lt;code&gt;auth_unavailable&lt;/code&gt;, &lt;code&gt;upstream_unavailable&lt;/code&gt;). When something is on fire, you want it in its own log stream so dashboards aren't drowned by 200s.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sharp edge 1: subrequests don't cache
&lt;/h2&gt;

&lt;p&gt;People expect this to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_cache&lt;/span&gt; &lt;span class="s"&gt;auth_cache&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;proxy_cache_key&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$http_authorization&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It doesn't. A &lt;code&gt;proxy_cache&lt;/code&gt; declared on the protected location applies to the main request's &lt;code&gt;proxy_pass&lt;/code&gt;, not to the auth subrequest; the subrequest fires every time. There's no built-in TTL on auth decisions.&lt;/p&gt;

&lt;p&gt;Why is that the right default? Because auth decisions are not cacheable in the general case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same token might be revoked in the next 50 ms.&lt;/li&gt;
&lt;li&gt;The required permissions for an endpoint might change.&lt;/li&gt;
&lt;li&gt;The tenant context can change between requests (different &lt;code&gt;X-Tenant-ID&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; roll your own cache — for example, by keying off &lt;code&gt;(token_hash, endpoint, method)&lt;/code&gt; and storing decisions in a shared cache — but you're now responsible for invalidating it when &lt;em&gt;anything&lt;/em&gt; about the auth state changes. We chose a different approach: caching inside the Auth Service process itself. That's Chapter 8.&lt;/p&gt;
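
&lt;p&gt;If you do go down that road, building the key is the easy part; a sketch is below. Keeping it correct when tokens are revoked or permissions change is the part that hurts, and it's why we didn't ship this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch only: a key for an externally cached auth decision. The key has to
// capture everything the decision depends on; invalidating it on revocation
// or permission changes is still entirely on you.
package decisioncache

import (
    "crypto/sha256"
    "fmt"
)

func decisionKey(token, tenantID, method, path string) string {
    tokenHash := sha256.Sum256([]byte(token))
    return fmt.Sprintf("authz:%x:%s:%s:%s", tokenHash, tenantID, method, path)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
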

&lt;h2&gt;
  
  
  Sharp edge 2: &lt;code&gt;auth_request_set&lt;/code&gt; is run in main-request context
&lt;/h2&gt;

&lt;p&gt;This bit us on day three. Consider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://api-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The variable &lt;code&gt;$identity_id&lt;/code&gt; is &lt;em&gt;not&lt;/em&gt; populated the instant the subrequest returns. It's bound when &lt;code&gt;auth_request_set&lt;/code&gt; is evaluated in the main-request context, which happens &lt;em&gt;after&lt;/em&gt; the subrequest completes but &lt;em&gt;before&lt;/em&gt; NGINX contacts the real upstream. At that point &lt;code&gt;$upstream_http_x_identity_id&lt;/code&gt; still refers to the auth subrequest's response, so this works. But here's the trap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://api-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;# ❌ auth_request_set lives below proxy_pass&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request_set&lt;/span&gt; &lt;span class="nv"&gt;$identity_id&lt;/span&gt; &lt;span class="nv"&gt;$upstream_http_x_identity_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;auth_request_set&lt;/code&gt; directives are &lt;em&gt;order-independent within a location&lt;/em&gt; (they apply at request setup), but if you start playing tricks with &lt;code&gt;if&lt;/code&gt; or &lt;code&gt;set&lt;/code&gt;-based conditionals, you can read &lt;code&gt;$identity_id&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; &lt;code&gt;auth_request_set&lt;/code&gt; evaluates and get an empty string. Lesson: keep &lt;code&gt;auth_request_set&lt;/code&gt; together, immediately after &lt;code&gt;auth_request&lt;/code&gt;, before any &lt;code&gt;proxy_set_header&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp edge 3: subrequest 5xx vs subrequest 401
&lt;/h2&gt;

&lt;p&gt;A subtle one. If the Auth Service returns 401, the client sees 401. If the Auth Service returns 500, what does the client see?&lt;/p&gt;

&lt;p&gt;By default: 500. &lt;code&gt;auth_request&lt;/code&gt; treats any subrequest status other than 2xx, 401, or 403 as an internal error, and the main request fails with 500.&lt;/p&gt;

&lt;p&gt;That's almost never what you want. A 500 from the auth pod is "auth is broken," not "the user is broken." The client shouldn't see "internal server error" for what is operationally an auth outage.&lt;/p&gt;

&lt;p&gt;Fix: explicit &lt;code&gt;error_page&lt;/code&gt; mapping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@internal_server_error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;error_page&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt; &lt;span class="mi"&gt;504&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;@auth_unavailable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now any 5xx from the auth subrequest gets a clean envelope. We tell oncall via the alert pipeline (Chapter 9), not via the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp edge 4: &lt;code&gt;proxy_intercept_errors&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Default is &lt;code&gt;off&lt;/code&gt;, which is &lt;em&gt;correct&lt;/em&gt; in our location blocks. We explicitly set it because we burned half a day on a related bug:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kn"&gt;auth_request&lt;/span&gt; &lt;span class="n"&gt;/auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_intercept_errors&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;# important&lt;/span&gt;
  &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://api-service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you set &lt;code&gt;proxy_intercept_errors on&lt;/code&gt;, NGINX will run &lt;em&gt;upstream&lt;/em&gt; error responses (e.g., a 404 from the actual &lt;code&gt;api-service&lt;/code&gt;) through your &lt;code&gt;error_page&lt;/code&gt; mappings. Suddenly your "no such API found" 404 from the Auth Service and a "user not found" 404 from the upstream both end up in &lt;code&gt;@forbidden&lt;/code&gt;'s 404 branch. They look identical to the client. They're completely different problems.&lt;/p&gt;

&lt;p&gt;Keep &lt;code&gt;proxy_intercept_errors off&lt;/code&gt; on the upstream location. Let upstream errors pass through unmolested. Only &lt;em&gt;auth-side&lt;/em&gt; errors should run through your named locations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharp edge 5: NGINX never sees auth's body
&lt;/h2&gt;

&lt;p&gt;The Auth Service can't return a JSON body that NGINX uses. Only the &lt;em&gt;status code&lt;/em&gt; and &lt;em&gt;response headers&lt;/em&gt; matter. If the Auth Service writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;HTTP/&lt;/span&gt;&lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Unauthorized&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;X-Auth-Error-Code:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;TOKEN_EXPIRED&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;X-Auth-Error-Message:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;token&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expired&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"token expired"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…NGINX sees the 401 and the two &lt;code&gt;X-Auth-*&lt;/code&gt; headers. The body is discarded. So the contract is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Status&lt;/strong&gt; decides allow/deny/error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response headers&lt;/strong&gt; carry identity (on 200) or failure metadata (on 4xx).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response body&lt;/strong&gt; is for nobody. Don't bother.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Internalize this and the Auth Service handler design becomes much simpler — it's writing headers, not JSON.&lt;/p&gt;
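
&lt;p&gt;In gin terms, a deny ends up as two header writes and a status code. A minimal sketch (the helper is ours for illustration; the header names are the real contract):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Sketch: a deny from the Auth Service is a status code plus two response
// headers. The helper is illustrative; the header names are the contract.
package authsvc

import "github.com/gin-gonic/gin"

func deny(ctx *gin.Context, status int, code, message string) {
    ctx.Header("X-Auth-Error-Code", code)
    ctx.Header("X-Auth-Error-Message", message)
    ctx.Status(status) // any body written here would be discarded by NGINX
}

// e.g. on an expired token: deny(ctx, 401, "TOKEN_EXPIRED", "token expired")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
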

&lt;h2&gt;
  
  
  What this directive bought us
&lt;/h2&gt;

&lt;p&gt;To put it bluntly: &lt;code&gt;auth_request&lt;/code&gt; is the difference between "we operate an Auth Gateway" and "we operate an Auth &lt;em&gt;Library That Every Service Includes&lt;/em&gt;." It moved the decision point off every service's hot path and onto a single dedicated pod. Everything else in this series — endpoint classification, multi-tenant routing, revocation, observability — sits on top of that one primitive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A[auth subrequest returns] --&amp;gt; B{status}
    B --&amp;gt;|200| C[continue to proxy_pass]
    B --&amp;gt;|401| D["@unauthorized&amp;lt;br/&amp;gt;return 401 JSON"]
    B --&amp;gt;|403| E["@forbidden&amp;lt;br/&amp;gt;404 if 'No such API found'&amp;lt;br/&amp;gt;else 403"]
    B --&amp;gt;|500| F["@internal_server_error&amp;lt;br/&amp;gt;return 500 JSON"]
    B --&amp;gt;|502/503/504| G["@auth_unavailable&amp;lt;br/&amp;gt;return 503 JSON&amp;lt;br/&amp;gt;(fail-closed)"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Chapter 3 goes inside the Auth Service: the controller, the handler chain, JWT validation, the per-tenant RSA public-key cache, and the decision-reason model. We'll spend most of it in Go code.&lt;/p&gt;

&lt;p&gt;If you implement an &lt;code&gt;auth_request&lt;/code&gt;-backed gateway after reading this and a sharp edge catches you, drop a comment. The five sharp edges above are the ones we hit. There are probably another five waiting for you.&lt;/p&gt;

</description>
      <category>nginx</category>
      <category>kubernetes</category>
      <category>webdev</category>
      <category>auth</category>
    </item>
    <item>
      <title>I Dug Up My 10-Year-Old Android App, Dusted It Off With AI, and Put It Back on the Play Store</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Sun, 22 Feb 2026 06:44:45 +0000</pubDate>
      <link>https://forem.com/akarshan/i-dug-up-my-10-year-old-android-app-dusted-it-off-with-ai-and-put-it-back-on-the-play-store-4a3c</link>
      <guid>https://forem.com/akarshan/i-dug-up-my-10-year-old-android-app-dusted-it-off-with-ai-and-put-it-back-on-the-play-store-4a3c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2f75kytgn8vdsqcwtz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2f75kytgn8vdsqcwtz4.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last week I did something I didn't expect to enjoy as much as I did — I resurrected a side project I hadn't touched in a decade.&lt;/p&gt;

&lt;p&gt;Meet &lt;strong&gt;Drivelert&lt;/strong&gt;: an anti-drowsiness app I built ages ago to help drivers stay alert on long trips. The Play Store had quietly pulled it down: ancient target SDK, zero maintenance, the usual graveyard story. I'd moved on. The app hadn't.&lt;/p&gt;

&lt;p&gt;Then, for reasons I can't fully explain (nostalgia? a slow weekend? some stubborn refusal to let past-me's work die?), I decided to bring it back.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Even Was This Thing?
&lt;/h2&gt;

&lt;p&gt;Drivelert was a simple but genuinely useful idea: monitor signs of driver fatigue and alert them before it becomes dangerous. I built it back when I was younger, more idealistic, and apparently not bothered by shipping something and completely abandoning it.&lt;/p&gt;

&lt;p&gt;The code was... a time capsule. Deprecated APIs, patterns I wouldn't touch today, some choices that made me genuinely wince. But the &lt;em&gt;idea&lt;/em&gt; was solid. The bones were good.&lt;/p&gt;




&lt;h2&gt;
  
  
  A decade later: smarter, more experienced, and AI finally made the revival real
&lt;/h2&gt;

&lt;p&gt;This time around, I wasn't flying solo. I brought AI into the workflow for unpicking outdated patterns, modernizing chunks of logic, and accelerating the parts that would've taken me days to slog through manually. &lt;/p&gt;

&lt;p&gt;It's a weird experience, honestly. You're reading code written by a younger version of yourself, and then having an AI help you translate it into something modern. Felt a bit like co-writing a letter to the past.&lt;/p&gt;

&lt;p&gt;What made it work was the mix of experience and capability.&lt;br&gt;
I had the wisdom and context of why things were built a certain way. AI brought the momentum and precision to rebuild them better.&lt;br&gt;
That blend of hindsight, experience, and AI support turned out to be surprisingly powerful.&lt;/p&gt;




&lt;h2&gt;
  
  
  It's Live Again
&lt;/h2&gt;

&lt;p&gt;After a week of evenings, Drivelert is back up on the Play Store:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://play.google.com/store/apps/details?id=com.drivelert.app" rel="noopener noreferrer"&gt;Drivelert on Google Play&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's something quietly satisfying about seeing an old project breathe again: not just preserved, but actually improved. Younger-me would probably be pleased. Slightly jealous of the tools available now, but pleased.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Old side projects aren't necessarily dead; sometimes they just need the right moment and a better toolkit.&lt;/li&gt;
&lt;li&gt;AI assistance genuinely changes the calculus on revival projects. The "is this worth the effort?" math shifts when you can move faster.&lt;/li&gt;
&lt;li&gt;Shipping something imperfect 10 years ago &amp;gt; never shipping the perfect version. Past-me understood this intuitively. I'd forgotten.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;It's a second innings for a scrappy little app. Let's see how it goes. 🏏&lt;/p&gt;

</description>
      <category>ai</category>
      <category>android</category>
      <category>showdev</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>Testing Redis Circuit Breaker with Toxiproxy</title>
      <dc:creator>Akarshan Gandotra</dc:creator>
      <pubDate>Tue, 03 Feb 2026 07:28:46 +0000</pubDate>
      <link>https://forem.com/akarshan/testing-redis-circuit-breaker-with-toxiproxy-4p8a</link>
      <guid>https://forem.com/akarshan/testing-redis-circuit-breaker-with-toxiproxy-4p8a</guid>
      <description>&lt;p&gt;Building resilient distributed systems requires thorough testing of failure scenarios. While unit tests are great for business logic, they can't simulate the complex network failures that happen in production. This is where &lt;strong&gt;Toxiproxy&lt;/strong&gt; comes in—a powerful tool for testing how your application handles real-world network chaos.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll explore how to test a Redis circuit breaker implementation using Toxiproxy to simulate various failure modes, from complete outages to subtle network degradation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Toxiproxy?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Shopify/toxiproxy" rel="noopener noreferrer"&gt;Toxiproxy&lt;/a&gt; is a TCP proxy developed by Shopify that simulates network and system conditions for chaos and resiliency testing. Unlike traditional testing approaches that mock dependencies, Toxiproxy sits between your application and its dependencies, allowing you to inject real network failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection failures&lt;/strong&gt; (service down)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; (slow networks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth limitations&lt;/strong&gt; (network congestion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection resets&lt;/strong&gt; (unstable networks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data corruption&lt;/strong&gt; (packet loss)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it ideal for testing circuit breakers, retry logic, timeouts, and other resilience patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Circuit Breakers
&lt;/h2&gt;

&lt;p&gt;Before we dive into testing, let's briefly review the circuit breaker pattern. A circuit breaker prevents an application from repeatedly trying to execute an operation that's likely to fail, giving the failing service time to recover.&lt;/p&gt;

&lt;p&gt;The circuit breaker has three states:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Closed&lt;/strong&gt;: Normal operation, requests pass through&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open&lt;/strong&gt;: Too many failures detected, requests are blocked or fail-fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Half-Open&lt;/strong&gt;: Testing if the service has recovered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For our Redis implementation, we'll configure the circuit breaker to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open after 5 consecutive failures&lt;/li&gt;
&lt;li&gt;Either fail-open (allow requests through) or fail-closed (block requests)&lt;/li&gt;
&lt;li&gt;Log when state transitions occur&lt;/li&gt;
&lt;/ul&gt;
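
&lt;p&gt;To make the three states concrete, here is a minimal, hand-rolled sketch of such a breaker in Go. It is illustrative only: the names (&lt;code&gt;Breaker&lt;/code&gt;, &lt;code&gt;failOpen&lt;/code&gt;) and the cooldown-based recovery are assumptions for this post, not the exact implementation behind the log lines shown later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package breaker

import (
	"errors"
	"sync"
	"time"
)

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

var ErrOpen = errors.New("circuit breaker is open")

// Breaker trips after `threshold` consecutive failures and probes again
// after `cooldown` (sketch only, not production code).
type Breaker struct {
	mu        sync.Mutex
	state     State
	failures  int           // consecutive failures
	threshold int           // 5 in this article
	cooldown  time.Duration // how long to stay open before half-open
	openedAt  time.Time
	failOpen  bool // true: serve without Redis; false: fail fast
}

func New(threshold int, cooldown time.Duration, failOpen bool) *Breaker {
	return &amp;Breaker{state: Closed, threshold: threshold, cooldown: cooldown, failOpen: failOpen}
}

// Do runs op through the breaker; fallback is used in fail-open mode.
func (b *Breaker) Do(op func() error, fallback func() error) error {
	b.mu.Lock()
	if b.state == Open {
		if time.Since(b.openedAt) &lt; b.cooldown {
			b.mu.Unlock()
			if b.failOpen &amp;&amp; fallback != nil {
				return fallback() // degraded mode: skip Redis entirely
			}
			return ErrOpen // fail-closed: reject immediately
		}
		b.state = HalfOpen // cooldown elapsed: let one probe through
	}
	b.mu.Unlock()

	err := op()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.state == HalfOpen || b.failures &gt;= b.threshold {
			b.state = Open
			b.openedAt = time.Now()
			// this is where the "Circuit breaker opened" warning gets logged
		}
		return err
	}
	b.failures = 0
	b.state = Closed
	return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Wrapping every Redis call in &lt;code&gt;Do&lt;/code&gt; is what turns the Toxiproxy experiments below into observable state transitions.&lt;/p&gt;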

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installing Toxiproxy
&lt;/h3&gt;

&lt;p&gt;First, install Toxiproxy on your system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;toxiproxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Starting the Toxiproxy Server
&lt;/h3&gt;

&lt;p&gt;Start the Toxiproxy server in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-server &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server starts on &lt;code&gt;localhost:8474&lt;/code&gt; by default. You can verify it's running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8474/version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating a Proxy for Redis
&lt;/h3&gt;

&lt;p&gt;Now create a proxy that sits between your auth service and Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create proxy: listens on 6380, forwards to Redis on 6379&lt;/span&gt;
toxiproxy-cli create redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-l&lt;/span&gt; localhost:6380 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; localhost:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a proxy named &lt;code&gt;redis-proxy&lt;/code&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Listens on port &lt;strong&gt;6380&lt;/strong&gt; (your application will connect here)&lt;/li&gt;
&lt;li&gt;Forwards traffic to Redis on port &lt;strong&gt;6379&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify the proxy was created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name        Listen          Upstream        Enabled
============================================================
redis-proxy localhost:6380  localhost:6379  true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
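
&lt;p&gt;If you prefer to create the proxy from test setup code rather than the CLI (handy in CI), Toxiproxy also ships a Go client. A sketch, assuming the &lt;code&gt;github.com/Shopify/toxiproxy/v2/client&lt;/code&gt; package:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package resilience_test

import (
	"log"

	toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

// setupProxy creates the same redis-proxy as the CLI command above:
// listen on 6380, forward to the real Redis on 6379.
func setupProxy() *toxiproxy.Proxy {
	client := toxiproxy.NewClient("localhost:8474")
	proxy, err := client.CreateProxy("redis-proxy", "localhost:6380", "localhost:6379")
	if err != nil {
		log.Fatalf("create proxy: %v", err)
	}
	return proxy
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;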



&lt;h3&gt;
  
  
  Configuring Your Application
&lt;/h3&gt;

&lt;p&gt;Point your auth service to use the Toxiproxy port instead of connecting directly to Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REDIS_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6380  &lt;span class="c"&gt;# Toxiproxy port, not 6379&lt;/span&gt;

&lt;span class="c"&gt;# Restart your service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all Redis traffic flows through Toxiproxy, allowing you to inject failures without modifying your application code.&lt;/p&gt;
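
&lt;p&gt;For a Go service, the wiring might look like the sketch below. It assumes the go-redis v9 client and deliberately keeps the timeouts short (2 seconds) so the latency toxics later in this post surface as real failures; adapt it to however your service loads configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/redis/go-redis/v9"
)

func newRedisClient() *redis.Client {
	host := os.Getenv("REDIS_HOST") // "localhost"
	port := os.Getenv("REDIS_PORT") // "6380": Toxiproxy, not Redis directly
	return redis.NewClient(&amp;redis.Options{
		Addr:         fmt.Sprintf("%s:%s", host, port),
		DialTimeout:  2 * time.Second, // shorter than the 5s latency toxic used later
		ReadTimeout:  2 * time.Second,
		WriteTimeout: 2 * time.Second,
	})
}

func main() {
	rdb := newRedisClient()
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if err := rdb.Ping(ctx).Err(); err != nil {
		fmt.Println("redis unreachable through proxy:", err)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;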

&lt;h2&gt;
  
  
  Testing Scenarios
&lt;/h2&gt;

&lt;p&gt;Let's explore different failure scenarios and verify our circuit breaker behaves correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: Simulate Redis Down (Circuit Opens)
&lt;/h3&gt;

&lt;p&gt;The most critical test—what happens when Redis becomes completely unavailable?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disable the proxy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toggle redis-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simulates Redis being down. Your application will start getting connection failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First 4 requests fail but circuit remains closed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5th consecutive failure&lt;/strong&gt; → Circuit breaker opens&lt;/li&gt;
&lt;li&gt;Logs should show:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   {"level":"warn","msg":"Circuit breaker opened - Redis failures exceeded threshold","service":"auth"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Subsequent requests are either:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fail-open&lt;/strong&gt;: Allowed through (degraded mode, no Redis caching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail-closed&lt;/strong&gt;: Rejected immediately (fail-fast)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Monitoring the circuit state:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the circuit is open, check your application metrics or logs. The circuit should remain open for the configured recovery period.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Re-enable the proxy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toggle redis-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After re-enabling, the circuit should transition to &lt;strong&gt;half-open&lt;/strong&gt; on the next request, then back to &lt;strong&gt;closed&lt;/strong&gt; if the request succeeds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification checklist:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Circuit opens after 5 failures&lt;/li&gt;
&lt;li&gt;✅ Log message appears with correct timestamp&lt;/li&gt;
&lt;li&gt;✅ Requests are handled according to fail-open/fail-closed policy&lt;/li&gt;
&lt;li&gt;✅ Circuit recovers when service returns&lt;/li&gt;
&lt;li&gt;✅ Application continues functioning (degraded or failed requests)&lt;/li&gt;
&lt;/ul&gt;
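
&lt;p&gt;If you would rather run this checklist from a test than by hand, the Toxiproxy Go client can drive the same toggle from code. The sketch below assumes the &lt;code&gt;github.com/Shopify/toxiproxy/v2/client&lt;/code&gt; package, a health endpoint on port 8080, and a recovery window of a few seconds; adjust to your own service:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package resilience_test

import (
	"net/http"
	"testing"
	"time"

	toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

// hit probes the (assumed) health endpoint of the service under test.
func hit(t *testing.T) int {
	t.Helper()
	resp, err := http.Get("http://localhost:8080/health")
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func TestCircuitOpensWhenRedisIsDown(t *testing.T) {
	client := toxiproxy.NewClient("localhost:8474")
	proxy, err := client.Proxy("redis-proxy") // created earlier with toxiproxy-cli
	if err != nil {
		t.Fatalf("proxy not found: %v", err)
	}

	// Redis "down": every call through the proxy now fails.
	if err := proxy.Disable(); err != nil {
		t.Fatalf("disable proxy: %v", err)
	}
	for i := 0; i &lt; 6; i++ { // one more than the 5-failure threshold
		hit(t)
	}
	// At this point the logs should contain the "Circuit breaker opened" warning;
	// assert on logs or metrics here if your service exposes them.

	// Redis "back": the circuit should go half-open, then closed.
	if err := proxy.Enable(); err != nil {
		t.Fatalf("enable proxy: %v", err)
	}
	time.Sleep(5 * time.Second) // give the recovery window time to elapse
	if code := hit(t); code != http.StatusOK {
		t.Fatalf("expected healthy service after recovery, got status %d", code)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;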

&lt;h3&gt;
  
  
  Scenario 2: Add Latency (Slow Redis)
&lt;/h3&gt;

&lt;p&gt;Network latency is often more insidious than complete failures. It can cause timeouts, thread pool exhaustion, and cascading failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add 5-second latency:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This adds a 5000ms (5 second) delay to all requests passing through the proxy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your Redis client timeout is less than 5 seconds (e.g., 2 seconds), requests will time out and count as failures. After 5 consecutive timeouts, the circuit should open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logs you should see:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"level":"error","msg":"Redis operation timeout","error":"context deadline exceeded"}
{"level":"warn","msg":"Circuit breaker opened - Redis failures exceeded threshold"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Testing different latency levels:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Moderate latency (1 second)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000

&lt;span class="c"&gt;# Extreme latency (10 seconds)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Remove the latency toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency above your timeout threshold behaves like a failure&lt;/li&gt;
&lt;li&gt;Helps verify your timeouts are properly configured&lt;/li&gt;
&lt;li&gt;Tests thread pool behavior under slow dependencies&lt;/li&gt;
&lt;/ul&gt;
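
&lt;p&gt;That &lt;code&gt;context deadline exceeded&lt;/code&gt; error is what a Go Redis call returns when its per-call context expires before the injected delay does. As a sketch (go-redis v9 assumed), bounding each operation with a short deadline is what turns latency into a failure the breaker can count:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// getWithDeadline bounds a single Redis GET at 2 seconds. With the 5s
// latency toxic active, this returns "context deadline exceeded", which
// the circuit breaker counts as one more consecutive failure.
func getWithDeadline(rdb *redis.Client, key string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return rdb.Get(ctx, key).Result()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;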

&lt;h3&gt;
  
  
  Scenario 3: Connection Reset (Network Errors)
&lt;/h3&gt;

&lt;p&gt;Simulate unstable network connections that reset mid-request:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add reset_peer toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; reset_peer &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This abruptly resets the connection (TCP RST) 500ms after data starts flowing, simulating an unstable network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Connections will be abruptly closed, causing errors like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connection reset by peer
unexpected EOF
broken pipe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These should count as failures and eventually open the circuit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remove the toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; reset_peer_downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use this test:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifying connection pool recovery&lt;/li&gt;
&lt;li&gt;Testing retry logic&lt;/li&gt;
&lt;li&gt;Ensuring proper cleanup of broken connections&lt;/li&gt;
&lt;/ul&gt;
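
&lt;p&gt;One thing reset tests tend to expose is error classification: a cache miss must not count against the breaker, but resets, broken pipes, EOFs, and timeouts should. A sketch of such a classifier (go-redis v9 assumed; the function name is mine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package cache

import (
	"context"
	"errors"
	"io"
	"net"
	"syscall"

	"github.com/redis/go-redis/v9"
)

// isRedisFailure decides whether an error should count toward opening the
// circuit: resets, broken pipes, EOFs, and timeouts do; cache misses do not.
func isRedisFailure(err error) bool {
	if err == nil || errors.Is(err, redis.Nil) { // redis.Nil = key not found
		return false
	}
	var netErr net.Error
	if errors.As(err, &amp;netErr) &amp;&amp; netErr.Timeout() {
		return true
	}
	return errors.Is(err, io.EOF) ||
		errors.Is(err, syscall.ECONNRESET) ||
		errors.Is(err, syscall.EPIPE) ||
		errors.Is(err, context.DeadlineExceeded)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;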

&lt;h3&gt;
  
  
  Scenario 4: Bandwidth Limit (Network Congestion)
&lt;/h3&gt;

&lt;p&gt;Simulate network congestion with bandwidth restrictions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limit to 1KB/s:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; bandwidth &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This restricts throughput to 1 kilobyte per second, simulating a severely congested network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small Redis operations (GET, SET) might still work but slowly&lt;/li&gt;
&lt;li&gt;Large operations (fetching big values, pipeline operations) will time out&lt;/li&gt;
&lt;li&gt;Gradual degradation rather than immediate failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test different bandwidth levels:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Moderate congestion (10 KB/s)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; bandwidth_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10

&lt;span class="c"&gt;# Severe congestion (100 bytes/s)&lt;/span&gt;
toxiproxy-cli toxic update redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; bandwidth_downstream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Remove the toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; bandwidth_downstream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this tests:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application behavior under sustained degradation&lt;/li&gt;
&lt;li&gt;Whether your timeouts are appropriate for typical data sizes&lt;/li&gt;
&lt;li&gt;If your circuit breaker is sensitive enough to detect slow failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 5: Jitter (Variable Latency)
&lt;/h3&gt;

&lt;p&gt;Real networks don't have consistent latency—they fluctuate. Simulate this with jitter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add latency with jitter:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;jitter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates latency ranging from 500ms to 1500ms (1000ms ± 500ms).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some requests complete quickly&lt;/li&gt;
&lt;li&gt;Others time out randomly&lt;/li&gt;
&lt;li&gt;Circuit breaker sees intermittent failures&lt;/li&gt;
&lt;li&gt;Tests the circuit breaker's threshold logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More realistic than fixed latency&lt;/li&gt;
&lt;li&gt;Reveals issues with retry timing&lt;/li&gt;
&lt;li&gt;Tests how your system handles sporadic failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 6: Slicer (Gradual Degradation)
&lt;/h3&gt;

&lt;p&gt;Simulate a service slowly dying:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add the slicer toxic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toxiproxy-cli toxic add redis-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; slice &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;average_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;size_variation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;32 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This slices the stream into small chunks (roughly 64 bytes each, ±32) with a 100-microsecond delay between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Behavior:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operations take progressively longer&lt;/li&gt;
&lt;li&gt;Eventually exceed timeouts&lt;/li&gt;
&lt;li&gt;Allows testing gradual degradation vs sudden failure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Testing Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Combining Multiple Toxics
&lt;/h3&gt;

&lt;p&gt;You can apply multiple toxics simultaneously to simulate complex scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add both latency and packet loss&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;200
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; slow_close &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simulates a network that is both slow and reluctant to release connections cleanly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Testing Script
&lt;/h3&gt;

&lt;p&gt;Create a bash script to run your test scenarios automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting Redis circuit breaker tests..."&lt;/span&gt;

&lt;span class="c"&gt;# Test 1: Complete failure&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Test 1: Simulating Redis down"&lt;/span&gt;
toxiproxy-cli toggle redis-proxy
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
curl http://localhost:8080/health
toxiproxy-cli toggle redis-proxy

&lt;span class="c"&gt;# Test 2: High latency&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Test 2: Adding high latency"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
curl http://localhost:8080/health
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream

&lt;span class="c"&gt;# Test 3: Reset connections&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Test 3: Reset connections"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; reset_peer &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
curl http://localhost:8080/health
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; reset_peer_downstream

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Tests complete!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Metrics to Track
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Circuit state&lt;/strong&gt;: closed, open, half-open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure count&lt;/strong&gt;: Number of consecutive failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Success rate&lt;/strong&gt;: Percentage of successful requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency percentiles&lt;/strong&gt;: p50, p95, p99&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request volume&lt;/strong&gt;: Total requests during the test&lt;/li&gt;
&lt;/ol&gt;
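
&lt;p&gt;One lightweight way to track the circuit state is to publish it as a gauge from the breaker's state-transition hook. A sketch using the Prometheus &lt;code&gt;client_golang&lt;/code&gt; library (the metric name is my own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// redis_circuit_breaker_state: 0 = closed, 1 = half-open, 2 = open.
var circuitState = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "redis_circuit_breaker_state",
	Help: "Redis circuit breaker state (0=closed, 1=half-open, 2=open)",
})

// recordState is called from the breaker's state-transition hook.
func recordState(state string) {
	switch state {
	case "open":
		circuitState.Set(2)
	case "half-open":
		circuitState.Set(1)
	default:
		circuitState.Set(0)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Failure counts and latency percentiles fit the same pattern with a counter and a histogram.&lt;/p&gt;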

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Test All Failure Modes
&lt;/h3&gt;

&lt;p&gt;Don't just test complete outages. Real production issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partial failures (some operations succeed, others fail)&lt;/li&gt;
&lt;li&gt;Slow failures (latency-induced timeouts)&lt;/li&gt;
&lt;li&gt;Intermittent failures (flaky networks)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Verify Recovery
&lt;/h3&gt;

&lt;p&gt;Always test that your circuit breaker recovers properly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cause failure&lt;/span&gt;
toxiproxy-cli toggle redis-proxy

&lt;span class="c"&gt;# Wait for circuit to open&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;10

&lt;span class="c"&gt;# Restore service&lt;/span&gt;
toxiproxy-cli toggle redis-proxy

&lt;span class="c"&gt;# Verify recovery&lt;/span&gt;
curl http://localhost:8080/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Test Under Load
&lt;/h3&gt;

&lt;p&gt;Run toxics while your service is under load to see realistic behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start load test&lt;/span&gt;
hey &lt;span class="nt"&gt;-z&lt;/span&gt; 60s &lt;span class="nt"&gt;-c&lt;/span&gt; 10 http://localhost:8080/api/endpoint &amp;amp;

&lt;span class="c"&gt;# Inject failure mid-test&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;20
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Clean Up Between Tests
&lt;/h3&gt;

&lt;p&gt;Always remove toxics and reset state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove all toxics&lt;/span&gt;
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;--all&lt;/span&gt;

&lt;span class="c"&gt;# Or reset the entire proxy&lt;/span&gt;
toxiproxy-cli delete redis-proxy
toxiproxy-cli create redis-proxy &lt;span class="nt"&gt;-l&lt;/span&gt; localhost:6380 &lt;span class="nt"&gt;-u&lt;/span&gt; localhost:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Example
&lt;/h2&gt;

&lt;p&gt;Here's a complete test scenario simulating a production incident:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Simulating Production Incident ==="&lt;/span&gt;

&lt;span class="c"&gt;# Phase 1: Normal operation&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 1: Normal operation (30s)"&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;30

&lt;span class="c"&gt;# Phase 2: Redis starts slowing down&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 2: Redis latency increases (1s → 3s)"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; latency &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000
&lt;span class="nb"&gt;sleep &lt;/span&gt;10
toxiproxy-cli toxic update redis-proxy &lt;span class="nt"&gt;-n&lt;/span&gt; latency_downstream &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nv"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3000
&lt;span class="nb"&gt;sleep &lt;/span&gt;10

&lt;span class="c"&gt;# Phase 3: Redis becomes unstable&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 3: Connections start resetting"&lt;/span&gt;
toxiproxy-cli toxic add redis-proxy &lt;span class="nt"&gt;-t&lt;/span&gt; reset_peer &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000
&lt;span class="nb"&gt;sleep &lt;/span&gt;10

&lt;span class="c"&gt;# Phase 4: Complete outage&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 4: Redis goes down completely"&lt;/span&gt;
toxiproxy-cli toggle redis-proxy
&lt;span class="nb"&gt;sleep &lt;/span&gt;20

&lt;span class="c"&gt;# Phase 5: Redis recovers&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Phase 5: Redis comes back online"&lt;/span&gt;
toxiproxy-cli toggle redis-proxy
toxiproxy-cli toxic remove redis-proxy &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;20

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Incident simulation complete ==="&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Check logs and metrics to verify circuit breaker behavior"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected outcome&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Phase 1: Normal operation; Phase 2: elevated latency, some timeouts&lt;/li&gt;
&lt;li&gt;Phase 3: Circuit might start opening/closing intermittently&lt;/li&gt;
&lt;li&gt;Phase 4: Circuit should open and stay open&lt;/li&gt;
&lt;li&gt;Phase 5: Circuit should recover to closed state&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Testing with Toxiproxy transforms abstract resilience patterns into concrete, verifiable behaviors. By simulating real network failures, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt; that your circuit breaker opens at the correct threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; that your application handles failures gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt; edge cases before they occur in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build confidence&lt;/strong&gt; in your system's resilience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is to test not just complete failures, but the spectrum of degradation that happens in production: latency spikes, intermittent errors, bandwidth constraints, and gradual deterioration.&lt;/p&gt;

&lt;p&gt;Remember: A circuit breaker that's never tested is just technical debt in disguise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Shopify/toxiproxy" rel="noopener noreferrer"&gt;Toxiproxy GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Netflix/Hystrix/wiki/How-it-Works#circuit-breaker" rel="noopener noreferrer"&gt;Netflix's Hystrix Circuit Breaker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/bliki/CircuitBreaker.html" rel="noopener noreferrer"&gt;Martin Fowler on Circuit Breakers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://redis.io/docs/manual/client-side-caching/" rel="noopener noreferrer"&gt;Redis Client Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Have you used Toxiproxy for testing? What failure scenarios have you discovered that surprised you? Share your experiences in the comments!&lt;/strong&gt; 💬&lt;/p&gt;

</description>
      <category>redis</category>
      <category>testing</category>
      <category>resilience</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
