Forem: Gabriel Anhaia

Embeddings on the Edge: sentence-transformers vs Hosted APIs

Gabriel Anhaia — Tue, 05 May 2026 21:11:33 +0000

Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A team I talked to last quarter was paying around eleven thousand dollars a month, by their account, to embed product reviews on text-embedding-3-small. Roughly two hundred million chunks, refreshed weekly. Their on-call engineer ran a spike on BGE-large-en-v1.5 with text-embeddings-inference on a single H100. He came back two days later. Same recall on their eval set, as he told it. Approximately seventy dollars a day in GPU time on a spot instance. The same week, a friend at a four-person startup did the opposite migration. Ripped out their self-hosted MiniLM container, moved everything to OpenAI, and watched the bill drop from about a thousand dollars in GPU time to ninety in tokens, by his numbers.

Both teams were right. The default answer for "local or hosted embeddings" is it depends on your scale, your data, and what your team already operates. True and useless. This post is the honest version of it depends.

What counts as local in 2026

Hosted: text-embedding-3-small, text-embedding-3-large, Voyage 3 / 3 Lite, Cohere Embed v4, Gemini Embedding 2. You POST text, you get a vector, you pay per million input tokens.

Local: sentence-transformers on top of an open backbone like BAAI/bge-large-en-v1.5, BAAI/bge-m3, nomic-ai/nomic-embed-text-v2, or the small workhorses (all-MiniLM-L6-v2, bge-small-en at 384 dims). For production you wrap it in text-embeddings-inference (TEI) or Baseten's BEI. vLLM also exposes an embedding mode. Some teams still hand-roll FastAPI plus an ONNX-quantised checkpoint. The model file is a download. The infra is yours.

Pricing snapshot, end of April 2026

Verified on 2026-04-29.

Provider / model	Cost per 1M input tokens	Dimensions	Notes
OpenAI `text-embedding-3-small`	$0.02	1536 (Matryoshka)	Batch tier 50% off
OpenAI `text-embedding-3-large`	$0.13	3072 (Matryoshka)	Batch tier 50% off
Voyage `voyage-3-large`	$0.12	1024	Tops retrieval-leaning MTEB
Voyage `voyage-3-lite`	$0.02	512	Direct competitor to `3-small`
Cohere Embed v4	$0.12	1536 / 256	Multimodal (text + image)
Self-hosted BGE-large	~$0 marginal	1024	You pay for the GPU

Sources: OpenAI new embedding models post, Voyage pricing, Cohere pricing, Google Gemini embedding pricing. Prices drop roughly every six months; the comparison shape stays.

On the MTEB leaderboard as of 2026-04-29, Voyage 3 Large leads the retrieval slice and NV-Embed-v2 leads the overall average. BGE-M3 is the strongest open multilingual option you can download today. all-MiniLM-L6-v2 is still serviceable for English-only retrieval at 384 dims, and it runs free on the CPU you already own — though it does not beat the frontier.

The crossover formula

The right comparison is total cost at your monthly volume, not "per token" against "per hour."

Hosted is direct: T million tokens per month at price p costs T * p. With text-embedding-3-small at $0.02 and a corpus producing 500M tokens of new chunks plus 50M of query traffic, you pay 550 * 0.02 = $11/month. Hosted wins at small and medium scale by a wide margin.

Local is the GPU plus the ops. A reasonable setup is one or two L4 instances for low-to-medium traffic, or an H100 (or A100) for high traffic. Rough estimates based on public AWS spot pricing as of 2026-04-29 (see Vantage's spot history for live numbers): an L4 lands around $0.40–$0.70/hour, an H100 around $2–$3/hour. Always-on monthly works out to roughly $400 for the L4 and $1,800 for the H100. Add SRE time, a second region, and 20% for when the spot pool runs dry.

A note before the math: the GPU numbers below are optimistic. Real production runs two replicas and autoscales on weekends; spot pool drains push you to on-demand during outages. Double the GPU side to stay honest, then read the crossover figures.

Crossover is the volume T_break where T_break * p = monthly_GPU_cost.

text-embedding-3-small at $0.02/1M tokens
  vs always-on L4 at ~$400/month:
  T_break = 400 / 0.02 = 20B tokens/month

text-embedding-3-large at $0.13/1M tokens
  vs always-on L4 at $400/month:
  T_break = 400 / 0.13 ≈ 3.1B tokens/month

voyage-3-large at $0.12/1M tokens
  vs always-on H100 at $1,800/month:
  T_break = 1800 / 0.12 = 15B tokens/month

You need billions of tokens per month before the GPU bill beats the cheapest hosted model. The crossover for text-embedding-3-large is roughly a sixth of the small one (3.1B against 20B); the pricier the hosted model, the lower the volume at which a GPU starts to pay for itself.

Hosted wins until you are pushing billions of tokens per month. Below that, you run your own embedder for latency, residency, or predictable cost — not for the dollar figure.
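The break-even arithmetic above can be sketched as a small Go helper. The function name breakEvenTokensM and the opsFactor parameter are illustrative assumptions, not part of any library; opsFactor = 2.0 stands in for the "double the GPU side" rule of thumb.

```go
package main

import "fmt"

// breakEvenTokensM returns the monthly volume, in millions of tokens,
// at which a hosted bill equals a fixed monthly GPU bill.
// pricePerM is the hosted price in dollars per 1M input tokens.
// opsFactor scales the GPU bill for replicas and on-demand fallback
// (1.0 = raw GPU cost, 2.0 = GPU cost doubled). Hypothetical knob.
func breakEvenTokensM(gpuMonthly, pricePerM, opsFactor float64) float64 {
	return gpuMonthly * opsFactor / pricePerM
}

func main() {
	cases := []struct {
		name       string
		gpuMonthly float64 // dollars per month, always-on
		pricePerM  float64 // dollars per 1M input tokens
	}{
		{"3-small vs L4", 400, 0.02},
		{"3-large vs L4", 400, 0.13},
		{"voyage-3-large vs H100", 1800, 0.12},
	}
	for _, c := range cases {
		raw := breakEvenTokensM(c.gpuMonthly, c.pricePerM, 1.0)
		doubled := breakEvenTokensM(c.gpuMonthly, c.pricePerM, 2.0)
		// Divide by 1000 to print billions of tokens per month.
		fmt.Printf("%-24s %6.1fB raw, %6.1fB with GPU cost doubled\n",
			c.name, raw/1000, doubled/1000)
	}
}
```

Plug in your own GPU quote and hosted price before reading anything into the output; the inputs here are the rough figures from this section.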

Where local actually wins

Four cases. Usually about something other than cost.

Latency-critical retrieval. A round trip to OpenAI from us-east-1 is typically 30–100 ms based on community latency reports, and slower the rest of the time on the kind of "degraded latency for some customers" days you see on every hosted-AI status page. A local all-MiniLM-L6-v2 on a CPU returns an embedding in roughly 5–15 ms for a single short sentence on a modern x86 CPU (figures vary heavily by sequence length and hardware). If retrieval lives in the user's typing path (autocomplete, instant suggestions, search-as-you-type), those 50 ms matter, and the variance hurts more than the median. Hermes IDE's local-context indexer hit this. The network round trip wrecked perceived snappiness even when the API was fast, so we landed on a small int8-quantised open model behind the editor.

Air-gapped or regulated data. Hospitals, banks, government tenants on air-gapped networks. Data does not leave the perimeter. Ask the vendor for a BAA and a private-link endpoint and wait six weeks for legal, or run BGE-large on a GPU you already own. A data-residency decision, not a cost one.

Very high steady-state throughput. Hundreds of millions of tokens per day on a mostly-static corpus. The GPU bill is fixed; the hosted bill scales linearly. Above the crossover, the GPU pays for itself in weeks. Search-engine builders, code-search products with tens of millions of files, and content platforms re-embedding nightly when the model drops a minor version all land here.

Predictable cost. A CFO can plan around a known monthly GPU bill. Token pricing is fine until the product goes viral and the next bill is several times the previous one. Local flattens the curve at the cost of provisioning headroom.

What looks like a local win but is not: "I want to fine-tune the embeddings." You can train a thin reranker on top of text-embedding-3-large against a curated triplets dataset instead. You do not need to own the embedder to own the relevance.

Where hosted wins (most of the time)

Most teams below the crossover should pick hosted, especially when there is no GPU ops budget. Niche queries are where this gets interesting: right now Voyage 3 Large and text-embedding-3-large both sit above almost any open model on retrieval-leaning MTEB tasks, and that gap matters when the use case lives in their lead. Multilingual queries are similar — if the team is too small to evaluate bge-m3 honestly against the hosted competitors, hosted is the safer default.

Hosted also wins on two ops dimensions most comparison tables miss. The model upgrade is somebody else's problem, and so is the recall regression test that comes with it. When OpenAI eventually ships the next embedding model, you get a config change. When the BGE team drops v2, you get a re-index plus the maintenance window that goes with it. That work is real.

A real serving setup

If you run local in production, the small end of "good" looks like one process on the GPU box, behind your HTTP stack, with a content-hashed cache in front.

import os
from typing import Sequence

import numpy as np
from sentence_transformers import SentenceTransformer

MODEL_NAME = os.environ.get(
    "EMBED_MODEL",
    "BAAI/bge-large-en-v1.5",
)

_model = SentenceTransformer(MODEL_NAME, device="cuda")


def embed(
    texts: Sequence[str],
    batch_size: int = 64,
) -> np.ndarray:
    return _model.encode(
        list(texts),
        batch_size=batch_size,
        normalize_embeddings=True,
        convert_to_numpy=True,
        show_progress_bar=False,
    )

normalize_embeddings=True matters because most vector stores score with a dot product, and a dot product only equals cosine similarity on unit-length vectors. Skipping it is a footgun nobody catches in code review. batch_size=64 is a starting point because BGE-large saturates an H100 well below batch-256 and a too-large batch hurts online latency. device="cuda" because the default CPU path on a 1024-dim model is much slower than people expect.

For real production, swap this for text-embeddings-inference running the same checkpoint. TEI handles dynamic batching and CUDA-graph optimisation that hand-rolled sentence-transformers does not. Hugging Face's TEI benchmarks have reported approximately 450 req/s on bge-base-en-v1.5 on a single A10G at 512-token sequence length (TEI v1.x as of April 2026; check the repo for current numbers). Larger GPUs and dynamic batching push that further, but the exact multiplier depends on sequence length and batch settings, so measure on your own workload before you plan capacity off a published number.

A pragmatic decision tree

Starting today on a small or medium RAG corpus? Hosted. text-embedding-3-small or voyage-3-lite for cheap; text-embedding-3-large or voyage-3-large when retrieval quality matters more than spend. Cache against (model_name, model_version, content_hash) so re-embeds do not multiply the bill.
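A minimal sketch of that cache key in Go, assuming an in-memory map in place of whatever store you actually run. The names cacheKey and embedCache, the "|" separator, and the "v1"/"v2" version labels are all hypothetical; the only idea taken from the text is keying on (model_name, model_version, content_hash).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey combines model name, model version, and a SHA-256 of the
// chunk text, so a model or version change never reuses stale vectors.
func cacheKey(modelName, modelVersion, content string) string {
	sum := sha256.Sum256([]byte(content))
	return modelName + "|" + modelVersion + "|" + hex.EncodeToString(sum[:])
}

// embedCache is an in-memory stand-in for a real cache.
type embedCache struct {
	store  map[string][]float32
	misses int // number of times the embedder was actually called
}

func newEmbedCache() *embedCache {
	return &embedCache{store: map[string][]float32{}}
}

// get returns the cached vector or calls embed once and stores the result.
func (c *embedCache) get(
	modelName, modelVersion, content string,
	embed func(string) []float32,
) []float32 {
	key := cacheKey(modelName, modelVersion, content)
	if v, ok := c.store[key]; ok {
		return v
	}
	c.misses++
	v := embed(content)
	c.store[key] = v
	return v
}

func main() {
	// fake stands in for a hosted or local embedding call.
	fake := func(s string) []float32 { return []float32{float32(len(s))} }

	c := newEmbedCache()
	c.get("text-embedding-3-small", "v1", "great product", fake)
	c.get("text-embedding-3-small", "v1", "great product", fake) // cache hit
	c.get("text-embedding-3-small", "v2", "great product", fake) // version bump
	fmt.Println("embed calls:", c.misses)
}
```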

If retrieval lives in a user-facing latency budget under 100 ms end-to-end, prototype with all-MiniLM-L6-v2 on the application machine. Know the local baseline before you choose.

Data that cannot leave the network? The choice was made for you. Run BGE-M3 or Nomic Embed v2 on the GPU you already operate.

Monthly token budget in the billions on a mostly-static corpus? Bake-off: hosted text-embedding-3-large versus bge-large-en-v1.5 and bge-m3 on your own eval set. Whichever ships acceptable recall at the lower total cost wins. Cost has to include SRE time, not just the GPU.

Most teams skip the bake-off and pick by reflex. Local embeddings usually land within roughly 2–5% of the hosted frontier on English retrieval per the MTEB retrieval slice as of April 2026, and on niche domains the gap can flip. A weekend of evaluation is cheaper than a year on the wrong default.

If this is useful

The RAG Pocket Guide walks the retrieval stack end to end: chunking, embedding-model selection, index choice, reranking, and the eval discipline that makes any of this measurable. The embeddings chapter has the longer version of this post, including the bake-off harness, the dimension-truncation trade for Matryoshka models, and the patterns for swapping models without invalidating an index.

Building a Plugin System in Go Without `plugin`: 3 Patterns That Actually Ship

Gabriel Anhaia — Tue, 05 May 2026 21:11:10 +0000

Book: Hexagonal Architecture in Go

A team I know of shipped a release where one line in the
notes mattered more than the rest: "now loads auth checks
as Go plugins." They had reached for the stdlib plugin
package because the docs are right there, the API is one
function call, and the demo in every blog post makes it
look easy.

The rollout did not go well. A plugin built on one machine
refused to load on hosts running a slightly different Go
patch version, and the error was a hash mismatch with no
graceful fallback. They ended up deleting the plugin code
and shipping a hot-reload-disabled build to recover.

Go's stdlib plugin package is the wrong default for almost
every team that reaches for it. It only works on Linux,
macOS, and FreeBSD. The plugin and the host must be built
with the exact same Go toolchain, module versions, and build
tags. Windows is not supported at all. There is no
sandboxing, and a panic in a plugin takes down the host. The
official docs say the package is "currently known to have a
number of issues". That has been
the warning for years.

You almost never want it. What you want is one of the
patterns below — pick the one that matches your isolation
and language story.

A common interface, three implementations

Every example below implements the same Hook contract. The
host has one job: ask a hook whether a request is authorised.

// hook/hook.go - shared by host and every plugin pattern.
package hook

import "context"

type AuthRequest struct {
    UserID   string
    Resource string
    Action   string
}

type AuthDecision struct {
    Allow  bool
    Reason string
}

type Hook interface {
    CheckAuth(
        ctx context.Context, r AuthRequest,
    ) (AuthDecision, error)
}

That interface is the contract. Every pattern below honours
it. What changes is where the implementation lives and
how the host talks to it.

Pattern 1: compile-time registration

The simplest plugin system is no plugin system. You define
the interface, every implementation lives in its own
package, and a tiny registry collects them at process start
via init(). No dynamic loading, no version skew, no
runtime surprise.

// hook/registry.go
package hook

import "fmt"

var registry = map[string]Hook{}

func Register(name string, h Hook) {
    if _, exists := registry[name]; exists {
        panic("hook already registered: " + name)
    }
    registry[name] = h
}

func Get(name string) (Hook, error) {
    h, ok := registry[name]
    if !ok {
        return nil, fmt.Errorf(
            "hook %q not registered", name)
    }
    return h, nil
}

A hook implementation registers itself.

// hook/checks/admin/admin.go
package admin

import (
    "context"
    "myapp/hook"
)

type adminOnly struct{}

func (adminOnly) CheckAuth(
    _ context.Context, r hook.AuthRequest,
) (hook.AuthDecision, error) {
    if r.UserID == "admin" {
        return hook.AuthDecision{Allow: true}, nil
    }
    return hook.AuthDecision{
        Allow:  false,
        Reason: "admin-only resource",
    }, nil
}

func init() {
    hook.Register("admin-only", adminOnly{})
}

The host imports the package for its side effect:

import _ "myapp/hook/checks/admin"

That underscore import wires the hook in. To enable a hook
in a build, you add the import. To disable it, you remove
the import. To ship a different feature flag set, you
maintain build tags or separate main packages that pull in
different subsets.

What this gives you: zero call overhead, full type safety,
no version drift, easy debugging. What it costs you: every
hook ships in the same binary, every change requires a
redeploy, and untrusted code is a non-starter because there
is no isolation.

My read is this pattern covers most real plugin needs in
Go. Most teams that reached for plugin actually wanted
this one.

Pattern 2: subprocesses over gRPC

When you need real isolation between the host and the
extension, run it in another process and talk to it over
RPC. Maybe a panic in the extension must not kill the host.
Or the extension is third-party code you don't trust to
share an address space with. Or it has a lifecycle of its
own and needs to start, stop, and restart independently.
The fix is the same: get it out of your address space.

This is the model HashiCorp built go-plugin around, and it
is the system that keeps Terraform, Nomad, Vault, and
Waypoint extensible across hundreds of provider binaries.
The library handles handshake, version negotiation, transport,
and graceful shutdown. The transport is gRPC, which means
plugins can be written in any language with a gRPC stack.

A minimal hook plugin looks like this. The proto definition
maps the Hook interface line by line:

syntax = "proto3";
package hookpb;

service Hook {
  rpc CheckAuth (AuthRequest) returns (AuthDecision);
}

message AuthRequest {
  string user_id = 1;
  string resource = 2;
  string action = 3;
}

message AuthDecision {
  bool allow = 1;
  string reason = 2;
}

The proto compiles into the Go module path myapp/hookpb
via your protoc invocation. The plugin process then
registers a gRPC server that implements the service:

// plugin-admin/main.go
package main

import (
    "context"

    "github.com/hashicorp/go-plugin"
    "google.golang.org/grpc"

    pb "myapp/hookpb"
)

type adminServer struct {
    pb.UnimplementedHookServer
}

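func (s *adminServer) CheckAuth(
    _ context.Context, r *pb.AuthRequest,
) (*pb.AuthDecision, error) {
    // Only attach a reason on denial, matching the Pattern 1 hook.
    if r.UserId == "admin" {
        return &pb.AuthDecision{Allow: true}, nil
    }
    return &pb.AuthDecision{
        Allow:  false,
        Reason: "admin-only resource",
    }, nil
}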

type hookGRPC struct {
    plugin.NetRPCUnsupportedPlugin
}

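func (hookGRPC) GRPCServer(
    _ *plugin.GRPCBroker, s *grpc.Server,
) error {
    pb.RegisterHookServer(s, &adminServer{})
    return nil
}

// GRPCClient completes the plugin.GRPCPlugin interface. go-plugin
// only serves a type over gRPC when it implements both halves; the
// host uses this method to obtain a typed client.
func (hookGRPC) GRPCClient(
    _ context.Context, _ *plugin.GRPCBroker, c *grpc.ClientConn,
) (interface{}, error) {
    return pb.NewHookClient(c), nil
}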

func main() {
    plugin.Serve(&plugin.ServeConfig{
        HandshakeConfig: plugin.HandshakeConfig{
            ProtocolVersion:  1,
            MagicCookieKey:   "HOOK_PLUGIN",
            MagicCookieValue: "v1",
        },
        Plugins: map[string]plugin.Plugin{
            "hook": hookGRPC{},
        },
        GRPCServer: plugin.DefaultGRPCServer,
    })
}

The host launches the plugin binary, the library handshakes
on stdin/stdout, then hands off to a gRPC connection on
a Unix socket or local TCP port. From that point the host
calls CheckAuth like any other gRPC method.

The win is process isolation: a panicking plugin takes down
only itself, and the host just sees a dropped connection it
can recover from. You also get language independence (the
plugin can be written in any language with a gRPC stack),
independent deployment (drop a new binary in plugins/),
and a debuggability story where you can attach a debugger to
the plugin process directly. The cost is an IPC hop and a
serialisation round trip on every call, plus the operational
tax of supervising extra processes. Expect latency in the
hundreds of microseconds to low milliseconds on the same
host — fine for control-plane work, too slow for hot loops.
Measure for your workload.

One trap to avoid: do not use go-plugin's net/rpc mode
for new systems. It is gob-encoded and Go-only, which kills
the cross-language story that's half the reason to reach for
this pattern in the first place. gRPC mode is what HashiCorp
recommends today.

Pattern 3: WebAssembly via wazero

The third pattern is what you reach for when you want
sandboxing on top of polyglot support. The plugin is a
WebAssembly module. The host runs it in an in-process
runtime. The plugin cannot touch the file system, the
network, or the host's memory unless the host explicitly
hands it a capability.

wazero is the runtime to use. It is a
pure-Go WebAssembly runtime with zero dependencies and no
CGO, which means cross-compilation still works and your
deployment story stays simple. It is WASI-Preview-1
compatible and Wasm Core 1.0/2.0 compliant.

The host instantiates the runtime once, compiles the plugin
module, and invokes exported functions:

// host/wasm.go
package host

import (
    "context"
    "encoding/json"
    "os"

    "github.com/tetratelabs/wazero"
    "github.com/tetratelabs/wazero/api"
    wasi "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"

    "myapp/hook"
)

type WasmHook struct {
    runtime  wazero.Runtime
    module   api.Module
    checkFn  api.Function
    memory   api.Memory
    mallocFn api.Function
}

func LoadWasmHook(
    ctx context.Context, path string,
) (*WasmHook, error) {
    rt := wazero.NewRuntime(ctx)
    wasi.MustInstantiate(ctx, rt)

    bin, err := os.ReadFile(path)
    if err != nil {
        rt.Close(ctx) // release the runtime on every failure path
        return nil, err
    }
    mod, err := rt.Instantiate(ctx, bin)
    if err != nil {
        rt.Close(ctx)
        return nil, err
    }
    return &WasmHook{
        runtime:  rt,
        module:   mod,
        checkFn:  mod.ExportedFunction("check_auth"),
        memory:   mod.Memory(),
        mallocFn: mod.ExportedFunction("malloc"),
    }, nil
}

// writeBytes calls the guest's malloc, copies `in` into
// guest memory, and returns the guest pointer. readResult
// reads a length-prefixed byte slice back out at `res[0]`.
// Both are short helpers (~10 lines each) — omitted here
// for brevity; see the wazero examples for the canonical
// shape.
func (h *WasmHook) CheckAuth(
    ctx context.Context, r hook.AuthRequest,
) (hook.AuthDecision, error) {
    in, err := json.Marshal(r)
    if err != nil {
        return hook.AuthDecision{}, err
    }
    ptr, err := writeBytes(ctx, h, in)
    if err != nil {
        return hook.AuthDecision{}, err
    }
    res, err := h.checkFn.Call(
        ctx, ptr, uint64(len(in)))
    if err != nil {
        return hook.AuthDecision{}, err
    }
    out := readResult(h.memory, res[0])
    var dec hook.AuthDecision
    if err := json.Unmarshal(out, &dec); err != nil {
        return hook.AuthDecision{}, err
    }
    return dec, nil
}
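The omitted helpers depend on a memory layout that host and guest agree on. The sketch below shows one possible layout, a 4-byte little-endian length prefix followed by the payload, using a plain []byte as a stand-in for guest linear memory. It does not call wazero: in a real host the reads and writes would go through api.Memory and the pointer would come from the guest's malloc. The names writeLenPrefixed and readLenPrefixed are hypothetical, and the layout is an assumption rather than anything wazero prescribes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// writeLenPrefixed stores len(payload) as a 4-byte little-endian
// prefix at ptr, followed by the payload bytes.
func writeLenPrefixed(mem []byte, ptr uint32, payload []byte) error {
	end := uint64(ptr) + 4 + uint64(len(payload))
	if end > uint64(len(mem)) {
		return errors.New("write out of bounds")
	}
	binary.LittleEndian.PutUint32(mem[ptr:], uint32(len(payload)))
	copy(mem[ptr+4:], payload)
	return nil
}

// readLenPrefixed reads the length prefix at ptr and returns a copy
// of the payload that follows it, bounds-checking both steps.
func readLenPrefixed(mem []byte, ptr uint32) ([]byte, error) {
	if uint64(ptr)+4 > uint64(len(mem)) {
		return nil, errors.New("length prefix out of bounds")
	}
	n := binary.LittleEndian.Uint32(mem[ptr:])
	start := uint64(ptr) + 4
	end := start + uint64(n)
	if end > uint64(len(mem)) {
		return nil, errors.New("payload out of bounds")
	}
	out := make([]byte, n)
	copy(out, mem[start:end])
	return out, nil
}

func main() {
	mem := make([]byte, 64) // stand-in for the guest's linear memory
	payload := []byte(`{"Allow":true,"Reason":""}`)

	if err := writeLenPrefixed(mem, 8, payload); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	got, err := readLenPrefixed(mem, 8)
	fmt.Println(string(got), err)
}
```

Bounds checks matter here because the pointer and length come from the guest; a sandbox is only as good as the host code that reads what the guest hands back.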

The plugin side can be written in any language that
compiles to Wasm. Rust, TinyGo, AssemblyScript, Zig all
work. A TinyGo version of the same admin-only hook is under
30 lines and compiles to a .wasm file under 50 KB.

You get hot-reload almost for free: drop a new .wasm in,
re-instantiate, no host restart. You get sandboxing — the
plugin sees nothing the host does not explicitly pass it.
You get language independence and a security model you can
actually reason about. The price is serialisation across
the host/wasm boundary, a slower cold start than native
code, and a debugging story that is still maturing.
CPU-bound work inside Wasm runs measurably slower than
native Go; the gap depends heavily on workload and on
which Wasm runtime you choose. Benchmark your hot path
before committing.

How to pick

The decision tree is short.

All hooks are first-party, you control every binary, and you redeploy when hooks change. That's compile-time registration. Boring, right answer.
Hooks are third-party, written in any language, or need process isolation because crashes must not propagate. That's go-plugin over gRPC.
Hooks come from untrusted sources (customers, marketplace, user-uploaded scripts) and you need real sandboxing on top of polyglot support. That's wazero.

The stdlib plugin package fits none of those well. The
restrictions on toolchain match, OS support, and lack of
sandboxing rule it out for production work. Treat it as a
lab toy. Don't ship it.

If this was useful

The longer version of this argument, including how the same
boundary question shows up in repository design, where the
application service belongs, and how to keep transports
swappable without polluting the core, is most of
Hexagonal Architecture in Go. The Complete Guide to Go
Programming covers the language-level pieces (interfaces,
init order, package boundaries) the patterns above lean on.

Stage 3 vs Legacy TypeScript Decorators in a NestJS App

Gabriel Anhaia — Tue, 05 May 2026 21:10:36 +0000

Book: The TypeScript Type System — From Generics to DSL-Level Types
Also by me: The TypeScript Library — the 5-book collection

A backend engineer on a team I work with opened a TypeScript 5.4
upgrade PR for a NestJS service. The build was green. Tests passed.
The container booted in staging the way it always did. Then someone
deleted experimentalDecorators from tsconfig.json because the
flag looked deprecated in their editor's tsconfig hint, and the
same service stopped resolving @Body() parameters, validators
silently passed, and the DI container booted with half the
providers it expected.

The fix was one line. Put the flag back.

The reason it broke is the thing this post is about. TypeScript
shipped two different decorator features under the same syntax,
they have incompatible call shapes, and the framework on your
desk almost certainly depends on the older one. The newer design
is better, but it does not drop in, and the migration window is
years long.

Two features, one syntax

Decorator syntax (the @something above a class, method, or
field) has been in TypeScript since before it was an ECMAScript
proposal. The original implementation hid behind the
experimentalDecorators flag and matched what was, at the time, a
TC39 stage-2 design. Frameworks built against it. Angular, NestJS,
TypeORM, class-validator, type-graphql: every one of them assumed
that shape and stayed there.

Then TC39 advanced a different design to stage 3.
That is the one that landed in TypeScript 5.0 in March 2023, with
no flag required. The syntax above the class looks identical. The
function the engine calls underneath is not.

The current state, verified for this post:

TC39 decorators sit at stage 3 of the proposal pipeline, with a few remaining advancement requirements before stage 4. The proposal is stable enough that browsers, Node, and TypeScript itself implement it as-is.
TypeScript supports both flavors. Without experimentalDecorators, the compiler accepts and emits the stage-3 form. With it, you get the legacy stage-2 form. The flag has not been marked for removal in the TypeScript release notes; the ecosystem still depends on it.
emitDecoratorMetadata is legacy-only. The new decorators do not emit design:type, design:paramtypes, or design:returntype metadata. There is a separate, smaller TC39 metadata proposal that gives stage-3 decorators a Symbol.metadata hook, but it does not reproduce the type-reflection story.
NestJS 10 and 11, the versions shipping in May 2026, require the legacy decorators plus reflect-metadata. Switching the flag off in a Nest project breaks DI resolution, route binding, pipes, and class-validator. As of this writing, the framework has not shipped a stage-3-aligned major.

The legacy signature

A legacy decorator receives positional arguments that depend on
where you put it. A method decorator gets the prototype, the
property name, and the property descriptor. A class decorator gets
the constructor. A property decorator gets the prototype and the
name.

function Log(
  target: object,
  propertyKey: string,
  descriptor: PropertyDescriptor,
): PropertyDescriptor {
  const original = descriptor.value;
  descriptor.value = function (this: unknown, ...args: unknown[]) {
    console.info(`${propertyKey} called`);
    return original.apply(this, args);
  };
  return descriptor;
}

class OrdersService {
  @Log
  place(orderId: string): void {
    // ...
  }
}

The decorator mutates the descriptor in place, returns a new one,
or both. To make this compile you need experimentalDecorators: true.
For Nest's @Body(), @Inject(), and the like to work, you also
need emitDecoratorMetadata: true plus an
import "reflect-metadata" at the top of your entry file. The
compiler then emits hidden Reflect.metadata("design:paramtypes", [...])
calls next to every decorated declaration. That metadata is what
Nest reads to figure out which class to inject into a constructor
parameter.

The stage-3 signature

A stage-3 decorator is a function with a fixed shape: (value, context) => value | void.
What value is depends on the kind: the function for methods,
the constructor for classes, a { get, set } record for accessors,
and undefined for fields (where you return an initializer).

function log<This, Args extends unknown[], Return>(
  value: (this: This, ...args: Args) => Return,
  context: ClassMethodDecoratorContext<
    This,
    (this: This, ...args: Args) => Return
  >,
): (this: This, ...args: Args) => Return {
  const name = String(context.name);
  return function (this: This, ...args: Args): Return {
    console.info(`${name} called`);
    return value.apply(this, args);
  };
}

class OrdersService {
  @log
  place(orderId: string): void {
    // ...
  }
}

Three things have changed.

The descriptor is gone. You are handed the function directly and
you return its replacement. No Object.defineProperty dance.

The context object replaces the positional arguments. It carries
kind, name, static, private, an access object with get
and set callbacks, and an addInitializer hook for registering
work that should run when the instance is constructed. Each
decorator kind gets its own context type: ClassDecoratorContext,
ClassMethodDecoratorContext, ClassFieldDecoratorContext,
ClassAccessorDecoratorContext, ClassGetterDecoratorContext,
ClassSetterDecoratorContext.

Parameter decorators do not exist in stage 3. There is no shape for
"decorate the second argument of this method." Nest's @Body(),
@Param(), @Query() all rely on parameter decorators in the
legacy form. That is the load-bearing reason a Nest migration is
not a tsconfig flip.

Three real ports: logging, validation, DI

Logging maps cleanly between the two. The Log and log examples
above show it both ways.

Validation is where the metadata gap shows up. A class-validator
style legacy decorator does this:

import "reflect-metadata";

const VALIDATORS = Symbol("validators");

function MinLength(n: number) {
  return function (target: object, propertyKey: string): void {
    const existing: Map<string, Array<(v: unknown) => string | null>>
      = Reflect.getOwnMetadata(VALIDATORS, target.constructor)
        ?? new Map();
    const list = existing.get(propertyKey) ?? [];
    list.push((v) =>
      typeof v === "string" && v.length >= n
        ? null
        : `must be at least ${n} chars`,
    );
    existing.set(propertyKey, list);
    Reflect.defineMetadata(VALIDATORS, existing, target.constructor);
  };
}

class CreateUserDto {
  @MinLength(3)
  username!: string;
}

Reflect.defineMetadata and Reflect.getOwnMetadata come from the
reflect-metadata polyfill. They write to a global WeakMap keyed
by the target. Anything that wants to read the validators later
imports reflect-metadata first and asks for the same key.

The stage-3 version uses context.metadata instead, which is a
plain object shared by every decorator on the class:

const VALIDATORS = Symbol("validators");

type Validator = (v: unknown) => string | null;
type ValidatorMap = Map<string | symbol, Validator[]>;

function minLength(n: number) {
  return function <This>(
    _value: ClassAccessorDecoratorTarget<This, string>,
    context: ClassAccessorDecoratorContext<This, string>,
  ): void {
    const meta = context.metadata as {
      [VALIDATORS]?: ValidatorMap;
    };
    const map: ValidatorMap = meta[VALIDATORS] ?? new Map();
    const list = map.get(context.name) ?? [];
    list.push((v) =>
      typeof v === "string" && v.length >= n
        ? null
        : `must be at least ${n} chars`,
    );
    map.set(context.name, list);
    meta[VALIDATORS] = map;
  };
}

class CreateUserDto {
  @minLength(3)
  accessor username: string = "";
}

Two things to notice. The class declares accessor username rather
than a plain username field, so minLength is typed as an accessor
decorator: it receives the { get, set } pair and a
ClassAccessorDecoratorContext. Stage-3 field decorators run before
the field exists, so a class wanting mutable, decorated public
state usually picks accessor. And the
metadata bag is per-class, attached to the class via Symbol.metadata,
no global polyfill needed. A consumer reads it through a typed
accessor:

type WithMetadata = {
  [Symbol.metadata]?: { [VALIDATORS]?: ValidatorMap };
};

const validators =
  (CreateUserDto as WithMetadata)[Symbol.metadata]?.[VALIDATORS];

DI registration is the third case. The legacy pattern Nest uses is
the one decorators were originally designed for: @Injectable()
flags the class, the constructor's parameter decorators name what
to inject, and emitDecoratorMetadata writes the parameter types
so the container can resolve them.

import "reflect-metadata";

@Injectable()
class OrdersService {
  constructor(
    private readonly db: Database,
    private readonly clock: Clock,
  ) {}
}

There is no straight stage-3 equivalent. Stage-3 has no parameter
decorators and no parameter type metadata. The honest answer for
DI in a stage-3 world is one of three patterns:

Replace constructor parameter injection with a token-based factory: the decorator marks the class, and a separate register call wires up tokens explicitly. Angular has been moving in this direction with inject(). The decorator does less; the call site does more.
Use a stage-3 class decorator together with addInitializer to register the class with a container once the class definition has been evaluated.
Keep reflect-metadata around explicitly, even on stage 3, for the parameter-types story alone, accepting that this is the seam the new proposal cuts off.
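The first option can be sketched without any decorator machinery. This is a minimal illustration, not how Nest or Angular implement it: Token, createToken, Container, register, and resolve are hypothetical names, and Clock and OrdersService are simplified stand-ins for the classes above.

```typescript
// A token carries the type of the thing it resolves to; no reflection needed.
interface Token<T> {
  readonly description: string;
  // Phantom member: never set at runtime, it only ties T to the token.
  readonly __type?: T;
}

function createToken<T>(description: string): Token<T> {
  return { description };
}

class Container {
  private readonly factories = new Map<Token<unknown>, (c: Container) => unknown>();
  private readonly instances = new Map<Token<unknown>, unknown>();

  // The call site names the dependency explicitly instead of relying on
  // emitted parameter types.
  register<T>(token: Token<T>, factory: (c: Container) => T): void {
    this.factories.set(token, factory);
  }

  // Lazily builds the instance on first use and caches it (singleton scope).
  resolve<T>(token: Token<T>): T {
    if (this.instances.has(token)) return this.instances.get(token) as T;
    const factory = this.factories.get(token);
    if (!factory) throw new Error(`no provider for ${token.description}`);
    const instance = factory(this) as T;
    this.instances.set(token, instance);
    return instance;
  }
}

interface Clock {
  now(): Date;
}

class OrdersService {
  private readonly clock: Clock;
  constructor(clock: Clock) {
    this.clock = clock;
  }
  stamp(orderId: string): string {
    return `${orderId}@${this.clock.now().toISOString()}`;
  }
}

const CLOCK = createToken<Clock>("Clock");
const ORDERS = createToken<OrdersService>("OrdersService");

const container = new Container();
container.register(CLOCK, () => ({
  now: () => new Date("2026-01-01T00:00:00Z"),
}));
container.register(ORDERS, (c) => new OrdersService(c.resolve(CLOCK)));

console.log(container.resolve(ORDERS).stamp("ord_42"));
// ord_42@2026-01-01T00:00:00.000Z
```

The wiring that emitDecoratorMetadata used to infer now lives in the two register calls, which is the "call site does more" trade described in the first option.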

There is no version of this where you keep the Nest call shape and
also delete experimentalDecorators and reflect-metadata. The
proposal is a different thing. The framework will move when it
moves.

The NestJS migration map

If you maintain a Nest app today, here is the actual decision tree.

Do nothing yet. The legacy decorators are not going away from
TypeScript any time soon. The flag is supported, the
emitDecoratorMetadata story works, and Nest's whole stack
depends on both. Anyone who flips the flag on a live Nest
service will find that out the hard way: parameter decorators
stop compiling and constructor injection loses its type metadata.

Inside the Nest app, do not write stage-3 decorators yourself.
You will end up with a project that has experimentalDecorators: true
in tsconfig and your hand-rolled stage-3 decorator silently being
compiled as a legacy one. The semantics are different enough that
it will mostly look like it works and then misbehave on
inheritance, on static members, or when something throws inside
an initializer.

For greenfield TypeScript libraries that do not need parameter
decorators or emitDecoratorMetadata, prefer stage 3. The
ergonomics are better, the spec is real, and the polyfill
footprint is smaller. Logging, observability wrappers,
serialization metadata, validation, feature-flag gates: all of
these map cleanly to stage 3.

Watch the Nest roadmap. The framework has acknowledged the
direction. When a stage-3-aligned major lands, the migration is
going to involve at minimum: rewriting parameter decorators as
either pipes or token-based injection, replacing
emitDecoratorMetadata-driven type discovery with explicit
tokens, and porting every custom decorator your codebase has
accumulated. Plan a quarter, not an afternoon.

When a TypeScript flag is described as "deprecated" by your editor
and the project is built on Nest, Angular, TypeORM, or
class-validator, that hint is wrong for your case. Read the call
site, not the linter.

If this was useful

With decorators, the type signature tells you more than the
framework docs. The TypeScript Type System spends most of its
pages on exactly that habit: generics, conditional types, infer,
branded types, and the patterns that let you describe what a
decorator, a builder, or a query DSL does at the type level before
you commit to an implementation. If the stage-3 signatures above
made sense and you want to write your own decorators with the
type ergonomics spelled out, that is the book to read next.

For everything that feeds into reading TypeScript signatures
fluently (narrowing, modules, async, the day-to-day machinery),
TypeScript Essentials is the entry point. TypeScript in
Production picks up where this post ends, with the tsconfig,
build, and library-authoring decisions that make a feature like
stage-3 decorators portable across runtimes. If you are coming
from JVM, Kotlin and Java to TypeScript makes the bridge; from
PHP 8+, PHP to TypeScript covers the same ground from the other
side.

The five-book set:

TypeScript Essentials — From Working Developer to Confident TS, Across Node, Bun, Deno, and the Browser — entry point: amazon.com/dp/B0GZB7QRW3
The TypeScript Type System — From Generics to DSL-Level Types — deep dive: amazon.com/dp/B0GZB86QYW
Kotlin and Java to TypeScript — A Bridge for JVM Developers — bridge for JVM devs: amazon.com/dp/B0GZB2333H
PHP to TypeScript — A Bridge for Modern PHP 8+ Developers — bridge for PHP devs: amazon.com/dp/B0GZBD5HMF
TypeScript in Production — Tooling, Build, and Library Authoring Across Runtimes — production layer: amazon.com/dp/B0GZB7F471

All five books ship in ebook, paperback, and hardcover.

Generic-Heavy APIs Hurt Compile Times. Here Is How to Measure It

Gabriel Anhaia — Tue, 05 May 2026 21:09:48 +0000

Book: The Complete Guide to Go Programming

You added a tiny functional helpers package to the repo six
months ago. Map, Filter, Reduce, GroupBy. Twelve lines
each, all generic. The team loved it. Today the CI
job that used to finish in a handful of minutes is taking
nearly twice that. Nobody admits to a culprit because nobody
changed anything obvious. You start bisecting and the line
you land on is the import of that helpers package in the
third service that pulled it in.

Generics in Go are not free. They look free at the call site,
which is the whole point. The bill arrives later, in go build
wall-clock time and in the size of your release binaries. If
you ship a library that is generic in many places and is
imported into many packages, the bill grows with both numbers.

The fix is usually not to delete the generic. It is to put a
non-generic boundary between the generic and the rest of the
code, so the compiler stops paying for the type parameter at
every import site.

The shape Go's compiler picks

Go does not do full monomorphisation the way C++ and Rust do.
It does not give every distinct instantiation its own machine
code. It also does not do full dictionary passing the way
Haskell type classes or unspecialised Swift generics do. It picks a hybrid that the
proposal calls GC-shape stenciling with dictionaries. The
design doc lives in the proposal repo at
generics-implementation-gcshape.md
and is the source of truth for what the compiler actually
does.

The short version. The compiler groups instantiations by their
GC shape. The design doc describes a GC shape in terms of size,
alignment, and pointer layout. The implementation notes for
Go 1.18 are stricter: two types share a shape only if they have
the same underlying type or are both pointer types. int32 and a
named type defined as int32 share. *Order and *Invoice share,
because both are single-word pointers as far as the GC is
concerned. Two structs that merely have the same size and
pointer map do not, at least as of that implementation.

For each GC shape used by any instantiation in your program,
the compiler emits one chunk of code. It also emits a
dictionary per concrete instantiation, which carries the
information the shared code body cannot see: the type
descriptor, method tables for type-asserting calls, and so on.

Two consequences fall out of this. First, instantiating
Filter[int] and Filter[int32] produces two stencils,
because they are different shapes (size 8 vs 4 on 64-bit
platforms). Second, instantiating Filter[*Order] and
Filter[*Invoice] shares one stencil but two dictionaries.
A dictionary is small. A per-shape body is not. If your
helpers package is generic over five or six shapes (int sizes,
strings, pointer-to-struct, struct-of-two-pointers,
struct-of-one-pointer-and-one-non-pointer) and is imported
into thirty packages, the compiler is doing real work.

Measuring the actual cost

Two numbers matter, and they are easy to read. Compile time
and binary size. Run a clean build with the timer on:

go clean -cache && time go build ./...

Then a stripped-binary size:

go build -ldflags='-w -s' -o /tmp/svc ./cmd/svc
ls -l /tmp/svc

For the per-instantiation breakdown, the tool you want is
go tool objdump. Build with -gcflags='-m=2' first to see
what got inlined and what did not:

go build -gcflags='-m=2' ./... 2> build.log
grep -E '^.*: inlining call to|^.*: cannot inline' build.log

Then disassemble and grep for the generic function name:

go tool objdump -s 'pkg/helpers.Filter' /tmp/svc | head -40

Each distinct function symbol that comes back is a stencil the
compiler had to emit. If you see one symbol, you got the share.
If you see five, you got five stencils. Multiply that by the
helper functions in your library and you have a number you
can defend in code review.

A tiny benchmark module lets you watch the compile time move
in isolation:

package helpers

func Map[T any, U any](xs []T, f func(T) U) []U {
    out := make([]U, len(xs))
    for i, x := range xs {
        out[i] = f(x)
    }
    return out
}

func Filter[T any](xs []T, p func(T) bool) []T {
    out := make([]T, 0, len(xs))
    for _, x := range xs {
        if p(x) {
            out = append(out, x)
        }
    }
    return out
}

func Reduce[T any, A any](
    xs []T, seed A, f func(A, T) A,
) A {
    a := seed
    for _, x := range xs {
        a = f(a, x)
    }
    return a
}

Add a _test.go that calls each helper with a different
concrete type per test. Then build the test binary:

go clean -cache
time go test -c -o /tmp/helpers.test ./helpers
ls -l /tmp/helpers.test
go tool objdump -s 'helpers.(Filter|Map|Reduce)' \
    /tmp/helpers.test | grep '^TEXT' | wc -l

Illustrative numbers from a synthetic benchmark on a local
machine (not a peer-reviewed measurement — yours will differ
in absolute terms; the shape of the change is what matters):

go test -c  3.42s user 0.61s system 192% cpu 2.094 total
-rwxr-xr-x  1 user  staff  3884240 /tmp/helpers.test
12

Twelve TEXT entries for three generic helpers means four
shapes in use, each emitting its three helpers. Rerun after
adding a fifth concrete shape in the tests and the count
climbs in step. That is the bill.

Pattern 1: do not propagate the parameter past the boundary

Most generic-heavy libraries land in one of two designs. The
first is a top-level helper that takes [T any] and never
itself stores anything generic. It does work and returns. The
second is a generic container that survives across calls,
holds state, and exposes methods. The second is more expensive
because every method on the container is itself generic and
re-stencilled per shape.

The fix for the second one is an interface boundary that does
not carry the type parameter. The cache below is generic in
the value type, but readers and writers cross a non-generic
seam:

type cache[V any] struct {
    m map[string]V
}

func (c *cache[V]) Get(k string) (V, bool) {
    v, ok := c.m[k]
    return v, ok
}

func (c *cache[V]) Put(k string, v V) {
    c.m[k] = v
}

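type AnyCache interface {
    GetAny(k string) (any, bool)
}

// GetAny makes *cache[V] satisfy AnyCache: the value is boxed
// into any once, at the seam.
func (c *cache[V]) GetAny(k string) (any, bool) {
    v, ok := c.m[k]
    return v, ok
}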

Internal callers still use cache[V] and pay nothing for
the boxing. External callers (observability, debugging, an
admin endpoint, a test harness) go through AnyCache and the
compiler stops emitting per-shape code at every one of those
import sites. The price is a single any at the seam, paid
once on the way out.

This is the same trick the standard library uses around
encoding/json. The decoder is concrete; the value comes back
as any because the wire shape is genuinely open. You are
applying the same pattern internally, on a much smaller scale,
to keep compile time linear in your usage.

Pattern 2: constrain to one concrete type when the generic is theatre

Half of the generics in real codebases are used at exactly one
type. Repository[Order]. Service[User]. Handler[Request].
The square brackets are there because somebody anticipated
future polymorphism that never showed up. The compiler still
stencils. The reader still pays the cognitive overhead.

type Repository[T any] struct {
    db *sql.DB
}

func (r *Repository[T]) FindByID(id int) (T, error) {
    var zero T
    return zero, nil
}

If T is Order everywhere and only Order, drop the type
parameter and write the concrete version. The diff is small.
The compile time is smaller. The next reader does not have to
trace the parameter through three layers of code to confirm
that yes, it really is always Order:

type OrderRepository struct {
    db *sql.DB
}

func (r *OrderRepository) FindByID(id int) (Order, error) {
    return Order{}, nil
}

The audit for this one is a short pipeline. Find every use
site of the generic type, strip the type-parameter declarations
([T any], [T comparable], etc.) and the bare [T] on method
receivers, and see how many distinct concrete instantiations
remain:

git grep -nE '\bRepository\[' -- '*.go' | \
    awk -F'[][]' '{print $2}' | \
    grep -vE '\b(any|comparable)\b|^[A-Z]$' | \
    sort -u

If the output is one line, the generic is theatre. Delete it.
If it is two lines and the second is a test stub, the generic
is theatre. Delete it.

Pattern 3: use any deliberately at known-broad seams

The careful exception. There are seams in a service where you
genuinely do not know the type and pretending otherwise costs
more than it saves. The wire boundary into a JSON service. A
generic logging sink. An event bus that carries dozens of
event shapes nobody wants to enumerate as a type set.

At those seams, any is the honest answer. The compiler emits
one path. The runtime pays a single boxing per call. You
recover the type with a type assertion or a slog-style
attribute set, and you do not pay per-shape stencilling on the
way through.

The mistake is using any inside a generic. The mistake is
also using a generic at a wire seam. Both are the same bug
in opposite directions: the type parameter and the empty
interface are tools for different problems and they fight each
other when you mix them.

A working rule. If the function knows what T is, write the
generic and constrain it tightly. If the function is on a
boundary where the type is genuinely open, take any and
move on. Do not let a generic API leak across a boundary just
because the call site can spell [Order].

What the bill looks like after

Run the same time go build ./... and objdump | grep TEXT | wc -l before and after adding the non-generic seam. On the
synthetic benchmark above, dropping from four shapes to one
shape across the import graph cut the stencil count from 12 to
3 and shaved roughly 10–25% off the cold-build wall clock for
the test binary. Workload-specific — the only number that
matters is the diff between your two runs.

If this was useful

Compile time, binary size, and the way the runtime ferries
type information are the kind of thing that goes from invisible
to load-bearing the moment a service grows past a single
binary. The Complete Guide to Go Programming covers the
generics implementation, the constraint vocabulary, and the
escape-analysis behaviour that interacts with generics in
enough detail that you can reason about a time go build
regression without bisecting commits at random. Hexagonal
Architecture in Go is the design-side counterpart: how to
draw the seams in a service so the generic helpers stay in the
inner hexagon and the boundaries stay narrow, which is the
same trick this post recommends scaled to whole modules.

Mocking ESM in 2026: Vitest, Bun, and Node's mock.module

Gabriel Anhaia — Tue, 05 May 2026 21:09:20 +0000

Book: TypeScript in Production — Tooling, Build, and Library Authoring Across Runtimes
Also by me: The TypeScript Library — the 5-book collection

You open a file called payments.test.ts. The first thing you
need to do is replace one function on a module the file under
test imports. In CJS this was a one-liner: reassign the property
on the required object and move on. In ESM that line throws.
Module namespace exports are read-only. You cannot patch them.
You cannot reassign them. The runtime will let you compile, then
slap your wrist at the moment the test tries to write.

So you reach for the runner. And here the answer depends on
which runner you reached for. Vitest has vi.mock. Bun has
mock.module. Node 24 has mock.module on the built-in
node:test runner, still behind an experimental flag, and a
different shape again. Three surfaces, all doing the same
conceptual thing, none of them quite drop-in compatible.

Below is the working set of patterns that survive across all
three runners, plus the moves to retire because in 2026 they
belong in a museum.

What broke

CJS gave you mutable exports. The module.exports object was
just an object. Tests reassigned its properties, code under test
re-read them on the next call, life was easy. Two patterns came
out of that:

const fs = require("fs");
fs.readFile = jest.fn();

jest.mock("../db", () => ({ query: jest.fn() }));

The first does not work in ESM at all. The namespace object
returned by import * as fs from "fs" is sealed. The second
looks like it works, but it only does because Jest used to
run ESM under a CJS transformer that dropped you back into a
mutable bag of exports. Native ESM does not give you that bag.

What you can do, what every modern runner agrees on, is replace
the module in the loader's cache before the consumer imports
it. The consumer's import statement reads the cache. If the
cache has a different module sitting there, the consumer gets
the different module. The mechanism for getting your replacement
into the cache is what differs between Vitest, Bun, and Node.

Two ground rules fall out of this:

The mock has to be installed before the import. Either the runner hoists your mock above the imports, or you defer the import to after you call the mock function. There is no third option.
You replace the whole module. You do not "stub one property." You return a new object that has whatever shape the consumer expects, and you keep the parts you do not want to fake by re-importing the real module and spreading it.

Hold those two rules and the rest is dialect.

Pattern 1: hoisted module mock

The most common shape. Replace the module up front, before any
test runs, and let every test in the file see the replacement.

Here is a small library. notifier.ts calls sendEmail from
./mailer:

// src/mailer.ts
export async function sendEmail(
  to: string, subject: string, body: string
): Promise<{ id: string }> {
  const res = await fetch("https://mail.test/v1/send", {
    method: "POST",
    body: JSON.stringify({ to, subject, body }),
  });
  return res.json();
}

// src/notifier.ts
import { sendEmail } from "./mailer";

export async function notifyOrderShipped(
  email: string, orderId: string
) {
  return sendEmail(
    email,
    "Your order has shipped",
    `Order ${orderId} is on the way.`
  );
}

You want to test notifyOrderShipped without hitting the
network. The fake replaces the whole ./mailer module.

Vitest. The mock call is hoisted above the imports by the
transformer. You write it as if it were imperative; Vitest moves
it for you:

import { describe, it, expect, vi } from "vitest";

vi.mock("./mailer", () => ({
  sendEmail: vi.fn(async () => ({ id: "msg_1" })),
}));

import { notifyOrderShipped } from "./notifier";
import { sendEmail } from "./mailer";

describe("notifyOrderShipped", () => {
  it("dispatches the email", async () => {
    const out = await notifyOrderShipped(
      "a@b.test", "ord_42"
    );
    expect(out.id).toBe("msg_1");
    expect(sendEmail).toHaveBeenCalledTimes(1);
  });
});

Two things look wrong but are not. Placing vi.mock above the
imports would achieve nothing in plain ESM, where static imports
are evaluated before any statement in the file runs. And the
second import of sendEmail brings in the mocked function, not
the real one. Vitest rewrites the file so the mock is registered
before either import resolves.

Bun. Same idea, different surface. mock.module is a real
runtime call, not a hoisted directive, so you put it before any
dynamic read of the module. For a fully static-import setup,
the safest place is a preload file:

// test-setup.ts
import { mock } from "bun:test";

mock.module("./src/mailer", () => ({
  sendEmail: async () => ({ id: "msg_1" }),
}));

Then run with bun test --preload ./test-setup.ts. The test
file itself stays plain:

import { describe, it, expect } from "bun:test";
import { notifyOrderShipped } from "./notifier";

describe("notifyOrderShipped", () => {
  it("dispatches the email", async () => {
    const out = await notifyOrderShipped(
      "a@b.test", "ord_42"
    );
    expect(out.id).toBe("msg_1");
  });
});

If you call mock.module inside the test file after a static
import of the consumer, the consumer has already read the real
module. You either preload or you switch the consumer-side
import to a dynamic one (pattern 3 below).

Node 24, node:test. The shape is the third dialect.
mock.module lives on the mock object exposed by node:test,
the export shape uses a namedExports field, and the consumer
of the mocked module must be loaded by dynamic import after
the mock is registered:

import { test, mock } from "node:test";
import assert from "node:assert/strict";

test("notifyOrderShipped dispatches the email", async () => {
  mock.module("./src/mailer.ts", {
    namedExports: {
      sendEmail: async () => ({ id: "msg_1" }),
    },
  });

  const { notifyOrderShipped } = await import(
    "./src/notifier.ts"
  );

  const out = await notifyOrderShipped(
    "a@b.test", "ord_42"
  );
  assert.equal(out.id, "msg_1");
});

Run it with node --test --experimental-test-module-mocks
(flag name as of Node 24 LTS). The module-mocks loader is still
flagged at the time of writing, even though node:test itself
is stable. Both the flag and the option field names have
changed between releases, so check the test runner docs for the
version you ship before you wire it up.

The mental model is the same across all three: replace the
module entry in the loader cache before the consumer reads it.

Pattern 2: per-test partial mock

The second-most-common shape. You want one export faked, the
other ten left alone. The trick is importOriginal (Vitest), a
spread of the real module (Bun and Node), and the same hoisting
discipline as before.

A module with two functions:

// src/clock.ts
export function now(): Date {
  return new Date();
}

export function format(d: Date): string {
  return d.toISOString();
}

You want now frozen at a known instant. You want format
real. The fake imports the real module and overrides one field.

Vitest:

import { describe, it, expect, vi } from "vitest";

vi.mock("./clock", async (importOriginal) => {
  const real = await importOriginal<
    typeof import("./clock")
  >();
  return {
    ...real,
    now: () => new Date("2026-04-29T00:00:00Z"),
  };
});

import { now, format } from "./clock";

describe("clock", () => {
  it("returns the frozen now()", () => {
    expect(now().toISOString()).toBe(
      "2026-04-29T00:00:00Z"
    );
  });

  it("still uses the real format()", () => {
    expect(format(new Date("2026-01-01T00:00:00Z")))
      .toBe("2026-01-01T00:00:00.000Z");
  });
});

importOriginal is the callback Vitest passes to the factory;
it returns the real module with proper types when you
parameterize it as shown. The factory runs once per file; the
spread keeps every export the test did not name explicitly.

Bun:

// test-setup.ts
import { mock } from "bun:test";

const real = await import("./src/clock");
mock.module("./src/clock", () => ({
  ...real,
  now: () => new Date("2026-04-29T00:00:00Z"),
}));

Bun's mock.module factory does not get an importOriginal
argument. You import the real module yourself, into the
preload, and spread it. Same outcome. Top-level await is fine
in a Bun preload; Bun's loader supports it natively.

Node node:test:

import { test, mock } from "node:test";
import assert from "node:assert/strict";

test("frozen now, real format", async () => {
  const real = await import("./src/clock.ts");
  mock.module("./src/clock.ts", {
    namedExports: {
      ...real,
      now: () => new Date("2026-04-29T00:00:00Z"),
    },
  });

  const { now, format } = await import(
    "./src/clock.ts"
  );
  assert.equal(
    now().toISOString(), "2026-04-29T00:00:00Z"
  );
  assert.equal(
    format(new Date("2026-01-01T00:00:00Z")),
    "2026-01-01T00:00:00.000Z"
  );
});

Pattern: import the real module, override the fields you want
fake, spread the rest. The runner specifics fade once you lock
onto the spread.

Pattern 3: dynamic-import-based mocking

You ship code that does code-splitting. The function under test
contains an await import("./heavy-thing"). The natural fit is
to install the mock, then trigger the dynamic import, then
assert.

// src/router.ts
export async function handle(name: string) {
  if (name === "billing") {
    const mod = await import("./handlers/billing");
    return mod.run();
  }
  return null;
}

The test, in Vitest:

import { describe, it, expect, vi } from "vitest";

vi.mock("./handlers/billing", () => ({
  run: vi.fn(async () => "billed"),
}));

import { handle } from "./router";

describe("handle", () => {
  it("dispatches to the billing handler", async () => {
    expect(await handle("billing")).toBe("billed");
  });
});

In Bun, the same shape works the same way as long as the mock
is installed (preload or before the dynamic import):

import { describe, it, expect, mock } from "bun:test";

mock.module("./src/handlers/billing", () => ({
  run: async () => "billed",
}));

import { handle } from "./src/router";

describe("handle", () => {
  it("dispatches to the billing handler", async () => {
    expect(await handle("billing")).toBe("billed");
  });
});

Static imports hoist, so ./src/router is loaded before
mock.module runs regardless of source order. That is fine here:
mock.module still runs before handle is invoked, and the
await import(...) inside handle reads the now-mocked entry.

In Node node:test, dynamic-import code is the easiest case
because the runtime is already ordering the call after the
mock:

import { test, mock } from "node:test";
import assert from "node:assert/strict";

test("router dispatches to billing", async () => {
  mock.module("./src/handlers/billing.ts", {
    namedExports: { run: async () => "billed" },
  });

  const { handle } = await import("./src/router.ts");
  assert.equal(await handle("billing"), "billed");
});

The dynamic-import shape sidesteps every hoisting question. The
mock is registered, then the consumer code reaches into the
loader, then it gets the fake. This is the cleanest pattern of
the three. If you can keep rarely-used branches behind dynamic
imports, your test suite is the same across runners.

Three patterns to retire

A short list of things that do not work in 2026 ESM, and that
old blog posts still tell you to do.

Stubbing require. proxyquire, mock-require, and the
jest.mock flow that depends on require.cache only work
because they are patching CJS resolution. ESM does not have
require.cache. If your test file is ESM and the consumer is
ESM, these libraries are a no-op or worse: they install a CJS
mock that the ESM loader never reads.

Monkey-patching mutable exports.

import * as mailer from "./mailer";
(mailer as any).sendEmail = vi.fn();

Throws in native ESM (Node, Bun, Deno). The namespace object is
sealed. You will see TypeError: Cannot assign to read only property 'sendEmail' of object '[object Module]'. Some Vitest
configs used to let this slide via a CJS-shim transformer; that
path is gone in Vitest 2+. If your existing tests rely on this,
the migration is to one of the three patterns above.

Mutating the imported binding. Same problem, different
syntax:

import { sendEmail } from "./mailer";
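sendEmail = vi.fn(); // rejected: imports are read-only

ESM imports are read-only live bindings, not variables.
TypeScript refuses to compile the assignment, and plain
JavaScript throws a TypeError when the line runs.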

A rule that survives all three

Most of the runner-specific pain disappears the moment your
function takes its collaborators as parameters. A notifyOrderShipped
that accepts the sendEmail function as an argument needs no
module mock at all. You pass a fake. The places that genuinely
need module mocking are the ones where you cannot reach the
seam: framework-loaded modules, deeply nested helpers, dynamic
imports you do not own.
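As a rough sketch of that rule, here is a variant of notifyOrderShipped that takes the mailer as its first argument. The SendEmail type, the changed parameter order, and the hand-written fakeSendEmail are assumptions for illustration; the original function in this post imports sendEmail directly.

```typescript
// The collaborator's shape, declared once so fakes and the real mailer agree.
type SendEmail = (
  to: string,
  subject: string,
  body: string,
) => Promise<{ id: string }>;

// Same behaviour as the earlier notifyOrderShipped, but the mailer is passed in.
async function notifyOrderShipped(
  sendEmail: SendEmail,
  email: string,
  orderId: string,
): Promise<{ id: string }> {
  return sendEmail(
    email,
    "Your order has shipped",
    `Order ${orderId} is on the way.`,
  );
}

// A hand-written fake that records its calls; no runner API is involved.
const calls: Array<[string, string, string]> = [];
const fakeSendEmail: SendEmail = async (to, subject, body) => {
  calls.push([to, subject, body]);
  return { id: "msg_1" };
};

const pending = notifyOrderShipped(fakeSendEmail, "a@b.test", "ord_42");
pending.then((out) => console.log(out.id, calls.length)); // msg_1 1
```

Production code would pass the real sendEmail at the composition root; the test file is identical under Vitest, Bun, and node:test because nothing in it touches the module loader.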

For those, hold the three patterns above and pick whichever
runner the rest of your stack uses. Vitest if your tooling is
Vite-aligned. Bun if your runtime is Bun. node:test if your
constraint is "no test-runner dependency in production." The
patterns above carry over.

If this was useful

Module systems, ESM-CJS interop, the loader cache, and the
contortions test runners go through to mock around them are
exactly the ground TypeScript in Production covers in long
form. The build chapter is the one that maps a single tsc
invocation to "what does this package look like under Node, Bun,
Deno, and a bundler"; the testing chapter is the long version
of this post, including coverage, fixtures, and the matrix that
catches the runner-divergence bugs before a downstream user
files an issue.

If you want the ground floor before the production layer,
TypeScript Essentials is the entry point. If you want the
type system depth that makes typed mocks (vi.mocked<T>,
MockInstance<F>) compose without as any, The TypeScript
Type System is the deep dive.


LLM Response Caching: When the 80/20 Hit Rate Saves the Bill

Gabriel Anhaia — Tue, 05 May 2026 21:08:33 +0000

Book: LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team

You run a content moderation classifier. The same fifty memes show up in your queue every hour because they are trending across every account. The same support intent ("password reset") fires from a thousand chats a day. Your CI eval suite reruns the same prompt set every time someone touches a config file. Each of those calls hits the model, pays for the tokens, and returns the same answer it returned an hour ago.

This is not what Anthropic's prompt caching is for. Prompt caching trims the cost of the input on a cache hit. You still pay for the output, and the model still runs. Useful when the prefix is long and shared across many calls with different tails. Different problem.

Response caching is the cheaper sibling. You hash the entire request: model, full prompt, tool set, sampling params. If you have seen it before, you return the stored response from Redis without calling the model at all. Zero tokens. Zero latency past the Redis round trip. The catch is that the hit only counts when the request is genuinely identical, which rules out a lot of LLM workloads. The ones it does fit well are the ones this post is about.

When the 80/20 actually shows up

The pattern works when your input distribution is heavy-tailed and your output for a given input is supposed to be deterministic. A short list:

Classification. Intent detection, content moderation, sentiment scoring, language ID. Same input string, same answer. If the same email subject lands in your queue ten thousand times, you should pay for it once.
Idempotent agent tools. "Summarise this PDF," "extract the JSON from this scrape." Pure functions. Caching them is the same shape as memoising any other pure function.
Eval reruns. CI runs your eval set on every PR. The prompts have not changed; only the model version or system prompt has. Key the cache by both and you re-run only what differs.
RAG fallbacks. The retrieved context is identical for the same question for the next five minutes. Cache the synthesis step, not the retrieval.

Where it does not earn its keep:

Open-ended generation. "Write me a poem about the sea." Every user wants a different poem. The cache hit rate is a rounding error, and you would not want a hit if you got one.
Anything stateful. Multi-turn chat where the conversation is the input. The hash space is unbounded, and the next message will not match.
Anything personalised. The user id is in the prompt; the cache key is per user; you bought yourself a giant Redis bill and a hit rate close to zero.

If you cannot draw the input distribution and point at a fat head, response caching will not save you money. If you can, it usually does.
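One rough way to check for a fat head before building anything is to replay a sample of logged inputs and count repeats. The sketch below assumes you can export request inputs as plain strings; `potential_hit_rate` and `head_share` are illustrative names, not part of any library, and the result is an upper bound because it ignores TTLs and eviction.

```python
from collections import Counter


def potential_hit_rate(inputs: list[str]) -> float:
    """Upper bound on hit rate: every repeat of an input counts as a hit.

    Assumes an unbounded cache with no expiry, so real numbers will be lower.
    """
    if not inputs:
        return 0.0
    unique = len(set(inputs))
    return 1.0 - unique / len(inputs)


def head_share(inputs: list[str], top_n: int = 10) -> float:
    """Fraction of all traffic covered by the top_n most frequent inputs."""
    if not inputs:
        return 0.0
    counts = Counter(inputs)
    head = sum(count for _, count in counts.most_common(top_n))
    return head / len(inputs)


# Toy sample: 10 requests, 3 distinct inputs.
sample = ["password reset"] * 6 + ["refund"] * 3 + ["where is my order"]
print(potential_hit_rate(sample))
print(head_share(sample, top_n=1))
```

If the first number is low on a week of real traffic, the workload probably belongs in the "does not earn its keep" list.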

The keying strategy is the whole design

The hash is the part you have to get right. Every field you include is a promise that two requests sharing those values will produce the same response; every field you leave out is a bet that it does not affect the output. Get it wrong and you serve a stale answer for a request that is not actually the same.

The fields that must be in the key:

The model id, including the version suffix. model-vX and model-vY are not the same function, even when only the patch number moves.
The full message list, serialised in a stable order. JSON with sorted keys; do not rely on dict insertion order across Python versions.
The system prompt.
The tool definitions, if you pass any. A tool change is a behaviour change.
The sampling parameters: temperature, top_p, max_tokens, seed if your provider supports it.

The fields that must not be in the key, or the hit rate goes to zero:

The request id. The trace id. Anything per-call.
The wall clock. Today's date in the system prompt, if you are tempted, will tank your hit rate.
The user id, unless the response is genuinely user-specific (in which case you might be solving the wrong problem with this tool).

The other rule: only cache when the call is supposed to be deterministic. That means temperature=0, and ideally seed set to a fixed value. Caching at temperature=0.7 will store one of many valid answers and serve it forever. If the user sees the same "creative" answer twice, it stops feeling creative.
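The include and exclude rules above can be sketched as a small canonicalisation step. This is a minimal illustration, not the post's implementation: `PER_CALL_FIELDS` and `canonical_digest` are hypothetical names, and the field names in the set are assumptions about what your request objects carry.

```python
import hashlib
import json

# Hypothetical per-call field names; adjust to your own request shape.
PER_CALL_FIELDS = {"request_id", "trace_id", "timestamp", "user_id"}


def canonical_digest(request: dict) -> str:
    """Hash a request after dropping top-level fields that must not affect the key."""
    stable = {k: v for k, v in request.items() if k not in PER_CALL_FIELDS}
    # sort_keys applies to nested dicts too, so insertion order never matters.
    raw = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(raw.encode()).hexdigest()


a = {
    "model": "model-vX",
    "temperature": 0,
    "messages": [{"role": "user", "content": "hi"}],
    "request_id": "r-1",
}
# Same logical request: different key order, different request id.
b = {
    "request_id": "r-2",
    "messages": [{"content": "hi", "role": "user"}],
    "temperature": 0,
    "model": "model-vX",
}
# Different model id, so it must not share a cache entry.
c = dict(a, model="model-vY")

print(canonical_digest(a) == canonical_digest(b))
print(canonical_digest(a) == canonical_digest(c))
```

The order of the messages list still matters, as it should: a reordered conversation is a different request.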

The 70-line implementation

Redis as the store. SHA-256 as the hash. SETNX to guard against cache stampedes. That is the case where ten parallel workers all miss the same key at the same time and all call the model. You want one of them to win the call and the rest to wait for the result.

import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Callable

import redis

r = redis.Redis(decode_responses=True)

CACHE_TTL = 24 * 60 * 60
LOCK_TTL = 30
WAIT_POLL = 0.1
WAIT_MAX = 20.0

Two TTLs. The result lives a day; the lock lives only as long as a worst-case model call. Tune both to your workload.

The key builder is the load-bearing function. Stable JSON, sorted keys, every field that affects the response.

def cache_key(
    model: str,
    messages: list[dict],
    system: str | None,
    tools: list[dict] | None,
    temperature: float,
    max_tokens: int,
    seed: int | None,
) -> str:
    if temperature != 0:
        return ""
    payload = {
        "model": model,
        "messages": messages,
        "system": system or "",
        "tools": tools or [],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "seed": seed,
    }
    raw = json.dumps(
        payload, sort_keys=True, separators=(",", ":")
    )
    digest = hashlib.sha256(raw.encode()).hexdigest()
    return f"llmcache:v1:{digest}"

temperature != 0 returns an empty string and the caller treats that as "do not cache." Better to skip than to lie.

The lookup-or-call function with the SETNX gate:

@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    waits: int = 0

def cached_call(
    key: str,
    call_model: Callable[[], dict[str, Any]],
    stats: CacheStats,
) -> dict[str, Any]:
    if not key:
        return call_model()
    cached = r.get(key)
    if cached is not None:
        stats.hits += 1
        return json.loads(cached)
    lock_key = f"{key}:lock"
    if r.set(lock_key, "1", nx=True, ex=LOCK_TTL):
        try:
            result = call_model()
            r.setex(
                key, CACHE_TTL, json.dumps(result)
            )
            stats.misses += 1
            return result
        finally:
            r.delete(lock_key)
    deadline = time.time() + WAIT_MAX
    while time.time() < deadline:
        time.sleep(WAIT_POLL)
        cached = r.get(key)
        if cached is not None:
            stats.waits += 1
            return json.loads(cached)
    stats.misses += 1
    return call_model()

The function has three paths. A hit comes straight back from Redis. The winner of the lock calls the model and writes the result. Everyone else waits for that result with a deadline so a dead worker does not block forever. If the wait times out, the loser falls through and calls the model itself. That is cheaper than stalling every caller behind a single dead lock.

The wrapper for the actual SDK call:

def model_call_factory(client, **req):
    def call():
        resp = client.messages.create(**req)
        return {
            "text": resp.content[0].text,
            "usage": {
                "input": resp.usage.input_tokens,
                "output": resp.usage.output_tokens,
            },
            "model": resp.model,
            "stop_reason": resp.stop_reason,
        }
    return call

You serialise only what you need. Storing the entire SDK response object is tempting and a bad idea. The schema changes between SDK versions, and you do not want a deserialisation crash on a cache hit six months from now. Pick the fields your code actually reads.

Putting it together:

STATS = CacheStats()  # process-wide counters; export these to your metrics system

def ask(client, **req) -> dict[str, Any]:
    key = cache_key(
        model=req["model"],
        messages=req["messages"],
        system=req.get("system"),
        tools=req.get("tools"),
        temperature=req.get("temperature", 1.0),
        max_tokens=req["max_tokens"],
        seed=req.get("seed"),
    )
    return cached_call(
        key, model_call_factory(client, **req), STATS
    )

Seventy lines. One Redis dependency. A hit rate you can read off STATS and graph next to your model spend.

The cost math, with hedges

Pricing changes; the Anthropic pricing page is the only source of truth, and rates move enough that any number a blog post hard-codes will be wrong by the time you read it. Re-check before you forecast.

What does not change is the algebra. If your raw-call cost is C per request and your cache hit rate is h, your effective cost per logical request is C * (1 - h) + redis_cost. Redis is cheap enough on the call-path that you can treat it as a rounding error against a model call. So a 60% hit rate gives you 40% of the bill. An 80% hit rate gives you 20% of the bill. The fat head of a classifier's input distribution is what makes the second number realistic for some workloads; chat workloads are open-ended enough that the head never gets fat, so 80% stays unreachable.
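The algebra above fits in two small functions. This is a sketch with made-up unit costs; `effective_cost` and `break_even_hit_rate` are illustrative names, and the break-even line simply rearranges C * (1 - h) + redis_cost < C into h > redis_cost / C.

```python
def effective_cost(raw_cost: float, hit_rate: float, redis_cost: float = 0.0) -> float:
    """Cost per logical request: C * (1 - h) + redis_cost."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return raw_cost * (1.0 - hit_rate) + redis_cost


def break_even_hit_rate(raw_cost: float, redis_cost: float) -> float:
    """Hit rate above which caching is cheaper than always calling the model."""
    if raw_cost <= 0:
        raise ValueError("raw_cost must be positive")
    return redis_cost / raw_cost


# With the model call normalised to 1.0 and Redis treated as free:
for h in (0.0, 0.6, 0.8):
    print(h, round(effective_cost(1.0, h), 2))
```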

The honest accounting still has to subtract the cost of being wrong. Every cache hit is one missed opportunity for the model to give a fresh answer. If your model gets better between the cache write and the cache read, your users see the old answer. The TTL is your knob: a one-day TTL on a moderation cache is fine, but on a "what is the latest news" cache it ships yesterday's headlines and quietly tanks user trust.

What to instrument

Three numbers, on a dashboard, before you call it shipped:

Hit rate. hits / (hits + misses) over a rolling window. If it is below 30%, the cache is paying for itself but barely; below 10% you are running Redis for sport.
Stampede rate. waits / misses. A healthy number is small: a few percent on a hot key. If it climbs, your LOCK_TTL is too long, or your model call is timing out under the lock and leaving a stale lock in place.
Eviction or expiry rate. How often you write a key that gets evicted or expires before any read. High eviction means your TTL is too long for your Redis size; the cache is doing more work than it is paying back.
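The first two numbers can be computed straight from the counters. The sketch below redefines a `CacheStats` dataclass with the same three fields so it runs on its own; `window` is a hypothetical helper that diffs two snapshots to approximate one rolling-window step. Eviction rate is left out because it has to come from Redis itself, not from these counters.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    waits: int = 0


def hit_rate(stats: CacheStats) -> float:
    """hits / (hits + misses), or 0.0 when there is no traffic."""
    total = stats.hits + stats.misses
    return stats.hits / total if total else 0.0


def stampede_rate(stats: CacheStats) -> float:
    """waits / misses, or 0.0 when there were no misses."""
    return stats.waits / stats.misses if stats.misses else 0.0


def window(prev: CacheStats, curr: CacheStats) -> CacheStats:
    """Counters accumulated between two snapshots of cumulative stats."""
    return CacheStats(
        hits=curr.hits - prev.hits,
        misses=curr.misses - prev.misses,
        waits=curr.waits - prev.waits,
    )


earlier = CacheStats(hits=100, misses=100, waits=5)
now = CacheStats(hits=160, misses=140, waits=7)
recent = window(earlier, now)
print(hit_rate(recent), stampede_rate(recent))
```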

The LLM Observability Pocket Guide has a chapter on this exact dashboard — what to attach to your traces so a hit-rate regression shows up before the bill does.

What to try on Monday

Pick one workload from the list at the top of the post — the moderation classifier, the intent router, the eval rerun job — and wrap its model call in cached_call. Ship the three numbers from the previous section to whatever dashboard you already use. If the hit rate climbs above 30% in a day, write an RFC for the rest of the team. If it does not, the request was never deterministic in the first place, and your next move is Anthropic's prompt caching or a smaller model rather than this pattern.

If this was useful

The LLM Observability Pocket Guide covers the rest of the cost-and-cache stack: how to key requests so the hit rate is real, how to wire stampede protection without leaving stale locks, and which signals to attach to your traces so a 60% hit rate does not quietly drift to 6% the week your prompt template changes.

Stop Wrapping Errors in TypeScript. Use `cause` Instead.

Gabriel Anhaia — Tue, 05 May 2026 21:08:05 +0000

Book: TypeScript Essentials — From Working Developer to Confident TS, Across Node, Bun, Deno, and the Browser
Also by me: The TypeScript Library — the 5-book collection

You open a postmortem. The error in the log says
Error: Failed to load user profile: invalid input syntax for type uuid. You read it twice. You know which function threw it because
its message has the literal string Failed to load user profile,
but the part after the colon is from somewhere four layers below,
and the stack trace goes back only as far as the function that did the
wrapping. The original pg error has been flattened into a string.
The file, the line, and the SQL state code that would tell you who
handed in a bad UUID are gone. Whoever wrote that wrapper threw
away the only piece of the puzzle you actually needed.

This is a self-inflicted wound. ES2022 shipped
new Error(message, { cause }) four years ago. V8 had it in 9.3,
which means Node has had it since 16.9.0. Every browser shipped
it within the same year. And yet most TypeScript codebases still
have a lib/errors.ts with a hand-rolled wrap(err, message)
helper, or a class AppError extends Error that stuffs the
original into a field named originalError. Worst of all,
throw new Error(`Failed: ${err.message}`) keeps no reference
to the original at all.

Drop the helper. Use cause.

The three patterns to delete

The first one is the silent stack killer:

async function loadUser(id: string): Promise<User> {
  try {
    return await db.users.findById(id);
  } catch (err) {
    throw new Error(
      `Failed to load user ${id}: ${(err as Error).message}`,
    );
  }
}

The thrown error has a fresh stack pointing at the throw, the
message of the original concatenated into a string, and zero way
to recover the original instance. If err was a pg.DatabaseError
with a code field of 22P02 (invalid text representation), you
just lost the code, the position, the file, and the type. The
caller cannot do if (err instanceof DatabaseError) because the
DatabaseError is gone.

The second pattern stores the original on a custom field:

class AppError extends Error {
  constructor(
    message: string,
    public readonly originalError?: Error,
  ) {
    super(message);
    this.name = "AppError";
  }
}

throw new AppError("Failed to load user", err as Error);

This one preserves the original. The cost is that nothing else
knows what originalError means. Node's util.inspect prints it as
just another own property, not as a labelled [cause] chain.
Observability SDKs have no convention for a field with that
name. Your team walks it manually in serialiseError(...) and the
on-call engineer finds out the hard way that the new service
forgot the helper.

The third pattern is the well-meaning custom helper:

export function wrap(err: unknown, message: string): Error {
  const wrapped = new Error(message);
  (wrapped as Error & { cause?: unknown }).cause = err;
  return wrapped;
}

You are now reimplementing cause by hand, but late, and on the
wrong type. You stripped the type information at the cast. You
attached the cause as enumerable, which cause is not in the
spec, so anything that walks Object.keys(err) will treat it
differently from a real cause. Most importantly: you did not need
to write this function.

What `cause` actually does

The constructor signature is part of the language:

const cause = await something().catch((e) => e);
throw new Error("Failed to load user profile", { cause });

The Error constructor takes an options bag with a single
recognised key, cause. The runtime sets a non-enumerable
cause property on the new Error whose value is whatever you
passed. Non-enumerable means JSON.stringify will not see it,
Object.keys will not list it, but property access (err.cause)
returns it. Tools that know to look will walk the chain
recursively: util.inspect, V8's own error formatting, and
modern observability SDKs.

In TypeScript, the type for cause is unknown. You are
catching something. It could be an Error, a string someone
threw, or a DOMException, and the language refuses to lie
about what came out of the catch block. The shape of error
handling at the receiving end is the same shape
useUnknownInCatchVariables already pushed you toward in
TypeScript 4.4: narrow before you use.

try {
  await loadUser(id);
} catch (err: unknown) {
  if (err instanceof Error && err.cause instanceof DatabaseError) {
    if (err.cause.code === "22P02") {
      return reply.status(400).send({ error: "Bad UUID" });
    }
  }
  throw err;
}

The instanceof check on the outer error is what gets you the
Error shape. The second instanceof walks one level into the
chain. The pattern composes. For a deeper wrap, you check
err.cause on the cause.

What the runtime preserves

This is the part that earns the swap. Each Error in the chain
keeps its own stack trace because each one was constructed at
its own throw site. Node's util.inspect (which is what
console.log calls under the hood for objects) walks the chain
and prints every level:

Error: Failed to load user profile
    at loadUser (/app/users.ts:14:11)
    at processOrder (/app/orders.ts:42:5)
  [cause]: error: invalid input syntax for type uuid: "not-a-uuid"
      at Parser.parseErrorMessage (/app/node_modules/pg-protocol/...)
      at Parser.handlePacket (/app/node_modules/pg-protocol/...)
      at Parser.parse (/app/node_modules/pg-protocol/...)

Two stacks, both intact. The outer one tells you which call site
in the application asked for the user. The inner one tells you
which frame in the protocol parser actually choked. The string
concatenation version gave you only the outer stack: its message
included the inner message, but the stack was a single layer.

Note that the pg driver formats its own Error subclass with
a lowercase error: prefix, which is why the inner line above
reads error: while the outer reads Error:. The cause chain
preserves whatever each layer wrote.

V8 builds the stack string at the moment the Error is
constructed. SpiderMonkey and JavaScriptCore do the same. The
chain is not a special case the engine has to support. It is
two error objects, each holding its own stack, with one
referencing the other through a non-enumerable property.

The catch-shape gotcha

Anything can be thrown. JavaScript does not require thrown
values to be Error instances, and the type of the catch
binding is unknown for that reason. If a library throws a
plain object, or a string, or null, the instanceof Error
check on the cause fails and your narrowing logic skips that
branch. You have two honest options.

Option one: trust nothing, and write a small normaliser at the
boundary where third-party code can throw:

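function asError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === "string") return new Error(value);
  let msg: string;
  try {
    // JSON.stringify yields undefined for undefined, functions,
    // and symbols, and throws on circular structures and BigInt.
    msg = JSON.stringify(value) ?? String(value);
  } catch {
    msg = String(value);
  }
  return new Error(msg);
}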

try {
  await thirdPartyThing();
} catch (raw) {
  throw new Error("third-party thing failed", {
    cause: asError(raw),
  });
}

The cause is now always an Error, and the chain stays uniform
all the way up. Option two: keep cause as unknown and have
your top-level handler do the instanceof check before it tries
to read .message. Either is fine; what is not fine is casting
err as Error everywhere and pretending the type system is
wrong.

Where the chain still leaks

Two failure modes are worth naming.

The first is serialisation over the wire. JSON.stringify(err)
returns {} for an Error because none of name, message,
stack, or cause are enumerable. If you ship errors across an
HTTP boundary, an IPC channel, or a worker postMessage, you
have to flatten them by hand:

function serialiseError(err: unknown): unknown {
  if (!(err instanceof Error)) return err;
  return {
    name: err.name,
    message: err.message,
    stack: err.stack,
    cause:
      err.cause === undefined
        ? undefined
        : serialiseError(err.cause),
  };
}

The receiving side does the inverse: reconstruct an Error
with the deserialised cause as another Error. Without this
step, your worker crashes look like Error: {} on the parent
and the chain is gone.

The second is the observability layer. Most observability
vendors support cause chains in their JS SDKs, but the rendering
of the inner stack is uneven. As of 2026 there is an open issue
against the Sentry JavaScript SDK about cause-stack rendering.
The inner cause is captured, but the stack trace shown in the UI
sometimes points at the outer error instead of the inner one
(sentry-javascript#14983).
Check what your tool actually shows in production. The data is
captured; the rendering is the part that varies.

The migration is mechanical

If you have a wrap(err, message) helper, the codemod is one
line:

// before
throw wrap(err, "Failed to load user profile");

// after
throw new Error("Failed to load user profile", { cause: err });

If you have an AppError extends Error with originalError,
move the field to the constructor's options bag:

class AppError extends Error {
  constructor(message: string, options?: { cause?: unknown }) {
    super(message, options);
    this.name = "AppError";
  }
}

throw new AppError("Failed to load user", { cause: err });

Now err.cause works on every AppError, every plain Error,
and every error you build going forward. The custom
originalError field goes away. The serialiser uses one path. The
on-call engineer reads two stacks instead of one.

cause is the small ES2022 feature your error layer has been
waiting for. Stop writing the wrapper. The runtime has it.

If this was useful

TypeScript error handling has a shape: unknown catches,
narrowing with instanceof, the discriminated-union pattern for
domain errors, and where to draw the line between a thrown
Error and a returned Result type. That is one of the
day-one chapters in TypeScript Essentials. If your team's
error story is still patchwork, that book is the one that puts
it on a shared foundation.

The deeper type-system tricks land in The TypeScript Type
System: branded error subclasses, template-literal codes, and
conditional types that infer the error union from a function
signature. If you are coming from PHP 8+'s Throwable hierarchy,
PHP to TypeScript maps the patterns side by side. From Java's
checked exceptions, Kotlin and Java to TypeScript covers the
same ground. TypeScript in Production is the one that covers
serialising errors over the wire, the observability-tool
integration, and the build-target choices that affect what
cause compiles to.

The five-book set:

TypeScript Essentials — From Working Developer to Confident TS, Across Node, Bun, Deno, and the Browser — entry point: amazon.com/dp/B0GZB7QRW3
The TypeScript Type System — From Generics to DSL-Level Types — deep dive: amazon.com/dp/B0GZB86QYW
Kotlin and Java to TypeScript — A Bridge for JVM Developers — bridge for JVM devs: amazon.com/dp/B0GZB2333H
PHP to TypeScript — A Bridge for Modern PHP 8+ Developers — bridge for PHP devs: amazon.com/dp/B0GZBD5HMF
TypeScript in Production — Tooling, Build, and Library Authoring Across Runtimes — production layer: amazon.com/dp/B0GZB7F471

All five books ship in ebook, paperback, and hardcover.

The 7 Anthropic API Errors That Mean Different Things in Production

Gabriel Anhaia — Tue, 05 May 2026 21:07:38 +0000

Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go

A team I talked to last month had one exception handler around every Anthropic call. The handler caught Exception, logged the message, retried three times with a fixed two-second sleep, then surfaced a generic 500 to the user. That code ran in production for nine months. Nobody noticed it was wrong until a coworker rotated an API key and the entire support-ticket classifier started retrying the same authentication_error three times per request before failing. In that scenario, every retry on a permanent error stretches the latency tail of the rollout window, which is exactly the kind of self-inflicted brownout a richer taxonomy avoids.

That is the cost of treating the Anthropic API as a binary. The response taxonomy carries real information. Different error types want different reactions: some you retry, others you don't; a few want aggressive backoff, a few want immediate failure. The rest belong in your input-validation layer and should never have reached the network.

The seven error shapes below each map to a different reaction. The taxonomy and status codes come from the Anthropic errors docs. The exception classes are from the official Python SDK.

The shape of an error response

Every error from the API arrives as JSON with a top-level error object that has a type and a message, plus a request_id you should log on every failure:

{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "The requested resource could not be found."
  },
  "request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}
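If you ever handle the raw HTTP response yourself, pulling those three fields out is a few lines of standard-library code. This is a sketch under that assumption; `parse_api_error` is an illustrative helper, not an SDK function, and it returns None for anything missing rather than raising.

```python
import json


def parse_api_error(body: str) -> dict:
    """Extract error type, message, and request_id from an error response body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    err = data.get("error")
    if not isinstance(err, dict):
        err = {}
    return {
        "type": err.get("type"),
        "message": err.get("message"),
        "request_id": data.get("request_id"),
    }


raw = """{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "The requested resource could not be found."
  },
  "request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}"""

parsed = parse_api_error(raw)
print(parsed["type"], parsed["request_id"])
```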

The Python SDK wraps these into typed exceptions, with the HTTP status driving which class gets raised. The mapping is direct:

Status	SDK class	API `error.type`
400	`BadRequestError`	`invalid_request_error`
401	`AuthenticationError`	`authentication_error`
403	`PermissionDeniedError`	`permission_error`
404	`NotFoundError`	`not_found_error`
413	`APIStatusError`	`request_too_large`
429	`RateLimitError`	`rate_limit_error`
500	`InternalServerError`	`api_error`
529	`InternalServerError` / `APIStatusError`	`overloaded_error`

The 529 is the one most teams miss in their handler. The SDK may surface it as InternalServerError (the base for 5xx) or as APIStatusError; either way it wants its own retry policy, distinct from a 500.
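One way to keep the status-to-reaction mapping in a single place is a small lookup keyed on the HTTP status, which sidesteps the question of which SDK class a given code arrives as. The sketch below encodes the reactions this post argues for; `Reaction`, `REACTIONS`, and `reaction_for` are illustrative names, and the fallback for unlisted codes is an assumption, not something the Anthropic docs specify.

```python
from enum import Enum


class Reaction(Enum):
    FIX_INPUT = "fix the request, do not retry"
    PAGE_ONCALL = "stop calling, page on-call"
    FIX_CONFIG = "fix model access or model name, do not retry"
    BACKOFF_JITTER = "retry with exponential backoff and jitter"
    RETRY_ONCE = "retry once, then escalate"
    LONG_BACKOFF_OR_FAILOVER = "back off longer or fail over"


# Status codes from the table above, reactions as argued in this post.
REACTIONS = {
    400: Reaction.FIX_INPUT,
    401: Reaction.PAGE_ONCALL,
    403: Reaction.FIX_CONFIG,
    404: Reaction.FIX_CONFIG,
    413: Reaction.FIX_INPUT,
    429: Reaction.BACKOFF_JITTER,
    500: Reaction.RETRY_ONCE,
    529: Reaction.LONG_BACKOFF_OR_FAILOVER,
}

RETRYABLE = {
    Reaction.BACKOFF_JITTER,
    Reaction.RETRY_ONCE,
    Reaction.LONG_BACKOFF_OR_FAILOVER,
}


def reaction_for(status: int) -> Reaction:
    if status in REACTIONS:
        return REACTIONS[status]
    # Assumption for unlisted codes: other 5xx behave like a 500,
    # anything else is treated as a non-retryable request problem.
    return Reaction.RETRY_ONCE if status >= 500 else Reaction.FIX_INPUT


def is_retryable(status: int) -> bool:
    return reaction_for(status) in RETRYABLE


for code in (401, 429, 529):
    print(code, reaction_for(code).value, is_retryable(code))
```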

1. `rate_limit_error` (429): retry with jitter, not a fixed sleep

This means your account hit a rate limit on requests-per-minute, tokens-per-minute, or both. It is your problem, not Anthropic's. The fix is backoff.

The SDK already retries 429s a couple of times by default with a short exponential backoff. If your traffic shape is bursty enough that the default isn't enough, configure it explicitly and add jitter so a thundering herd of failed clients does not synchronise on the same retry slot:

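import random
import time

from anthropic import Anthropic, RateLimitError

client = Anthropic(max_retries=4)

def call_with_jitter(prompt: str, attempts: int = 5):
    # Turn off the SDK's own retries for this call so the two retry
    # layers do not multiply (5 loop attempts x 5 SDK tries each).
    no_retry = client.with_options(max_retries=0)
    for i in range(attempts):
        try:
            return no_retry.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if i == attempts - 1:
                raise
            # Exponential backoff plus up to one second of jitter.
            sleep = (2 ** i) + random.random()
            time.sleep(sleep)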

Observability metric to track: anthropic_rate_limit_hits_total{tier=...}. Alert on rate-of-change, not absolute count. A few 429s a minute is fine. A spike from 0 to 200 in 30 seconds means your traffic profile changed and your provisioned capacity has not.

2. `overloaded_error` (529): back off longer, fail over if you have a fallback

Status 529 means Anthropic is temporarily overloaded. This is capacity, not your account: Anthropic's docs note that 529s happen when their APIs experience high traffic across all users. Retrying immediately will hit the same wall. Retrying with the same backoff curve as a 429 is wrong too: you're not the one who needs to slow down, you're waiting for someone else to finish.

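from anthropic import APIStatusError

def is_overloaded(e: APIStatusError) -> bool:
    # Check the status code rather than the class, since the SDK
    # class raised for a 529 can vary.
    return getattr(e, "status_code", None) == 529

def call_with_failover(prompt: str):
    # primary_call and fallback_call are placeholders for your own
    # functions: the normal model call and the degraded path.
    try:
        return primary_call(prompt)
    except APIStatusError as e:
        if is_overloaded(e):
            return fallback_call(prompt)
        raise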

Two reactions are reasonable. Either back off with a much longer base (5, 15, 45 seconds) or fail over: a different model, a different region, a cached response, or a degraded path that skips the model entirely. Whichever you pick, don't silently retry on the same client at the same rate. You'll turn a brownout into your own outage.
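The longer backoff curve can be sketched as a delay schedule. The 5, 15, 45 second base values come from the paragraph above; the `overload_delays` name, the 20% jitter default, and the injectable `rng` parameter are assumptions made for illustration and testing.

```python
import random
import time
from typing import Optional


def overload_delays(
    attempts: int = 3,
    base: float = 5.0,
    factor: float = 3.0,
    jitter: float = 0.2,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Delays of base * factor**i seconds, each spread by +/- jitter (a fraction)."""
    rng = rng or random.Random()
    delays = []
    for i in range(attempts):
        nominal = base * factor ** i
        spread = nominal * jitter
        delays.append(nominal + rng.uniform(-spread, spread))
    return delays


def sleep_through(delays: list[float]) -> None:
    """Blocking helper: sleep for each delay in turn (call between retries)."""
    for delay in delays:
        time.sleep(delay)


print(overload_delays(jitter=0.0))
```

With jitter above zero, clients that failed at the same moment spread out instead of returning in lockstep.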

3. `invalid_request_error` (400): never retry, fix the input

A BadRequestError is your fault. The shape of the request is wrong. Common causes: max_tokens set outside what the model accepts (for example, above its output limit), message content too long for the context window, role-alternation violations in the messages array, content blocks of an unsupported type, or unsupported parameter combinations. Some models reject prefilled assistant messages with a 400 in specific configurations, so verify against your model's docs before relying on prefill.

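from anthropic import BadRequestError

try:
    msg = client.messages.create(...)
except BadRequestError as e:
    # e.body is the parsed error payload; guard against it being
    # None or a non-dict when the response body was not JSON.
    body = e.body if isinstance(e.body, dict) else {}
    log.error(
        "anthropic.bad_request",
        extra={
            "request_id": getattr(e, "request_id", None),
            "error_type": body.get("error", {}).get("type"),
            # "message" is a reserved LogRecord attribute in stdlib
            # logging, so use a different key.
            "error_message": str(e),
        },
    )
    # InputValidationFailed is an application-defined exception.
    raise InputValidationFailed(str(e)) from e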

Retry budget here is zero. The same input will fail the same way every time. The actionable signal is the error.message. Read it, surface it to whoever owns the prompt template, and add a unit test that catches it next time. Catching BadRequestError to retry it means there is a bug upstream in your prompt-construction layer, and the retry is hiding it.

4. `authentication_error` (401): never retry, page someone

A 401 means the key is missing, malformed, revoked, or rotated and your service has stale config. None of those get better with another HTTP request. The correct reaction is to stop calling the API, surface the error to your config / secrets layer, and page the on-call.

from anthropic import AuthenticationError

try:
    msg = client.messages.create(...)
except AuthenticationError:
    metrics.incr("anthropic_auth_failure_total")
    raise AuthConfigError("Anthropic auth failed; rotate key")

Note: the snippet above raises a domain exception rather than calling SystemExit. In a long-lived API server, SystemExit tears down the worker on a single auth blip, which is rarely what you want. In a CLI or job runner, a hard exit is fine. Pick the shape that matches your runtime.

During a key rotation, a generic retry handler turns a 30-second blip into a 90-second outage with 3× the failed-request volume. If your handler catches AuthenticationError and retries, kill that branch.

5. `permission_error` (403): the key works, your model access does not

A 403 means the API key is valid but it does not have permission for what you asked. The most common shape: asking for a model your organisation has not been granted access to — a beta model, a region-locked SKU, or a Bedrock-only deployment hit through the public endpoint. Anthropic also returns request_too_large (413) when payloads exceed the per-request size limit; that one is an input-shape problem you fix upstream, not an auth issue.

from anthropic import PermissionDeniedError

model_name = "claude-sonnet-4-5"

try:
    msg = client.messages.create(
        model=model_name,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
except PermissionDeniedError as e:
    log.error("anthropic.permission_denied", extra={
        "model": model_name,
        "request_id": getattr(e, "request_id", None),
    })
    raise ConfigError(
        "Model not enabled for this org; check console"
    ) from e

Retry budget is zero, same as auth. The fix is in the Anthropic console, not in your retry loop. Track this on a separate counter — key rotations and model-access misconfigs need different runbooks.

6. `not_found_error` (404): your model name is wrong

This one is sneaky because it sounds like a transient resource lookup, and your handler probably treats it as one. From the Messages API, a 404 almost always means you spelled the model name wrong, asked for a model that has been retired, or used a family alias when the API expected a dated snapshot from the model list.

from anthropic import NotFoundError

try:
    msg = client.messages.create(
        model=model_name,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
except NotFoundError as e:
    log.error("anthropic.unknown_model", extra={
        "model": model_name,
    })
    raise ConfigError(f"Unknown Anthropic model: {model_name}") from e

The reason this matters separately from BadRequestError is the diagnostic. A 400 says the request is shaped wrong; a 404 says the resource you named does not exist. Telling them apart in your logs saves a round-trip when a model deprecation lands and a few of your fleet are still pinned to an old name.

7. `api_error` (500): retry once, then escalate

A 500 from Anthropic's side is rare and non-deterministic. The right reaction is one retry, then escalate as a real failure. The SDK retries on 500 by default; in many production loops, one retry is enough to ride through a transient hiccup without amplifying a real incident.

import time

from anthropic import InternalServerError

def call_once_retry(prompt: str):
    # Disable SDK-level retries for this call so "retry once" really
    # means two requests, not two rounds of SDK retries.
    no_retry = client.with_options(max_retries=0)
    for attempt in range(2):
        try:
            return no_retry.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except InternalServerError:
            if attempt == 0:
                metrics.incr("anthropic_api_error_retry_total")
                time.sleep(1)
                continue
            metrics.incr("anthropic_api_error_unrecoverable_total")
            raise

If 500s climb above noise, that is an Anthropic incident, and your dashboard should send you to status.anthropic.com instead of into a tighter retry loop. The retry-once-then-fail policy keeps your service from amplifying a backend brownout while still riding through one-off blips.

Putting it together

The handler at the top of every Anthropic call ends up looking like this:

import anthropic

def safe_call(prompt: str):
    try:
        return client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
    except anthropic.AuthenticationError:
        raise AuthConfigError("rotate key")
    except anthropic.PermissionDeniedError:
        raise ConfigError("model not enabled")
    except anthropic.NotFoundError:
        raise ConfigError("unknown model name")
    except anthropic.BadRequestError as e:
        raise InputValidationFailed(str(e))
    except anthropic.RateLimitError:
        return call_with_jitter(prompt)
    except anthropic.InternalServerError as e:
        if getattr(e, "status_code", None) == 529:
            return call_with_failover(prompt)
        return call_once_retry(prompt)
    except anthropic.APIStatusError as e:
        if getattr(e, "status_code", None) == 529:
            return call_with_failover(prompt)
        raise

A note on the helpers: call_with_jitter, call_with_failover, and call_once_retry each issue a fresh request to the API, so any error they raise is their problem to handle internally. If you want every branch to log a request_id exactly once, wrap the helpers in their own try/except using the same patterns above, or have them return structured results instead of raising. Don't assume a retry inside a helper inherits the outer handler's coverage.

Seven branches, seven reactions, one log line per branch with the request_id attached. The next time something fails, your dashboard tells you whether it is your config, your traffic, your input shape, or Anthropic's day, without anyone reading raw exception messages by hand.

If this was useful

Production agents fail in more shapes than one. Error handling is the boring half of the work that keeps the interesting half running. The AI Agents Pocket Guide covers the patterns for building agents that survive contact with real traffic: bounded loops, retry policies, tool-call hygiene, and the failure modes that bite teams in week three.

Spring Boot to NestJS: A Mental Model for Java Developers

Gabriel Anhaia — Tue, 05 May 2026 21:07:11 +0000

Book: Kotlin and Java to TypeScript — A Bridge for JVM Developers
Also by me: The TypeScript Library — the 5-book collection

A Spring engineer joins a Node team. They open the new repo and the file tree looks suspicious: users.controller.ts, users.service.ts, users.module.ts. They click into the controller:

@Controller('users')
export class UsersController {
  constructor(private readonly users: UsersService) {}

  @Get(':id')
  show(@Param('id') id: string) {
    return this.users.findById(id);
  }
}

That is Spring vocabulary. @Controller, constructor injection, a service collaborator. The Spring engineer reads twenty more lines and the muscle memory holds: @Injectable, @Module, @Get, @Post, @Body. NestJS picked the names on purpose.

Then the analogy breaks. The module file lists every provider by hand. There is no classpath scan. The decorators run at startup, not at compile time. And by the time a fix lands in production, an import from a CommonJS package has done something strange to the dependency graph that no Spring container would do.

NestJS is the one Node framework where a Java engineer's instincts are mostly right. A mental model makes the mostly-right parts cheap and the where-it-breaks parts visible. Current major is NestJS 11 (11.1.x as of April 2026). A v12 release with broader ESM support is on the project's public roadmap.

Spring to NestJS, term by term

Read the right column as the nearest equivalent. It is not a rename. The differences in the next section matter.

Spring	NestJS	What it is
`@Component` / `@Service` / `@Repository`	`@Injectable()`	A class the container can construct and inject
`@Controller`	`@Controller(path)`	HTTP entry point
`@Autowired`	constructor parameter	Dependency injection
`@Qualifier` / `@Primary`	`@Inject(TOKEN)`	Pick one of N implementations
`ApplicationContext`	`Module` system	Wiring graph
`@Configuration` + `@Bean`	factory provider in a `Module`	Programmatic provider
`@ConfigurationProperties`	`ConfigModule` + `ConfigService`	Typed config
Spring AOP / aspects	`Interceptor` + `Guard` + `Pipe` + `ExceptionFilter`	Cross-cutting concerns
Spring Data JPA	TypeORM / Drizzle / Prisma	Persistence
`@Valid` + JSR-380	`ValidationPipe` + `class-validator` (or Zod via a pipe)	Input validation
`@Scheduled`	`@nestjs/schedule` `@Cron`	Cron jobs
Spring Security	`Guard` + Passport strategies via `@nestjs/passport`	Auth
`WebApplicationContext` per request	`REQUEST`-scoped providers	Per-request beans
`application.yml` profiles	`NODE_ENV` + per-env `.env`	Environments
Micrometer	`@nestjs/terminus` + OpenTelemetry SDK	Metrics & health

If a Spring concept is not on this list, it probably has a translation, just one that requires more code on the NestJS side. Four interesting ones land in the next section: modules, AOP, validation, persistence.

A small Spring controller, line by line in NestJS

Take a tiny endpoint: GET /users/{id} returns the user, POST /users creates one. Spring first.

@RestController
@RequestMapping("/users")
public class UsersController {

    private final UsersService users;

    public UsersController(UsersService users) {
        this.users = users;
    }

    @GetMapping("/{id}")
    public UserDto show(@PathVariable String id) {
        return users.findById(id);
    }

    @PostMapping
    public ResponseEntity<UserDto> create(
            @Valid @RequestBody CreateUserRequest body) {
        UserDto created = users.create(body);
        return ResponseEntity
            .status(HttpStatus.CREATED)
            .body(created);
    }
}

@Service
public class UsersService {

    private final UsersRepository repo;

    public UsersService(UsersRepository repo) {
        this.repo = repo;
    }

    public UserDto findById(String id) {
        return repo.findById(id)
            .map(UserDto::from)
            .orElseThrow(() -> new NotFoundException(id));
    }

    public UserDto create(CreateUserRequest req) {
        UserEntity saved = repo.save(UserEntity.from(req));
        return UserDto.from(saved);
    }
}

The NestJS port:

import {
  Body, Controller, Get, HttpCode, HttpStatus,
  Param, Post,
} from '@nestjs/common';
import { UsersService } from './users.service';
import { CreateUserDto } from './dto/create-user.dto';

@Controller('users')
export class UsersController {
  constructor(private readonly users: UsersService) {}

  @Get(':id')
  show(@Param('id') id: string) {
    return this.users.findById(id);
  }

  @Post()
  @HttpCode(HttpStatus.CREATED)
  create(@Body() body: CreateUserDto) {
    return this.users.create(body);
  }
}

import { Injectable, NotFoundException } from '@nestjs/common';
import { UsersRepository } from './users.repository';
import { CreateUserDto } from './dto/create-user.dto';

@Injectable()
export class UsersService {
  constructor(private readonly repo: UsersRepository) {}

  async findById(id: string) {
    const user = await this.repo.findById(id);
    if (!user) throw new NotFoundException(`user ${id}`);
    return user;
  }

  create(dto: CreateUserDto) {
    return this.repo.save(dto);
  }
}

Walk it line by line. @RestController collapses into @Controller (every NestJS controller is REST by default; render templates are an opt-in). @RequestMapping("/users") is the argument to @Controller. @GetMapping("/{id}") becomes @Get(':id') and the path-variable syntax shifts from braces to colons because that is the path-to-regexp shape Express uses.

@PathVariable and @RequestBody become @Param and @Body. ResponseEntity.status(...) collapses into the @HttpCode decorator plus a plain return. NestJS serializes whatever the handler returns and infers the success status from the HTTP method (201 for POST, 200 for the rest), so @HttpCode(HttpStatus.CREATED) on the POST handler is explicit documentation rather than a requirement.

The service is more boring. @Service becomes @Injectable(). The constructor is the same idea with a different language: TypeScript's private readonly shorthand assigns and types the field in one move, the way Lombok's @RequiredArgsConstructor does.

So far, easy. Now the parts that look the same and aren't.

Where the analogy breaks (four places)

1. There is no classpath scan. Modules are explicit.

Spring's @ComponentScan walks your classpath, finds every @Component / @Service / @Repository, and registers them. You add a class, you do not edit any wiring. The container finds it.

NestJS does not do that. Every provider has to be listed in a Module:

@Module({
  imports: [DatabaseModule],
  controllers: [UsersController],
  providers: [UsersService, UsersRepository],
  exports: [UsersService],
})
export class UsersModule {}

Add a service, edit the module. Forget to list it, and you get a runtime error at boot: Nest can't resolve dependencies of the UsersController (?). Please make sure that the argument UsersService at index [0] is available in the UsersModule context. That message is the new "404 on a route you forgot to register".

The upside is that the dependency graph is in source. You can read a Module and see exactly what is wired and what is exported to the rest of the app. There is no startup-time surprise where a classpath-scanned library auto-registered itself. The downside is six lines of bookkeeping per feature module.

2. Decorator metadata is reflective at startup, not annotation processing at compile time.

Spring's annotations are partly compile-time (the annotation processor generates code), partly runtime reflection over class files. The container walks classes once at boot and builds the wiring. Any error that survives compilation usually shows up at startup.

NestJS decorators run in JavaScript at module-load time. They use reflect-metadata to read the constructor parameter types TypeScript emits when emitDecoratorMetadata is on. That has two consequences a Spring engineer should know:

Decorators are TC39 Stage 3 in the language, but NestJS still uses the legacy decorators (the experimentalDecorators flag). The migration to standard decorators has been tracked for a long time on the project's issue tracker. The shipping advice for NestJS 11 in 2026 is still experimentalDecorators: true and emitDecoratorMetadata: true in tsconfig.json. Do not turn either off until NestJS announces it. The runtime DI still depends on the metadata that flag emits.
Type-only injection does not work. If a dependency is imported with import type, the metadata is stripped at compile time and the container has nothing to look up. Use @Inject(SOME_TOKEN) with a string or symbol token when the constructor parameter is typed against an interface or abstract class with no runtime symbol. Spring's container has the bytecode; NestJS only has whatever survived the TypeScript erasure.

3. AOP becomes a small constellation, not one mechanism.

Spring AOP is one feature with many advice types. NestJS splits the same job across four primitives, each with a clearer scope:

Guard — answers "is this request allowed in?". Auth, role checks, feature flags. Returns true / false or throws.
Interceptor — wraps the handler. Logging, caching, transforming the response, tracing spans. Like an @Around aspect.
Pipe — transforms or validates a single argument. The famous one is ValidationPipe, which runs class-validator over the DTO.
ExceptionFilter — converts a thrown error into a response. Like Spring's @ControllerAdvice + @ExceptionHandler.

You apply them with decorators (@UseGuards, @UseInterceptors, @UsePipes, @UseFilters), globally in the bootstrap, on a controller, or on a single handler. The split takes a week to internalize; once you have it, "this is a guard, not an interceptor" becomes a useful design conversation.

4. Persistence is a choice, not a default.

Spring Data JPA gives you a repository interface and the framework writes the queries. NestJS does not pick a persistence layer. The three you'll see in 2026 are TypeORM, Drizzle, and Prisma. TypeORM is the one whose API copies JPA most directly (entities as classes, decorators on fields, a repository pattern). Drizzle is a thin SQL builder with a typed schema; closer to jOOQ in spirit. Prisma is its own thing. It has a separate schema file, a generated client, and a query API that is neither JPA nor SQL.

Pick one early. The temptation to "use whichever fits the feature" is the same temptation that produces a Spring app with three persistence styles, and the same answer applies: don't.

ESM/CJS interop matters in 2026

One Spring habit that does not translate. On the JVM, the classpath is a flat namespace. In Node, the module system has been split between CommonJS and ECMAScript Modules for years, and packages on npm now ship in different combinations.

NestJS 11 itself still ships CJS by default. Some libraries you'll pull in (newer OpenTelemetry packages, node-fetch v3, several test runners) are ESM-only. The mismatch shows up as:

Error [ERR_REQUIRE_ESM]: require() of ES Module ... is not supported.

Three fixes, in order of cost:

Pin the dependency to its last CJS-compatible major.
Use a dynamic import() inside an async function. Node has supported dynamic import() from CommonJS for years.
Migrate the project to ESM ("type": "module" in package.json, .mjs extensions or explicit ones in imports). NestJS supports it; see the official samples for the tsconfig and package.json shape.

NestJS has signaled broader ESM support for a future major. Until that lands, expect to do this dance once per non-trivial project.

Wire your first NestJS module today and the muscle memory transfers in an afternoon. The four breaks above are the only places it won't.

If this was useful

The Java to TypeScript move is full of patterns that look identical and aren't. Kotlin and Java to TypeScript — A Bridge for JVM Developers is the chapter-by-chapter version of this post: nullability, sealed classes to discriminated unions, generics and variance, coroutines mapped to async/await, Spring habits that survive the trip versus the ones that should be retired.

If you want the broader set, the five-book TypeScript Library collection covers the type system, the foundations, the JVM and PHP bridges, and production tooling in one place. Pick the bridge if you're coming from the JVM; add the type-system book once you're past the syntax; the production volume is for anyone shipping TS at work.

TypeScript Essentials — From Working Developer to Confident TS — entry point if you're past the JVM bridge
The TypeScript Type System — From Generics to DSL-Level Types — generics, conditional types, branded types
Kotlin and Java to TypeScript — A Bridge for JVM Developers — this post is one chapter's worth
PHP to TypeScript — A Bridge for Modern PHP 8+ Developers — the other bridge book
TypeScript in Production — Tooling, Build, and Library Authoring — tsconfig, monorepos, dual ESM/CJS, JSR

All five books ship in ebook, paperback, and hardcover.

Go's cmp.Or and cmp.Compare: Three-Way Comparison Without the Boilerplate

Gabriel Anhaia — Tue, 05 May 2026 20:02:59 +0000

Book: The Complete Guide to Go Programming

Open the config loader of any Go service older than two years. You'll find this:

port := os.Getenv("APP_PORT")
if port == "" {
    port = cfg.Port
}
if port == "" {
    port = "8080"
}

Or this, on the sort path:

sort.Slice(orders, func(i, j int) bool {
    if orders[i].CreatedAt.Equal(orders[j].CreatedAt) {
        return orders[i].ID < orders[j].ID
    }
    return orders[i].CreatedAt.After(orders[j].CreatedAt)
})

Both shapes work. Both are the kind of thing a code reviewer skims past because the alternative used to be worse. Then Go 1.21 shipped the cmp package with cmp.Compare and cmp.Less. Go 1.22 added cmp.Or. The alternative stopped being worse.

The two functions are tiny. The patterns they unlock are not.

What landed in which release

A short version, because the migration question comes up.

Go 1.21 (August 2023): the cmp package itself, with cmp.Ordered (the constraint), cmp.Compare, and cmp.Less.
Go 1.22 (February 2024): cmp.Or.

If you are on 1.20, none of this exists in the standard library. If you are on 1.21, you have Compare and Less but not Or. A handful of the patterns below depend on Or doing the heavy lifting. The release-note pages on go.dev document both.

`cmp.Compare` — the three-way primitive

The signature is exactly what you expect:

func Compare[T cmp.Ordered](x, y T) int

Returns -1 when x < y, 0 when equal, +1 when x > y. Three-way comparison, the C-style strcmp shape. Finally available without writing the eighteen-line if/else-if/return chain by hand.

The point of three-way comparison is not the function itself. It's that it composes. Two-way comparators (bool) chain through nested if-statements. Three-way comparators chain through cmp.Or.

cmp.Less is the same idea returning a bool, useful when an API specifically wants a less-than predicate. Compare is the one you reach for most.

A small but real edge case: floats. cmp.Compare defines a total order over floats: NaN is less than every non-NaN value and equal to itself. The < and > operators do not; every comparison involving NaN is false. If your data has any chance of NaN and you sort with < directly, the comparator is inconsistent and the resulting order depends on where the NaNs started. cmp.Compare(a, b) gives you a deterministic order. That alone is a reason to prefer it over hand-rolled comparators in any code that touches user-supplied numbers.

`cmp.Or` — first-non-zero, generalized

The signature is even smaller:

func Or[T comparable](vals ...T) T

Returns the first argument that is not the zero value of T. If all of them are zero, returns the zero value. That is the entire definition.

The first thing it replaces is the config-loading staircase from the top of the post:

port := cmp.Or(
    os.Getenv("APP_PORT"),
    cfg.Port,
    "8080",
)

Three sources, fallback order left to right, one expression. The pattern works for any comparable type: strings, ints, pointer values, struct types whose zero value is meaningful, anything you can compare with ==.

There's a wart worth knowing. cmp.Or compares against the zero value, not nil-or-empty in the general sense. For string that's "", which is what you want. For int that's 0, which means cmp.Or(userSuppliedRetries, 3) returns 3 when the user explicitly asked for zero retries. If zero is a legal user value, cmp.Or is the wrong tool. You need a *int or a "set" flag. Same gotcha with time.Duration defaults. Read the type's zero behavior before reaching for Or.

Pattern 1: configuration with a clear precedence chain

This is the most common use and the one that makes the package pay for itself in the first afternoon.

type Config struct {
    DatabaseURL string
    Port        string
    LogLevel    string
}

func loadConfig(env Env, fileCfg Config) Config {
    return Config{
        DatabaseURL: cmp.Or(
            env.DATABASE_URL,
            fileCfg.DatabaseURL,
            "postgres://localhost:5432/app",
        ),
        Port: cmp.Or(env.PORT, fileCfg.Port, "8080"),
        LogLevel: cmp.Or(
            env.LOG_LEVEL,
            fileCfg.LogLevel,
            "info",
        ),
    }
}

The shape reads top to bottom: env wins, then file, then default. Adding a fourth source (say a feature-flag service) is one more argument. No new branch. The comparison with the staircase pattern is not subtle: ten lines collapse to one per field, and the precedence is in the order of arguments instead of buried in the order of if blocks.

What it does NOT replace is validation. cmp.Or("", "", "") returns "". If empty is invalid for the field, you still need to check the result. The package gives you fallback, not enforcement.

Pattern 2: multi-key sorting that reads like the spec

This is where cmp.Compare and cmp.Or work together, and where the API earns its keep.

You have a paginated API for orders. Sort key: most recent first, ties broken by lowest ID first. The pre-cmp version was a nested-if comparator like the one at the top of the post. The post-cmp version is exactly the spec, in code:

type Order struct {
    ID        int64
    CreatedAt time.Time
    Total     int
}

slices.SortFunc(orders, func(a, b Order) int {
    return cmp.Or(
        b.CreatedAt.Compare(a.CreatedAt), // desc by date
        cmp.Compare(a.ID, b.ID),          // asc by ID
    )
})

time.Time already has a three-way Compare method (added in 1.20), which is why we don't need cmp.Compare for the date. The b.CreatedAt.Compare(a.CreatedAt) flips the sign so newer-is-first.

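cmp.Or returns the first non-zero result. If two orders have different timestamps, the date comparison decides and the result of the ID comparison is ignored. If they share a timestamp, the date comparison returns zero and cmp.Or falls through to the ID comparison. Note that cmp.Or is an ordinary function, so Go evaluates every argument before the call; nothing is short-circuited. For cheap comparisons like these the cost is negligible, and you still get the precedence without manual nesting.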

Adding a third tiebreaker (say, total descending) is one more line:

slices.SortFunc(orders, func(a, b Order) int {
    return cmp.Or(
        b.CreatedAt.Compare(a.CreatedAt),
        cmp.Compare(b.Total, a.Total),
        cmp.Compare(a.ID, b.ID),
    )
})

Three keys, three lines, and the order of the lines is the order of the sort priorities. For years, most Go sorting code had nothing this direct.

Pattern 3: paginated cursor-based queries

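The reason multi-key sort matters in production is that cursor pagination is built on it. If you paginate by (created_at DESC, id ASC), every page query needs to filter WHERE created_at < cursor.created_at OR (created_at = cursor.created_at AND id > cursor.id) and order the same way. A plain row-value comparison like (created_at, id) < (cursor.created_at, cursor.id) only works when both keys sort in the same direction. The Go side of that comparison (the test that decides whether a row sits before or after the cursor) is the same cmp.Or chain you just wrote for the sort, with the cursor in the a position and the row in the b position.

func afterCursor(o Order, c Cursor) bool {
    return cmp.Or(
        o.CreatedAt.Compare(c.CreatedAt), // desc by date: older rows come later
        cmp.Compare(c.ID, o.ID),          // asc by ID: higher IDs come later
    ) < 0
}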

The function reads as the same data dependency the SQL has. When the sort changes (product asks to add a tiebreaker on customer-priority) you change one place. Not two places. The pre-cmp version drifted between the SQL ORDER BY and the Go-side cursor check. The bug only showed up at page boundaries when two rows shared a timestamp.

Pattern 4: stable IDs from optional fields

You have an event with three possible identifiers and want a stable, non-empty ID for logging. Whichever the upstream sent, in priority order:

func eventID(e Event) string {
    return cmp.Or(
        e.IdempotencyKey,
        e.RequestID,
        e.TraceID,
        fmt.Sprintf("synthesized-%d", time.Now().UnixNano()),
    )
}

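The synthesized fallback is the safety net so the function never returns empty, though a synthesized ID is only unique, not stable across calls. The rule is visible in one glance instead of reconstructable from four if statements. One caveat: cmp.Or is an ordinary function, so the fmt.Sprintf fallback runs on every call, even when an earlier field wins.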
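Because Go evaluates every argument before calling cmp.Or, a fallback that is expensive or has side effects is better kept out of the argument list. A sketch of one way to do that follows; the synth parameter is a hypothetical generator supplied by the caller.

```go
package main

import (
	"cmp"
	"fmt"
)

type Event struct {
	IdempotencyKey string
	RequestID      string
	TraceID        string
}

// eventID takes the fallback as a function so it only runs when
// none of the upstream identifiers is set.
func eventID(e Event, synth func() string) string {
	if id := cmp.Or(e.IdempotencyKey, e.RequestID, e.TraceID); id != "" {
		return id
	}
	return synth()
}

func main() {
	calls := 0
	synth := func() string {
		calls++
		return fmt.Sprintf("synthesized-%d", calls)
	}

	id := eventID(Event{RequestID: "req-42"}, synth)
	fmt.Println(id, calls) // req-42 0

	id = eventID(Event{}, synth)
	fmt.Println(id, calls) // synthesized-1 1
}
```

The counter shows the generator is untouched when a real identifier exists, which is the behavior the all-in-one cmp.Or call cannot give you.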

When NOT to reach for `cmp.Or`

The line a lot of teams get wrong is treating cmp.Or as a general fallback operator. It is not. It compares against the zero value of T, full stop. The cases it does not cover:

Pointers where nil and "explicit zero" both matter. cmp.Or((*int)(nil), &x) returns &x, which is what you want. cmp.Or(&zero, &x) returns &zero. Both pointers are non-nil, the first wins, the function never reads what they point at. If the rule is "non-nil pointer to a non-zero value," you need a helper, not Or.
Slices, maps, functions. They are not comparable, so cmp.Or will not compile with them. The stdlib has nothing for "first non-empty slice"; write a three-line helper if you need it.
Anything where zero is a legal user choice. cmp.Or(retries, 3) silently overrides the user's retries=0. The fix is the explicit-set pattern: a pointer field, an omitempty-aware decoder, or a (value, ok) shape.

The mental model: cmp.Or is the typed, generalized coalesce for cases where zero means unset. Anywhere "zero means unset" is true for your type, it works. Anywhere it isn't, reach for something else.
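For the slice case, the helper really is only a few lines. A sketch, with the name firstNonEmpty chosen for this example:

```go
package main

import "fmt"

// firstNonEmpty is a hypothetical helper: it returns the first slice with
// at least one element, or nil when every candidate is empty.
func firstNonEmpty[T any](candidates ...[]T) []T {
	for _, c := range candidates {
		if len(c) > 0 {
			return c
		}
	}
	return nil
}

func main() {
	fromEnv := []string{}
	fromFile := []string{"a.example", "b.example"}
	fmt.Println(firstNonEmpty(fromEnv, fromFile, []string{"localhost"})) // [a.example b.example]
}
```

It treats nil and empty slices the same way, which is usually what "unset" means for a list of hosts or tags.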

A short decision rule

The two functions answer two questions. Match the question, the function picks itself.

"Which of these candidates do I use?" → cmp.Or over a list, zero means unset.
"How do these two values order?" → cmp.Compare returning the three-way int.
"How do these two records order, by multiple keys?" → cmp.Or of cmp.Compare calls, one per key, in priority order.

The pattern that ties them together is the one to internalize: three-way comparators chain through cmp.Or. Two-way comparators do not. That is the reason both functions ship in the same package, and the reason the whole shape only fully arrived in 1.22 with Or filling in the gap.

If this was useful

The cmp package is one of the shorter chapters in The Complete Guide to Go Programming, but the patterns above show up across most chapters — sort keys in collections, fallback chains in config loaders, three-way comparison in custom types. The book covers what each piece of the standard library actually does and which of your habits from other languages translate. Hexagonal Architecture in Go picks up on the architectural side, including how comparison and ordering decisions belong inside the domain layer rather than scattered across adapters.

Reading -gcflags='-m=2' Output: What the Go Compiler Tells You About Inlining

Gabriel Anhaia — Tue, 05 May 2026 20:02:24 +0000

Book: The Complete Guide to Go Programming

You have a Go service that is "fast enough" until the day a profile says otherwise. Someone on the team says PGO will fix it. Someone else says the function should already be inlined. A third person says the compiler probably can't inline through the interface call. The tool that settles all three sits in the toolchain you've been ignoring: -gcflags='-m=2'.

The flag is older than most Go careers, and the output is unforgiving. The lines look like the compiler muttering to itself, because in a sense it is — those are the inliner's notes about what got inlined and what didn't, plus the escape-analysis decisions that share the stream. Reading them once changes how you write Go forever, mostly because you stop guessing.

What the flag actually does

-gcflags='-m' asks the compiler to print its optimization decisions. -m=2 doubles the verbosity: it adds the reasons the inliner kept or rejected each call, plus the inlining-cost budget for every function it considered. Higher levels such as -m=3 exist and print more internal detail, but you rarely need them on a normal day.

Run it on the smallest possible package:

go build -gcflags='-m=2' ./pkg/cache 2>&1 | head -40

The output mixes three kinds of lines. Once you can tell them apart, the rest is mechanical.

./cache.go:14:6: can inline (*Cache).Get with cost 18 as: ...
./cache.go:23:6: cannot inline (*Cache).Set: function too complex: cost 102 exceeds budget 80
./cache.go:31:7: inlining call to (*Cache).Get
./cache.go:42:13: parameter k leaks to {storage for ...}
./cache.go:55:9: moved to heap: buf
./cache.go:60:6: devirtualizing fn.(io.Writer).Write to *bytes.Buffer

can inline X with cost N is the inliner saying this function is small enough to be a candidate. The cost is a proxy for AST node count; the inliner's default budget is around 80 in recent releases, which is why "cost 18" is a green light and "cost 102" is the rejection. inlining call to X is the inliner doing it at a specific call site. moved to heap: x is escape analysis admitting a local has to live longer than its frame. devirtualizing fn is the compiler turning an interface call into a direct one, which a PGO profile can trigger for hot call sites.

Three different passes share a single output stream with nothing separating them. That is most of what makes -m=2 look impenetrable.
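Since nothing in the stream separates the passes, a tiny filter can do it for you. The sketch below buckets lines by the phrases shown in the sample output; the phrase list is an assumption based on that sample and is not exhaustive, so unmatched lines fall into "other".

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// classify buckets one line of -m output by the phrases seen in the sample.
func classify(line string) string {
	switch {
	case strings.Contains(line, "devirtualizing"):
		return "devirt"
	case strings.Contains(line, "can inline"),
		strings.Contains(line, "cannot inline"),
		strings.Contains(line, "inlining call to"):
		return "inline"
	case strings.Contains(line, "moved to heap"),
		strings.Contains(line, "escapes to heap"),
		strings.Contains(line, "does not escape"),
		strings.Contains(line, "leaks to"):
		return "escape"
	default:
		return "other"
	}
}

func main() {
	counts := map[string]int{}
	sc := bufio.NewScanner(os.Stdin)
	// "can inline ... as: ..." lines can be long, so allow up to 1 MiB per line.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Text()
		kind := classify(line)
		counts[kind]++
		fmt.Printf("[%-6s] %s\n", kind, line)
	}
	fmt.Fprintln(os.Stderr, "totals:", counts)
}
```

Pipe the compiler output into it, for example go build -gcflags='-m=2' ./pkg/cache 2>&1 | go run ./tools/mfilter, where ./tools/mfilter is wherever you saved the sketch.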

Pattern 1: PGO devirtualization, made visible

PGO has a user-facing story (collect a profile, rebuild, watch the binary get faster). -m=2 is where you watch it work.

Take a small program with an interface call site that almost always points at the same concrete type:

package main

import (
    "fmt"
    "os"
)

type Writer interface {
    Write(p []byte) (int, error)
}

type counter struct {
    n int
}

func (c *counter) Write(p []byte) (int, error) {
    c.n += len(p)
    return len(p), nil
}

func emit(w Writer, msg string) {
    w.Write([]byte(msg))
}

func main() {
    var w Writer = &counter{}
    for i := 0; i < 1_000_000; i++ {
        emit(w, "x\n")
    }
    fmt.Fprintln(os.Stderr, "done")
}

Without PGO, the call inside emit goes through the interface's dispatch table on every iteration. Build with -m=2:

go build -gcflags='-m=2' ./...

You will see can inline emit with cost ... but no devirtualizing line on w.Write. The compiler does not know which concrete type w is at compile time.

Now collect a CPU profile from a run, drop it next to the package as default.pgo, rebuild with -m=2. The line you are looking for shows up:

./main.go:18:9: devirtualizing w.Write to *main.counter

That is the compiler saying: the profile told me this call site is dominated by *counter, so I am replacing the interface dispatch with a direct call to (*counter).Write plus a fallback. Once the call is direct, the inliner can reason about whether to inline it. With a small concrete method like the one above, you may also see inlining call to (*counter).Write on the next line. Devirtualization opens the door; whether inlining follows depends on the method's cost.

Two things to watch. The PGO devirtualizer is conservative about which sites it touches; the Go PGO guide notes that devirtualization fires only when one type accounts for a large share of the call site's traffic in the profile. And the fallback branch still pays a comparison on the type, so devirtualization speeds up the hot case at a small cost to the cold one. If your interface call is roughly 50/50 between two types, PGO will leave it alone and you will see no devirtualizing line.

Pattern 2: closures and the escape-analysis surprise

Write a function that builds a closure, and you have written something the inliner and the escape analyzer have to agree on. They often do not.

package count

func makeAdder(x int) func(int) int {
    return func(y int) int {
        return x + y
    }
}

func sum(n int) int {
    add := makeAdder(10)
    total := 0
    for i := 0; i < n; i++ {
        total = add(i)
    }
    return total
}

Build with -m=2:

./count.go:3:6: can inline makeAdder with cost 10 as: ...
./count.go:4:9: can inline makeAdder.func1 with cost 6 as: ...
./count.go:4:9: func literal escapes to heap
./count.go:3:14: moved to heap: x
./count.go:9:6: can inline sum with cost 31 as: ...
./count.go:10:18: inlining call to makeAdder
./count.go:10:18: inlining call to count.makeAdder.func1
./count.go:13:13: inlining call to count.makeAdder.func1

The interesting part is the third line. func literal escapes to heap means the closure value is heap-allocated. The fourth line, moved to heap: x, means the captured variable is too. Both are forced because, in general, a returned closure outlives the frame that built it. The escape analyzer cannot reason about the specific call site in sum where the closure obviously does not escape — that level of context-sensitivity is beyond what the analyzer attempts.

But notice the inlining call to makeAdder further down. Once the inliner inlines makeAdder into sum, the closure and its captured x are no longer crossing a frame boundary, and a later pass can stack-allocate them. Whether it actually does depends on the Go version and the surrounding context, which is why you check with -m=2 and -m=3 instead of guessing.

The rule of thumb that survives most refactors: closures returned from one function and called inside the same package, in a tight loop, can avoid heap allocation if the compiler decides to inline the constructor. If you write the same closure pattern across package boundaries, the inliner is much more likely to give up, and the heap cost stays. -m=2 is the cheapest way to confirm which path the compiler took on your code.

Pattern 3: -l=4 and the "deep inlining" question

-l=4 is the flag that makes the inliner more aggressive. It raises the cost budget and enables mid-stack inlining of larger functions. It is a debugging level, not a tuning knob, and its exact effect can differ between Go releases. Reach for it only as the answer to a specific question, because turning it on globally is rarely free.

go build -gcflags='-m=2 -l=4' ./...

You will see more inlining call to X lines, and some functions that previously printed cannot inline F: function too complex will now inline. On a CPU-bound microbenchmark with a small hot loop, this can show up as a measurable speedup. On a service binary, it shows up as larger compiled output, longer compile times, and sometimes a slowdown from instruction-cache pressure.

There are two situations where -l=4 earns its keep.

The first is when you have profiled a hot path, identified a wrapper function that the default budget refuses to inline, and you want to confirm that the wrapper is the bottleneck before you rewrite it. Build the package with -l=4, run the benchmark, and compare to the default build. If the win is significant, you have your answer about the wrapper. If the win is zero, the wrapper was not the problem and you have saved yourself a rewrite.

The second is when you ship a tight library (think serializer, hash, or parser) where the user's hot path threads through a chain of small helpers that each individually fit the budget but whose chain does not. -l=4 lets the chain collapse into a single inlined body. The Go standard library's encoding/binary and parts of runtime are built this way; your own library can be too, when it is the right shape.

A few traps. -l=4 can grow the binary faster than it speeds it up: you can find yourself adding several percent to the binary for a fractional speedup, which is rarely the right trade. It can also push a function over the inliner's recursion-detection thresholds and produce less inlining than the default would. Always compare -m=2 output between the two builds, not just the benchmark numbers. The number is the headline; the -m=2 diff is the explanation.

How to read a real `-m=2` session

The workflow that survives the most ad-hoc questions:

Pick the smallest package that contains the hot path. Don't run -m=2 on the whole module — the output buries you.
Build with go build -gcflags='-m=2' ./pkg/... and pipe to a file. The file is the artifact you grep, not the terminal.
Grep for inlining call to and the names of the functions you expect to be hot. Confirmed inlines are a good sign. Missing inlines are a question.
Grep for cannot inline to find the rejected candidates. The reason is on the same line: function too complex, recursive, marked go:noinline, or a rejection tied to a runtime/unsafe call. The first two are actionable. The last two usually are not.
Grep for moved to heap, escapes to heap, and leaking param on the function in question. These are escape-analysis decisions, separate from inlining, but they land in the same output stream because the same -m flag reports both.
If PGO is on, grep for devirtualizing to confirm the profile actually changed call sites. No devirtualizing lines mean your profile did not concentrate enough traffic on one concrete type at any call site to clear the threshold.

The output has not changed shape much across recent Go versions, and the inliner-cost numbers (with cost N) are stable enough across releases to read year over year. Treat the format as documentation by example: the cmd/compile source is the source of truth when a line confuses you, and grepping the inliner package for the message string lands on the case in the compiler that emitted it.

What changes about how you write Go

Reading -m=2 once a quarter, on the package that pprof points at, changes a few habits. You stop reaching for interface{} parameters in hot paths. You write smaller helper functions so the inliner has more it can chain. You stop writing var foo = func() ... at package scope when a plain function would do.

The flag is also the only honest way to test a PGO profile. If a profile is supposed to inline a hot wrapper or devirtualize an interface call, the -m=2 output is where the proof lives. If the lines are not there, the profile is not doing what you thought.

The compiler is willing to explain itself. Most Go developers never ask.

If this was useful

The compiler's mental model covers three things: inlining budgets, escape analysis, and call-site devirtualization. It's one of the parts of Go that most production engineers learn from a stack of blog posts and never assemble into a coherent picture. The Complete Guide to Go Programming walks through the inliner, the escape analyzer, and the runtime that depends on both, in the order you actually need them when you are reading a profile.

The companion book, Hexagonal Architecture in Go, sits one layer up: how to structure a service so the hot paths you eventually optimize are isolated from the domain code that should never need to think about closures or inlining cost.

iter.Seq in Go 1.23+: The Iterator Type Behind range-over-func

Gabriel Anhaia — Tue, 05 May 2026 20:01:56 +0000

Book: The Complete Guide to Go Programming

Range-over-func is the language feature everyone wrote about in Go 1.23. iter.Seq[V] is the type your code is supposed to pass around. The standard library quietly grew an ecosystem to feed it, drain it, sort it, and chunk it.

The whole iter package fits on one screen, with two generic function types and two helpers:

package iter

type Seq[V any]      func(yield func(V) bool)
type Seq2[K, V any]  func(yield func(K, V) bool)

func Pull[V any](seq Seq[V]) (next func() (V, bool),
                              stop func())
func Pull2[K, V any](seq Seq2[K, V]) (next func() (K, V, bool),
                                      stop func())

Two named types for callback-shaped functions, and two helpers that flip from push to pull.

What makes the package matter is the rest of the standard library that grew up around it. Once a function returns iter.Seq[T], the slices, maps, bytes, and strings packages have helpers that accept it directly.

What `iter.Seq[V]` actually is

iter.Seq[int] is func(yield func(int) bool). Nothing more.

A producer:

package main

import (
    "fmt"
    "iter"
)

func Count(start, n int) iter.Seq[int] {
    return func(yield func(int) bool) {
        for i := 0; i < n; i++ {
            if !yield(start + i) {
                return
            }
        }
    }
}

func main() {
    for v := range Count(10, 5) {
        fmt.Println(v) // 10 11 12 13 14
    }
}

The for-range form is sugar. The same iterator works as a value you pass around:

seq := Count(10, 5)         // iter.Seq[int]
total := 0
seq(func(v int) bool {      // call it directly
    total += v
    return true
})
fmt.Println(total)          // 60

for v := range seq is the readable form. Calling seq(yield) directly is what standard-library helpers do internally.

The 1.23 ecosystem you compose against

slices and maps shipped fourteen iterator-aware functions in 1.23, nine in slices and five in maps (Go 1.23 release notes).

Producers (slice/map → iter.Seq):

// Signatures from the stdlib — not a runnable block:
slices.All[Slice ~[]E, E any](s Slice) iter.Seq2[int, E]
slices.Values[Slice ~[]E, E any](s Slice) iter.Seq[E]
slices.Backward[Slice ~[]E, E any](s Slice) iter.Seq2[int, E]
slices.Chunk[Slice ~[]E, E any](s Slice, n int) iter.Seq[Slice]

maps.All[Map ~map[K]V, K comparable, V any](m Map) iter.Seq2[K, V]
maps.Keys(m) iter.Seq[K]
maps.Values(m) iter.Seq[V]

Consumers (iter.Seq → slice/map):

// Signatures from the stdlib — not a runnable block:
slices.Collect[E any](seq iter.Seq[E]) []E
slices.AppendSeq(s, seq) Slice
slices.Sorted[E cmp.Ordered](seq iter.Seq[E]) []E
slices.SortedFunc(seq, cmp) []E
slices.SortedStableFunc(seq, cmp) []E

maps.Collect(seq iter.Seq2[K, V]) map[K]V
maps.Insert(m, seq iter.Seq2[K, V])

Go 1.24 added the bytes and strings siblings: Lines, SplitSeq, SplitAfterSeq, FieldsSeq, and FieldsFuncSeq. All return iter.Seq[[]byte] or iter.Seq[string], so string parsing chains into the same pipeline (Go 1.24 release notes).

A short example wiring three of them together:

// requires Go 1.24+ for strings.SplitSeq
package main

import (
    "cmp"
    "fmt"
    "maps"
    "slices"
    "strings"
)

func main() {
    words := strings.SplitSeq("go go go iter seq go", " ")
    counts := map[string]int{}
    for w := range words {
        counts[w]++
    }

    // slices.SortedFunc takes an iter.Seq, so maps.Keys feeds it directly.
    top := slices.SortedFunc(
        maps.Keys(counts),
        func(a, b string) int {
            // Higher count first; ties fall back to the name.
            return cmp.Or(
                cmp.Compare(counts[b], counts[a]),
                strings.Compare(a, b),
            )
        },
    )
    fmt.Println(top) // [go iter seq]
}

SplitSeq is the producer. maps.Keys turns the count map into another iter.Seq. slices.SortedFunc drains that sequence directly and returns a sorted slice. maps.Keys yields in no particular order, so the comparison breaks ties by name to keep the output deterministic. No intermediate slice from the original split or from the key set, just iterator nodes wired by type.

A paginated API client that yields `iter.Seq[Item]`

This is the shape iterators were quietly built for. Cursor-paginated APIs return one page at a time. The caller wants one stream of items. Before 1.23 you wrote a closure that returned (Item, bool, error) or you allocated everything into a slice. Both leak the pagination into the caller.

The iterator version reads top-down. The shape below — a (iter.Seq[Item], func() error) pair — is the same one bufio.Scanner uses (Scan() plus Err()):

package pages

import (
    "context"
    "encoding/json"
    "fmt"
    "iter"
    "net/http"
    "net/url"
)

type Item struct {
    ID   string `json:"id"`
    Name string `json:"name"`
}

type page struct {
    Items      []Item `json:"items"`
    NextCursor string `json:"next_cursor"`
}

type Client struct {
    HTTP    *http.Client
    BaseURL string
}

func (c *Client) Items(
    ctx context.Context,
) (iter.Seq[Item], func() error) {
    var fetchErr error
    seq := func(yield func(Item) bool) {
        cursor := ""
        for {
            p, err := c.fetch(ctx, cursor)
            if err != nil {
                fetchErr = err
                return
            }
            for _, it := range p.Items {
                if !yield(it) {
                    return
                }
            }
            if p.NextCursor == "" {
                return
            }
            cursor = p.NextCursor
        }
    }
    errFn := func() error { return fetchErr }
    return seq, errFn
}

func (c *Client) fetch(
    ctx context.Context,
    cursor string,
) (*page, error) {
    u := c.BaseURL + "/items"
    if cursor != "" {
        u += "?cursor=" + url.QueryEscape(cursor)
    }
    req, err := http.NewRequestWithContext(
        ctx, http.MethodGet, u, nil)
    if err != nil {
        return nil, err
    }
    resp, err := c.HTTP.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("status %d",
            resp.StatusCode)
    }
    var p page
    if err := json.NewDecoder(resp.Body).Decode(&p); err != nil {
        return nil, err
    }
    return &p, nil
}

Two choices worth flagging.

The return type is a (iter.Seq[Item], func() error) pair rather than iter.Seq2[Item, error]. Both shapes work, and both have the same forget-the-check footgun: if the caller ignores the error path (the Err() closure here, the second range variable for Seq2), errors are dropped silently. The Seq2 form is more compact at the call site; the pair form follows bufio.Scanner and keeps the error out of every loop iteration. Pick whichever your team will remember to check, and document it.

The yield-return-on-false check inside the inner loop is structural. Once the consumer breaks, yield returns false and the iterator function returns. No bonus page after the consumer asked it to stop.

The call site:

items, errFn := client.Items(ctx)
for it := range items {
    if it.Name == "" {
        continue
    }
    process(it)
}
if err := errFn(); err != nil {
    return err
}

The HTTP work is hidden. The pagination is hidden. The consumer reads as if it had a slice.

A filter + map + take pipeline

The other half of iter.Seq's value is composition. The standard library does not ship Filter, Map, or Take helpers, so that part is on you. Each is about a dozen lines.

package xiter

import "iter"

func Filter[V any](
    seq iter.Seq[V],
    keep func(V) bool,
) iter.Seq[V] {
    return func(yield func(V) bool) {
        for v := range seq {
            if !keep(v) {
                continue
            }
            if !yield(v) {
                return
            }
        }
    }
}

func Map[V, R any](
    seq iter.Seq[V],
    fn func(V) R,
) iter.Seq[R] {
    return func(yield func(R) bool) {
        for v := range seq {
            if !yield(fn(v)) {
                return
            }
        }
    }
}

func Take[V any](
    seq iter.Seq[V],
    n int,
) iter.Seq[V] {
    return func(yield func(V) bool) {
        if n <= 0 {
            return
        }
        i := 0
        for v := range seq {
            if !yield(v) {
                return
            }
            i++
            if i >= n {
                return
            }
        }
    }
}

Each is a thin wrapper over iter.Seq. Filter and Take preserve type; Map transforms it. All three return iter.Seq so they compose.

Take deserves a closer look. The cap check sits after the yield and the increment, so Take(seq, 10) pulls exactly 10 items from upstream — not 11. If you put the check at the top of the loop body, you re-enter the loop after the tenth yield, pull one more item from upstream, then return. For a paginated client where item 11 forces a second page request, that is a wasted round-trip.

Plug them into the paginated client:

import (
    "slices"
    "strings"
)

items, errFn := client.Items(ctx)

names := xiter.Map(
    xiter.Take(
        xiter.Filter(
            items,
            func(it Item) bool { return it.Name != "" },
        ),
        10,
    ),
    func(it Item) string {
        return strings.ToUpper(it.Name)
    },
)

result := slices.Collect(names)
if err := errFn(); err != nil {
    return err
}

Read it inside-out. The pages stream in. Filter drops items with empty names. Take stops the chain at ten. Map upper-cases each. slices.Collect materialises one slice of ten strings.

Once Take has counted ten yields, it returns. Filter sees yield return false and returns itself. The iterator inside client.Items returns before the next page request goes out. One goroutine, one page in memory at a time, no extra fetch beyond the cap.

When to reach for `iter.Pull`

Push iterators ranged with for v := range seq cover most cases. iter.Pull is for code that needs to drive consumption from a place a for-loop body cannot reach: a state machine, an io.Reader adapter, a merger that interleaves two sequences.

items, errFn := client.Items(ctx)
next, stop := iter.Pull(items)
defer stop()

for {
    item, ok := next()
    if !ok {
        break
    }
    if !decideAndStash(item) {
        return nil // stop early; the deferred stop() releases the iterator
    }
}
if err := errFn(); err != nil {
    return err
}

iter.Pull runs the push iterator on a coroutine and gives you back synchronous next and stop. The cost is the second goroutine. That's tolerable in outer loops; it adds up in inner loops over millions of elements. Reach for it when the for-range form bends the surrounding code awkwardly. See the iter package docs for the coroutine semantics and the Go blog on range-over-func for the design discussion.

If this was useful

The iterator vocabulary is one of the bigger reorientations Go has shipped since generics. The Complete Guide to Go Programming covers iter.Seq, the standard-library helpers in slices and maps, and the patterns above — paginated clients, transducer pipelines, when to switch to iter.Pull — alongside the rest of the language top to bottom.

The Complete Guide to Go Programming — the book this post draws from: xgabriel.com/go-book
Hexagonal Architecture in Go — the other half of Thinking in Go: xgabriel.com/hexagonal-go
Hermes IDE — an IDE for developers who ship with Claude Code and other AI coding tools: hermes-ide.com
More posts and contact — xgabriel.com
