<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kaio Cunha</title>
    <description>The latest articles on Forem by Kaio Cunha (@kaiohenricunha).</description>
    <link>https://forem.com/kaiohenricunha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3828257%2F134a4877-ba9a-4bbc-bf40-35b7ede7f498.jpeg</url>
      <title>Forem: Kaio Cunha</title>
      <link>https://forem.com/kaiohenricunha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kaiohenricunha"/>
    <language>en</language>
    <item>
      <title>dotbabel `/handoff`: portable context across Claude Code, Codex, Copilot CLI, and Gemini CLI</title>
      <dc:creator>Kaio Cunha</dc:creator>
      <pubDate>Wed, 06 May 2026 14:38:27 +0000</pubDate>
      <link>https://forem.com/kaiohenricunha/dotclaude-handoff-portable-context-across-claude-code-codex-copilot-cli-and-gemini-cli-3733</link>
      <guid>https://forem.com/kaiohenricunha/dotclaude-handoff-portable-context-across-claude-code-codex-copilot-cli-and-gemini-cli-3733</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Update (2026): The project has been renamed from dotclaude to dotbabel to reflect its model-agnostic positioning. v1.x setups continue to work via a one-release-window read-fallback compat layer (~/.config/dotclaude/, DOTCLAUDE_* env vars, etc.); compat shims are removed in 3.0.0. Migration guide: &lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md" rel="noopener noreferrer"&gt;https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;One skill from the dotbabel project, and how it solved cross-CLI session transfer.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It happened on a Tuesday.&lt;/p&gt;

&lt;p&gt;I was four hours into a careful refactor of a GraphQL gateway in Claude Code. The kind of session where you've walked the model through three layers of internal context, agreed on a strategy, and started touching files. The plan was tight, the momentum was real.&lt;/p&gt;

&lt;p&gt;Then I hit the context limit.&lt;/p&gt;

&lt;p&gt;Claude told me to pick it up later with &lt;code&gt;claude --resume &amp;lt;some-uuid&amp;gt;&lt;/code&gt;. Codex was already open in the next tmux pane, idle. I had three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Paste a 14k-character transcript into Codex and cross my fingers.&lt;/li&gt;
&lt;li&gt;Ask Claude to "summarize this for the next agent," then watch it omit the load-bearing details.&lt;/li&gt;
&lt;li&gt;Start over from scratch in Codex and waste the morning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of them were good. So I built &lt;code&gt;/handoff&lt;/code&gt;. It's why I keep dotbabel installed on every machine I work on.&lt;/p&gt;

&lt;h2&gt;
  
  
  A word on dotbabel
&lt;/h2&gt;

&lt;p&gt;If you read &lt;a href="https://medium.com/@methodMan/dotbabel-the-open-source-governance-layer-for-ai-assisted-development-b57880968ce9" rel="noopener noreferrer"&gt;my earlier piece on dotbabel&lt;/a&gt;, you already know the project: an MIT-licensed governance layer for Claude Code, with a portable skills library on one side and a CI-friendly validation CLI on the other. That post covered the architecture and motivation. This one zooms into a single skill from the library.&lt;/p&gt;

&lt;p&gt;The premise: skills travel with you across machines. That's the whole point of dotbabel path 1. Conversations didn't travel with them. You'd open a fresh CLI on a new machine and lose every working assumption from the last session. &lt;code&gt;/handoff&lt;/code&gt; closes that gap with three verbs and a private git repo as transport.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three verbs in sixty seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;             &lt;span class="c"&gt;# render a local session as markdown&lt;/span&gt;
dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; &amp;lt;cli&amp;gt;     &lt;span class="c"&gt;# ship a session to a private git repo&lt;/span&gt;
dotbabel handoff fetch &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;            &lt;span class="c"&gt;# grab a session from any other machine&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those three verbs are the entire surface. &lt;code&gt;pull&lt;/code&gt; is local-only: it reads a session transcript from disk and emits a &lt;code&gt;&amp;lt;handoff&amp;gt;&lt;/code&gt; block you can paste anywhere. &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; use a private git repo as transport, so you can move context across machines without standing up new infrastructure.&lt;/p&gt;

&lt;p&gt;One note on &lt;code&gt;push&lt;/code&gt; arguments. &lt;code&gt;--from &amp;lt;cli&amp;gt;&lt;/code&gt; is required only when no &lt;code&gt;&amp;lt;query&amp;gt;&lt;/code&gt; is given, since the tool needs to know whose &lt;code&gt;latest&lt;/code&gt; to ship. With an explicit &lt;code&gt;&amp;lt;query&amp;gt;&lt;/code&gt; (UUID, short UUID, alias, or &lt;code&gt;latest&lt;/code&gt;), &lt;code&gt;--from&lt;/code&gt; is optional and acts as a filter that narrows the resolver to one CLI's sessions.&lt;/p&gt;
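&lt;p&gt;As a sketch, the rule reduces to one check. Everything below is illustrative; the function name is mine, not part of the tool:&lt;/p&gt;

```shell
# Hypothetical sketch of the rule above: --from is mandatory only when no
# query is supplied; with a query it is just an optional filter.
push_args_ok() {
  local query="$1" from="$2"
  if [ -z "$query" ]; then
    [ -n "$from" ]    # no query: --from must name whose "latest" to ship
  else
    true              # query given: --from only narrows the resolver
  fi
}
```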

&lt;p&gt;I'll call the rendered output &lt;strong&gt;the digest&lt;/strong&gt; for the rest of this article. It's the thing all three verbs operate on.&lt;/p&gt;

&lt;h3&gt;
  
  
  A small note on invocation
&lt;/h3&gt;

&lt;p&gt;Claude Code is the primary host for dotbabel skills. It autoloads &lt;code&gt;~/.claude/skills/&lt;/code&gt;, so &lt;code&gt;/handoff&lt;/code&gt; is available as a native slash command from the moment the binary is installed. Inside Claude Code I just type &lt;code&gt;/handoff push --from claude&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Other CLIs aren't there yet. Codex, Copilot, and Gemini don't autoload the skill manifest, so you call the underlying binary directly via the CLI's bash escape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; gemini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same code path, same behavior, slightly more typing. Native slash-command support for Codex, Copilot, and Gemini is on the roadmap; for now, the &lt;code&gt;!&lt;/code&gt; prefix is the contract. For brevity, the rest of this article uses the bare &lt;code&gt;dotbabel handoff …&lt;/code&gt; form. Prepend &lt;code&gt;!&lt;/code&gt; if you're calling from inside a non-Claude CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @dotbabel/dotbabel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That covers the local-only path. &lt;code&gt;pull&lt;/code&gt; works the moment the binary is installed: no network, no auth, no config.&lt;/p&gt;

&lt;p&gt;For cross-machine work, you need a private git repo and one environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DOTBABEL_HANDOFF_REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;git@github.com:you/handoff-store.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or skip the manual setup. The first time you run &lt;code&gt;push&lt;/code&gt;, dotbabel detects an unset &lt;code&gt;DOTBABEL_HANDOFF_REPO&lt;/code&gt;, checks whether &lt;code&gt;gh&lt;/code&gt; is authenticated, offers to create a private repo for you, and persists the URL to &lt;code&gt;~/.config/dotbabel/handoff.env&lt;/code&gt;. The whole bootstrap is one yes-or-no prompt.&lt;/p&gt;
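&lt;p&gt;The persisted config is nothing exotic; the last step amounts to writing one env file. A minimal sketch, with the path taken from the article and the function name invented here:&lt;/p&gt;

```shell
# Persist the transport URL the way the bootstrap described above does.
# Function name and the optional path argument are illustrative.
persist_repo_url() {
  local url="$1" conf="${2:-$HOME/.config/dotbabel/handoff.env}"
  mkdir -p "$(dirname "$conf")"
  printf 'export DOTBABEL_HANDOFF_REPO=%s\n' "$url" > "$conf"
}

persist_repo_url git@github.com:you/handoff-store.git /tmp/handoff.env
```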

&lt;p&gt;Verify your setup any time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're looking for &lt;code&gt;ok&lt;/code&gt; and a non-empty &lt;code&gt;DOTBABEL_HANDOFF_REPO&lt;/code&gt;. On anything else, the doctor prints a structured remediation block telling you exactly what's wrong and how to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Walkthrough: local handoff
&lt;/h2&gt;

&lt;p&gt;Say I want to move my current Claude Code session into Codex. No transport repo needed for that case: same machine, same filesystem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff pull latest &lt;span class="nt"&gt;--from&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This finds my most recent Claude session, extracts the user prompts and the last few assistant turns, and prints a digest to stdout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;handoff&lt;/span&gt; &lt;span class="na"&gt;origin=&lt;/span&gt;&lt;span class="s"&gt;"claude"&lt;/span&gt; &lt;span class="na"&gt;session=&lt;/span&gt;&lt;span class="s"&gt;"a1b2c3d4"&lt;/span&gt; &lt;span class="na"&gt;cwd=&lt;/span&gt;&lt;span class="s"&gt;"/home/dev/projects/gateway"&lt;/span&gt; &lt;span class="na"&gt;target=&lt;/span&gt;&lt;span class="s"&gt;"claude"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Summary.**&lt;/span&gt; Session opened with: "/refactor the resolver layer to use dataloaders".
Last assistant output (truncated): "Approved. Applying the changes to resolvers/user.ts".
Full prompt log and assistant tail follow for context.

&lt;span class="gs"&gt;**User prompts (last 10, in order).**&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; /refactor the resolver layer to use dataloaders
&lt;span class="p"&gt;2.&lt;/span&gt; Show me the existing resolver shape first
&lt;span class="p"&gt;3.&lt;/span&gt; Why are we batching by tenant_id and not user_id?
…

&lt;span class="gs"&gt;**Last assistant turns (tail).**&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; The current resolver hits the DB once per request. Batching by tenant…&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Plan: introduce a DataLoader keyed on (tenant_id, user_id) and migrate…&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Approved. Applying the changes to resolvers/user.ts&lt;/span&gt;

&lt;span class="gs"&gt;**Next step.**&lt;/span&gt; Continue from the last assistant turn using the same file scope and goals summarized above.

&lt;span class="nt"&gt;&amp;lt;/handoff&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;lt;handoff&amp;gt;&lt;/code&gt; tag is deliberate: it's a machine-readable marker that lets a receiving agent detect the digest and treat it as a task specification with explicit scope.&lt;/p&gt;

&lt;p&gt;Three variants worth knowing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same digest, but written to a markdown file under docs/handoffs/&lt;/span&gt;
dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; auto

&lt;span class="c"&gt;# A terser prose summary, useful when you just want to remember what a session was about&lt;/span&gt;
dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--summary&lt;/span&gt;

&lt;span class="c"&gt;# Specific output path&lt;/span&gt;
dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /tmp/handoff.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resolving an ID
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt; accepts more than UUIDs. The resolver tries, in order: full UUID → short UUID (the first 8 hex chars) → the literal &lt;code&gt;latest&lt;/code&gt; → a user-assigned label alias. Aliases are case-insensitive and come from whatever the source CLI calls them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude's &lt;code&gt;customTitle&lt;/code&gt; or &lt;code&gt;aiTitle&lt;/code&gt; (set with &lt;code&gt;claude --resume "my-feature"&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Codex's &lt;code&gt;thread_name&lt;/code&gt; (set with &lt;code&gt;codex resume &amp;lt;name&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Copilot's &lt;code&gt;workspace.yaml:name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Gemini's &lt;code&gt;checkpoint-&amp;lt;tag&amp;gt;.json&lt;/code&gt; (set with &lt;code&gt;/chat save &amp;lt;tag&amp;gt;&lt;/code&gt; inside the session).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aliases are why the workflow stays out of my way. I rename my Claude session &lt;code&gt;gateway-refactor&lt;/code&gt;, walk to my desktop the next morning, run &lt;code&gt;dotbabel handoff fetch gateway-refactor&lt;/code&gt;, and it works. No UUID copy-paste, no scrolling through directory listings.&lt;/p&gt;
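&lt;p&gt;The resolution order is easy to approximate. A minimal sketch, assuming the order described above (patterns and function name are mine, not the binary's):&lt;/p&gt;

```shell
# Classify a query the way the resolver order above describes:
# full UUID, then short UUID, then the literal "latest", then alias.
resolve_kind() {
  local q="$1"
  if printf '%s' "$q" | grep -Eiq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
    echo uuid
  elif printf '%s' "$q" | grep -Eiq '^[0-9a-f]{8}$'; then
    echo short-uuid
  elif [ "$q" = "latest" ]; then
    echo latest
  else
    echo alias    # matched case-insensitively against session labels
  fi
}
```

&lt;p&gt;One consequence of this ordering, at least in the sketch: an alias that happens to be exactly 8 hex characters would resolve as a short UUID first.&lt;/p&gt;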

&lt;h2&gt;
  
  
  Walkthrough: across machines
&lt;/h2&gt;

&lt;p&gt;Now the scenario that motivated the whole skill: moving a session from my laptop to my desktop.&lt;/p&gt;

&lt;p&gt;On the laptop, before I close my coffee shop tab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; claude &lt;span class="nt"&gt;--tag&lt;/span&gt; end-of-day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is short and useful:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handoff/gateway/claude/2026-05/a1b2c3d4
git@github.com:you/handoff-store.git
handoff:v2:gateway:claude:2026-05:a1b2c3d4:laptop-mbp:end-of-day
[scrubbed 0 secrets]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line is the canonical branch name. The shape is intentional:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handoff/&amp;lt;project&amp;gt;/&amp;lt;cli&amp;gt;/&amp;lt;YYYY-MM&amp;gt;/&amp;lt;short-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's namespaced by project (derived from the session's git root), then origin CLI, then year-month, then the 8-hex short id. The structure does double duty as a collision domain. Two sessions in the same project, same CLI, same month, with the same short-id prefix would clash, and the binary's collision probe catches that before any push lands.&lt;/p&gt;
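&lt;p&gt;Composing the name is pure string assembly; a sketch (the helper function is hypothetical):&lt;/p&gt;

```shell
# Build the canonical branch name from its four parts, per the shape above.
branch_name() {
  printf 'handoff/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

branch_name gateway claude 2026-05 a1b2c3d4
# handoff/gateway/claude/2026-05/a1b2c3d4
```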

&lt;p&gt;On the desktop, an hour later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff fetch a1b2c3d4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;fetch&lt;/code&gt; clones just the one branch, reads &lt;code&gt;handoff.md&lt;/code&gt; from its tip, and prints it to stdout. Same digest I produced on the laptop. I paste it into a fresh Claude, Codex, or Gemini session, and the new agent picks up where the old one left off, with the file scope and plan intact.&lt;/p&gt;

&lt;p&gt;You can list and search before fetching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff list &lt;span class="nt"&gt;--remote&lt;/span&gt; &lt;span class="nt"&gt;--limit&lt;/span&gt; 10
dotbabel handoff search &lt;span class="s2"&gt;"dataloader"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list view shows location, CLI, short id, and timestamp. Search runs a substring/regex match against the digest content. Lucene this is not, but it's good enough to find a specific session by a phrase you remember from the prompt log.&lt;/p&gt;

&lt;h2&gt;
  
  
  The redaction pass
&lt;/h2&gt;

&lt;p&gt;Here's the part I wasn't willing to hand-wave. The digest is plaintext markdown going to a remote git repo. If I accidentally pasted an API key into a session three days ago, that key is in the transcript. If &lt;code&gt;push&lt;/code&gt; doesn't strip it, my secrets manager just got bypassed by a developer-experience tool.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;push&lt;/code&gt; runs the digest through a redaction script before it ever leaves the machine. The script operates on stdin, applies eight regex passes, and emits the redacted text plus a &lt;code&gt;scrubbed:&amp;lt;N&amp;gt;&lt;/code&gt; count on stderr. The eight passes cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub tokens (&lt;code&gt;ghp_…&lt;/code&gt;, &lt;code&gt;gho_…&lt;/code&gt;, &lt;code&gt;ghs_…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;OpenAI / Anthropic-style keys (&lt;code&gt;sk-…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;AWS access keys (&lt;code&gt;AKIA…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Google API keys (&lt;code&gt;AIza…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Slack tokens (&lt;code&gt;xox[baprs]-…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;HTTP &lt;code&gt;Authorization: Bearer …&lt;/code&gt; headers&lt;/li&gt;
&lt;li&gt;Environment variable assignments matching &lt;code&gt;*TOKEN&lt;/code&gt;, &lt;code&gt;*KEY&lt;/code&gt;, &lt;code&gt;*SECRET&lt;/code&gt;, &lt;code&gt;*PASSWORD&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;PEM private-key block headers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design constraint is "fail closed." If the script can't run for any reason (missing perl, I/O error, malformed input), the push aborts with an error and nothing reaches the remote. There's no &lt;code&gt;--skip-scrub&lt;/code&gt; flag. There never will be.&lt;/p&gt;

&lt;p&gt;The skill itself reinforces this. Look at &lt;code&gt;skills/handoff/SKILL.md&lt;/code&gt; and you'll see an explicit instruction to the LLM: "if the binary cannot be executed, do not fabricate a &lt;code&gt;&amp;lt;handoff&amp;gt;&lt;/code&gt; block from raw session JSONL." The reasoning is concrete: without the binary's scrub pass, a hand-rolled digest would silently bypass redaction.&lt;/p&gt;

&lt;p&gt;Scrubbing is best-effort. It does not catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom enterprise secret formats.&lt;/li&gt;
&lt;li&gt;Secrets broken across lines (IDE copy-paste sometimes wraps).&lt;/li&gt;
&lt;li&gt;Anything you wrote in prose ("my password is correct horse battery staple").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For sensitive sessions, my workflow is: &lt;code&gt;pull &amp;lt;id&amp;gt;&lt;/code&gt; first, eyeball the digest locally, then &lt;code&gt;push&lt;/code&gt; if it's clean. The local render and the remote push produce identical content modulo the scrub markers, so what you see is what gets uploaded.&lt;/p&gt;

&lt;h2&gt;
  
  
  The interesting edges
&lt;/h2&gt;

&lt;p&gt;A few details that turned out to matter once I started using this daily.&lt;/p&gt;

&lt;p&gt;The first is the short-id collision probe. Eight hex chars of a UUIDv4 give you ~4 billion combinations per project-CLI-month bucket, so collisions are rare without being impossible. Before any push, the binary runs a &lt;code&gt;git ls-remote&lt;/code&gt; for the target branch. If it exists and the remote &lt;code&gt;metadata.json&lt;/code&gt;'s &lt;code&gt;session_id&lt;/code&gt; matches yours, it's the same session, and the push proceeds as an update. If they don't match, the push refuses with a clear error and points at &lt;code&gt;--force-collision&lt;/code&gt; for the override. No silent clobbers.&lt;/p&gt;
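&lt;p&gt;The decision logic reduces to three outcomes. A sketch with the probe results passed in as plain arguments (the real binary derives them from &lt;code&gt;git ls-remote&lt;/code&gt; and the remote &lt;code&gt;metadata.json&lt;/code&gt;):&lt;/p&gt;

```shell
# Decide what a push should do, given the probe results described above.
probe_decision() {
  local branch_exists="$1" remote_session="$2" local_session="$3"
  if [ "$branch_exists" != "yes" ]; then
    echo push-new        # branch absent: first push of this session
  elif [ "$remote_session" = "$local_session" ]; then
    echo push-update     # same session_id: update in place
  else
    echo refuse          # true collision: require --force-collision
  fi
}
```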

&lt;p&gt;The second is connectivity caching. Both &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; run a connectivity check before each operation, then cache the result for five minutes so you don't pay the round-trip cost on a sequence of related commands. Pass &lt;code&gt;--verify&lt;/code&gt; to force a fresh probe.&lt;/p&gt;
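&lt;p&gt;One plausible shape for that cache is a timestamp marker file; the path, probe command, and structure below are illustrative, not the tool's actual implementation:&lt;/p&gt;

```shell
# Skip the remote probe when a timestamp marker is younger than 300 seconds.
connectivity_ok() {
  local cache="${1:-/tmp/dotbabel-conn-cache}"
  if [ -f "$cache" ]; then
    local age=$(( $(date +%s) - $(cat "$cache") ))
    if [ "$age" -lt 300 ]; then
      return 0           # cached probe still fresh
    fi
  fi
  # Fresh probe against the transport repo; record success with a timestamp.
  git ls-remote "$DOTBABEL_HANDOFF_REPO" >/dev/null 2>/dev/null || return 1
  date +%s > "$cache"
}
```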

&lt;p&gt;The third is the choice of git as transport. It's a substrate I already trust, with cheap branches, well-understood ACLs, and prune semantics that map naturally onto branch deletion. There's no new service to operate, no new credentials to rotate, and any private git provider works: GitHub, GitLab, Gitea, self-hosted, or a &lt;code&gt;file://&lt;/code&gt; URL pointing at a USB stick for air-gapped transfer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it isn't
&lt;/h2&gt;

&lt;p&gt;A short list of capabilities I deliberately did not build, and why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not end-to-end encrypted.&lt;/strong&gt; Transport is access-controlled by your private repo's ACL; content is plaintext on the remote. If your threat model demands encryption at rest in the transport repo, that's a feature for a future version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not fuzzy or semantic search.&lt;/strong&gt; &lt;code&gt;search&lt;/code&gt; is substring/regex only. The corpus is small enough that a smart &lt;code&gt;grep&lt;/code&gt; is faster and more predictable than a vector index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doesn't invoke the target CLI for you.&lt;/strong&gt; The skill prints; you paste. That's deliberate. Keeping the human in the transfer loop preserves auditability and avoids automating a step where wrong context is worse than no context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I should also be honest about the rough edges. While writing this article I exercised the full round-trip and surfaced three small bugs in the process: a flag silently dropped on the wrong verb, a misleading &lt;code&gt;prune&lt;/code&gt; failure report, and a default-branch trap baked into the auto-bootstrap path. They're tracked publicly as &lt;a href="https://github.com/kaiohenricunha/dotbabel/issues/178" rel="noopener noreferrer"&gt;#178&lt;/a&gt;, &lt;a href="https://github.com/kaiohenricunha/dotbabel/issues/179" rel="noopener noreferrer"&gt;#179&lt;/a&gt;, and &lt;a href="https://github.com/kaiohenricunha/dotbabel/issues/180" rel="noopener noreferrer"&gt;#180&lt;/a&gt;. None of them block daily use; all three are cosmetic or recoverable. The transport itself is solid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Install is one npm command. The first push walks you through repo setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @dotbabel/dotbabel
dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source, issue tracker, and contribution guide live at &lt;a href="https://github.com/kaiohenricunha/dotbabel" rel="noopener noreferrer"&gt;github.com/kaiohenricunha/dotbabel&lt;/a&gt;. PRs, bug reports, and "this didn't work on my $obscure-shell" notes all welcome. The broader project tour is in &lt;a href="https://medium.com/@methodMan/dotbabel-the-open-source-governance-layer-for-ai-assisted-development-b57880968ce9" rel="noopener noreferrer"&gt;the dotbabel governance article&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>gemini</category>
      <category>githubcopilot</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>dotbabel: The Open-Source Governance Layer for AI-Assisted Development</title>
      <dc:creator>Kaio Cunha</dc:creator>
      <pubDate>Sat, 18 Apr 2026 22:58:04 +0000</pubDate>
      <link>https://forem.com/kaiohenricunha/dotclaude-the-open-source-governance-layer-for-ai-assisted-development-3177</link>
      <guid>https://forem.com/kaiohenricunha/dotclaude-the-open-source-governance-layer-for-ai-assisted-development-3177</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Update (2026): The project has been renamed from dotclaude to dotbabel to reflect its model-agnostic positioning. v1.x setups continue to work via a one-release-window read-fallback compat layer (~/.config/dotclaude/, DOTCLAUDE_* env vars, etc.); compat shims are removed in 3.0.0. Migration guide: &lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md" rel="noopener noreferrer"&gt;https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You finish a great Claude Code session. A solid PR-review workflow. A debugging loop that actually finds root causes. A deploy checklist you trust. You close the terminal.&lt;/p&gt;

&lt;p&gt;Next week, starting fresh, you've lost all of it. The assistant has no memory of how &lt;em&gt;you&lt;/em&gt; like to work. You re-explain the worktree convention. You re-explain the test-plan format. You re-explain why &lt;code&gt;--force-push&lt;/code&gt; on &lt;code&gt;main&lt;/code&gt; is never OK.&lt;/p&gt;

&lt;p&gt;Now scale that problem to a team. Five engineers using Claude Code, each with their own tricks, no shared floor of discipline. PRs land with different review depths. Audits have no structure. Some sessions produce hallucinated "fixes" that never touched the real code path. Specs drift from implementation and nobody notices until something breaks in prod.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kaiohenricunha/dotbabel" rel="noopener noreferrer"&gt;dotbabel&lt;/a&gt; is an MIT-licensed project for agentic CLIs (Claude Code, Codex, Gemini CLI, Copilot CLI) that solves both problems from the same codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two problems, one repo
&lt;/h2&gt;

&lt;p&gt;The project has a &lt;strong&gt;dual-persona monorepo&lt;/strong&gt; layout (&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/adr/0001-monorepo-dual-persona-layout.md" rel="noopener noreferrer"&gt;ADR-0001&lt;/a&gt;). That sounds architectural, but it maps to two very different users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The individual developer&lt;/strong&gt; who wants a portable skills library wired into every Claude Code session on their laptop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The engineering team&lt;/strong&gt; that wants a governance CLI enforcing spec-backed PRs, skill-manifest integrity, and drift detection in CI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both paths are backed by the same skills, the same slash commands, the same &lt;code&gt;CLAUDE.md&lt;/code&gt; rules. Neither path requires the other. You can use one, both, or swap from one to the other as your needs change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 1: skills &amp;amp; commands in every session
&lt;/h2&gt;

&lt;p&gt;For the individual path, the install is three lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kaiohenricunha/dotbabel.git ~/projects/dotbabel
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/dotbabel
./bootstrap.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;bootstrap.sh&lt;/code&gt; symlinks &lt;code&gt;commands/&lt;/code&gt;, &lt;code&gt;skills/&lt;/code&gt;, and &lt;code&gt;CLAUDE.md&lt;/code&gt; into &lt;code&gt;~/.claude/&lt;/code&gt;. From that point, every Claude Code session in every repo has access to the full library. The highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud &amp;amp; IaC specialists&lt;/strong&gt; — the &lt;code&gt;aws-specialist&lt;/code&gt;, &lt;code&gt;gcp-specialist&lt;/code&gt;, &lt;code&gt;azure-specialist&lt;/code&gt;, &lt;code&gt;kubernetes-specialist&lt;/code&gt;, &lt;code&gt;terraform-specialist&lt;/code&gt;, &lt;code&gt;terragrunt-specialist&lt;/code&gt;, &lt;code&gt;pulumi-specialist&lt;/code&gt;, and &lt;code&gt;crossplane-specialist&lt;/code&gt; skills auto-trigger when you mention the relevant technology. Saying "review the IAM trust policy on the prod account" is enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slash commands for real PR work&lt;/strong&gt; — &lt;code&gt;/pre-pr&lt;/code&gt; runs a simplify + security-review + full-test-suite gate before you open the PR. &lt;code&gt;/review-pr &amp;lt;N&amp;gt;&lt;/code&gt; walks 14 steps: fetch comments, validate each one, apply fixes in an isolated worktree, run the test plan, resolve threads. &lt;code&gt;/review-prs &amp;lt;N1&amp;gt; &amp;lt;N2&amp;gt; ...&lt;/code&gt; dispatches one sub-agent per PR in parallel, up to six concurrent, and aggregates results into a table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging disciplines&lt;/strong&gt; — &lt;code&gt;/ground-first &amp;lt;subject&amp;gt;&lt;/code&gt; forces a read-before-edit pass with &lt;code&gt;file:line&lt;/code&gt; citations before any change is proposed. &lt;code&gt;/fix-with-evidence &amp;lt;issue&amp;gt;&lt;/code&gt; enforces a Reproduce → Fix → Verify → PR loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis docs&lt;/strong&gt; — &lt;code&gt;/create-audit&lt;/code&gt;, &lt;code&gt;/create-inspection&lt;/code&gt;, and &lt;code&gt;/create-assessment&lt;/code&gt; produce evidence-backed markdown documents in &lt;code&gt;docs/audits/&lt;/code&gt;, &lt;code&gt;docs/inspections/&lt;/code&gt;, and &lt;code&gt;docs/assessments/&lt;/code&gt; respectively. Every claim cites a file, a line, or command output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-machine handoff&lt;/strong&gt; — &lt;code&gt;/handoff push claude latest&lt;/code&gt; scrubs secrets and uploads a digest to a private GitHub gist. On another machine: &lt;code&gt;/handoff pull latest&lt;/code&gt;. Your Windows/WSL session continues on Linux without re-explaining context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;CLAUDE.md&lt;/code&gt; file installs a global rule floor alongside the skills: no pushing to &lt;code&gt;main&lt;/code&gt; without explicit instruction, no force-pushing another session's branch, no &lt;code&gt;--no-verify&lt;/code&gt; or &lt;code&gt;--no-gpg-sign&lt;/code&gt;, full test suite before merges that touch protected paths, and a spec-coverage contract enforced at PR time.&lt;/p&gt;

&lt;p&gt;To stay current, run &lt;code&gt;./sync.sh pull&lt;/code&gt; (bootstrap path) or &lt;code&gt;dotbabel sync pull&lt;/code&gt; (npm path); either one re-bootstraps from the latest &lt;code&gt;main&lt;/code&gt;. No npm required for the bootstrap path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 2: the governance CLI
&lt;/h2&gt;

&lt;p&gt;For the team path, there's a zero-runtime-dependency npm package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @dotbabel/dotbabel
dotbabel bootstrap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That installs the same skills library but also gives you a set of validators designed for CI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-validate-specs&lt;/code&gt; — audits spec contracts, catches dependency cycles.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-check-spec-coverage&lt;/code&gt; — the PR-time gate. Any PR that touches a protected path (defined in &lt;code&gt;docs/repo-facts.json&lt;/code&gt;) must carry a &lt;code&gt;Spec ID:&lt;/code&gt; header or a &lt;code&gt;## No-spec rationale&lt;/code&gt; section. No loophole.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-check-instruction-drift&lt;/code&gt; — detects stale &lt;code&gt;CLAUDE.md&lt;/code&gt; and README entries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-detect-drift&lt;/code&gt; — flags commands that have diverged from &lt;code&gt;origin/main&lt;/code&gt; for 14+ days.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-doctor&lt;/code&gt; — self-diagnostic across env, facts, manifest, specs, drift, hooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every bin honors &lt;code&gt;--help&lt;/code&gt;, &lt;code&gt;--version&lt;/code&gt;, &lt;code&gt;--json&lt;/code&gt;, &lt;code&gt;--verbose&lt;/code&gt;, &lt;code&gt;--no-color&lt;/code&gt;. Exit codes follow the &lt;code&gt;{0, 1, 2, 64}&lt;/code&gt; convention (&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/adr/0013-exit-code-convention.md" rel="noopener noreferrer"&gt;ADR-0013&lt;/a&gt;), with 64 matching BSD &lt;code&gt;EX_USAGE&lt;/code&gt;. Every failure surfaces as a structured &lt;code&gt;ValidationError&lt;/code&gt; with a stable &lt;code&gt;.code&lt;/code&gt; (&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/adr/0012-structured-error-contract.md" rel="noopener noreferrer"&gt;ADR-0012&lt;/a&gt;), so your CI scripts branch on classes of failure instead of grepping strings.&lt;/p&gt;
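&lt;p&gt;As a sketch of what that buys you in CI, a workflow step can branch on the exit class instead of parsing log output. The step below is hypothetical; only 0 (success) and 64 (usage error) are assumed from the convention above, and the meaning of the remaining classes lives in ADR-0013.&lt;/p&gt;

```yaml
# Hypothetical GitHub Actions step: branch on exit class, not on log text.
- name: Spec coverage gate
  run: |
    set +e
    dotbabel-check-spec-coverage --json
    code=$?
    case "$code" in
      0)  echo "spec coverage ok" ;;
      64) echo "::error::bad invocation (EX_USAGE); fix this step"; exit "$code" ;;
      *)  echo "::error::check failed (exit class $code)"; exit "$code" ;;
    esac
```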

&lt;p&gt;There's also a Node API for teams that want to build their own gates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;createHarnessContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;validateSpecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;ERROR_CODES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;EXIT_CODES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@dotbabel/dotbabel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createHarnessContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;validateSpecs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;EXIT_CODES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;VALIDATION&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Need a scaffold for a fresh repo?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx dotbabel-init &lt;span class="nt"&gt;--project-name&lt;/span&gt; my-project &lt;span class="nt"&gt;--project-type&lt;/span&gt; node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That writes &lt;code&gt;.claude/settings.json&lt;/code&gt;, the skills manifest, a destructive-git guard hook, three GitHub Actions workflows (&lt;code&gt;validate-skills&lt;/code&gt;, &lt;code&gt;detect-drift&lt;/code&gt;, &lt;code&gt;ai-review&lt;/code&gt;), and a spec stub. A green &lt;code&gt;dotbabel-doctor&lt;/code&gt; from a cold start.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick taste
&lt;/h2&gt;

&lt;p&gt;After bootstrap, pick a real repo and try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Read before you touch anything.&lt;/span&gt;
/ground-first auth token refresh race condition
&lt;span class="gh"&gt;# → grounded analysis with file:line citations, no edits proposed&lt;/span&gt;

&lt;span class="gh"&gt;# Fix a reported bug with a full evidence loop.&lt;/span&gt;
/fix-with-evidence 140
&lt;span class="gh"&gt;# → reproduces, fixes, verifies, opens a PR — all with proof&lt;/span&gt;

&lt;span class="gh"&gt;# Deep AWS IAM review.&lt;/span&gt;
/aws-specialist review IAM policies in the production account
&lt;span class="gh"&gt;# → structured report: least-privilege gaps, trust-policy findings, remediations&lt;/span&gt;

&lt;span class="gh"&gt;# Batch-triage every open Dependabot PR.&lt;/span&gt;
/dependabot-sweep
&lt;span class="gh"&gt;# → parallel sub-agents annotate risk; safe bumps merged automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every command is context-aware. It reads your repo's files, git history, CI state, and PR body. It cites evidence. It never pushes without permission.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother with governance at all
&lt;/h2&gt;

&lt;p&gt;The case for spec-driven development gets stronger the more AI you put into the loop. An assistant that writes code fast enough to outrun human review is a liability unless the &lt;em&gt;rules of the game&lt;/em&gt; are encoded somewhere machine-readable. &lt;code&gt;docs/specs/&lt;/code&gt; becomes the contract. Protected paths become the enforcement surface. A PR gate that says "touched this path → show me the Spec ID" turns AI speed into a feature instead of a foot-gun.&lt;/p&gt;

&lt;p&gt;dotbabel isn't opinionated about &lt;em&gt;which&lt;/em&gt; workflow you adopt. It's opinionated that &lt;em&gt;some&lt;/em&gt; workflow must exist — and that the same tools should serve both the person writing the code and the team shipping it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel" rel="noopener noreferrer"&gt;README&lt;/a&gt; — both install paths, full skills catalog.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/quickstart.md" rel="noopener noreferrer"&gt;docs/quickstart.md&lt;/a&gt; — install to first green validator in under 10 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/architecture.md" rel="noopener noreferrer"&gt;docs/architecture.md&lt;/a&gt; — layer diagram and PR-time sequence.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel/tree/main/docs/adr" rel="noopener noreferrer"&gt;docs/adr/&lt;/a&gt; — every hardening decision, with rationale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. Issues and PRs welcome.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>opensource</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why Istio's Metrics Merging Breaks in Multi-Container Pods (And How to Fix It)</title>
      <dc:creator>Kaio Cunha</dc:creator>
      <pubDate>Tue, 17 Mar 2026 00:07:31 +0000</pubDate>
      <link>https://forem.com/kaiohenricunha/why-istios-metrics-merging-breaks-in-multi-container-pods-and-how-to-fix-it-3l6f</link>
      <guid>https://forem.com/kaiohenricunha/why-istios-metrics-merging-breaks-in-multi-container-pods-and-how-to-fix-it-3l6f</guid>
      <description>&lt;h2&gt;
  
  
  If you run multi-container pods under Istio with STRICT mTLS, you're probably missing metrics
&lt;/h2&gt;

&lt;p&gt;And you might not know it. The containers are healthy. The scrape job shows no errors. But half your metrics are just... absent from Prometheus. No alert, no obvious explanation.&lt;/p&gt;

&lt;p&gt;I spent a while debugging this before I understood what was going on, so here's the full picture.&lt;/p&gt;




&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;Istio has a built-in metrics-merging feature that lets Prometheus scrape a pod through the Istio proxy without reaching each container directly. It's useful. But it has a hard limitation that the docs mention only in passing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Istio's metrics-merge only supports one port per pod.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Superorbital team wrote &lt;a href="https://superorbital.io/blog/istio-metrics-merging/" rel="noopener noreferrer"&gt;the definitive explanation&lt;/a&gt; of why this is the case. The short version: Istio's proxy forwards the scrape to a single application port. If you have three containers each exposing &lt;code&gt;/metrics&lt;/code&gt; on different ports, Istio picks one and ignores the rest.&lt;/p&gt;
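&lt;p&gt;You can see the single-port assumption in the annotation rewrite itself. With merging enabled, Istio's sidecar injector redirects the standard scrape annotations to the proxy's merged endpoint (port 15020, path &lt;code&gt;/stats/prometheus&lt;/code&gt;), and there is room for exactly one application port in that rewrite:&lt;/p&gt;

```yaml
# Pod annotations as you wrote them
prometheus.io/scrape: "true"
prometheus.io/port: "8080"    # one port; a second container's port has nowhere to go

# Pod annotations after sidecar injection (metrics merging on)
prometheus.io/scrape: "true"
prometheus.io/port: "15020"
prometheus.io/path: "/stats/prometheus"
```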

&lt;p&gt;Someone &lt;a href="https://github.com/istio/istio/issues/41276" rel="noopener noreferrer"&gt;opened a feature request&lt;/a&gt; for multi-port support back in 2022. It was labeled &lt;code&gt;lifecycle/stale&lt;/code&gt; and auto-closed. There are &lt;a href="https://github.com/istio/istio/issues/27328" rel="noopener noreferrer"&gt;several&lt;/a&gt; &lt;a href="https://github.com/istio/istio/issues/38348" rel="noopener noreferrer"&gt;other&lt;/a&gt; &lt;a href="https://github.com/istio/istio/issues/53753" rel="noopener noreferrer"&gt;issues&lt;/a&gt; from people hitting variations of this same problem. None of them were resolved.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pod with api container (:8080) and worker container (:9100)&lt;/span&gt;

&lt;span class="n"&gt;up&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;pod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"my-app-abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;   &lt;span class="err"&gt;✓&lt;/span&gt; &lt;span class="n"&gt;scraped&lt;/span&gt; &lt;span class="n"&gt;through&lt;/span&gt; &lt;span class="n"&gt;Istio&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;

&lt;span class="c"&gt;# worker metrics? absent. no error, just gone.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worker container is perfectly healthy. Its metrics just never reach Prometheus. No scrape failure gets recorded because Prometheus never even tries. It only knows about the one port Istio advertises.&lt;/p&gt;




&lt;h3&gt;
  
  
  The workarounds you'll try (and why they don't work)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;"Just scrape each container port directly."&lt;/strong&gt; Works if mTLS is in permissive mode. In &lt;code&gt;STRICT&lt;/code&gt; mode, every connection must go through the Istio proxy, which only forwards to one port. Direct port scraping gets rejected at the mTLS layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use multiple &lt;code&gt;PodMonitor&lt;/code&gt; entries pointing at different ports."&lt;/strong&gt; Same problem. The proxy is the bottleneck, not the scrape configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Push metrics to a Pushgateway."&lt;/strong&gt; Technically works, but now you've broken the pull model everything else in your stack depends on, added a component that becomes a single point of failure, and introduced staleness semantics that are genuinely confusing to debug.&lt;/p&gt;




&lt;h3&gt;
  
  
  What about ambient mode?
&lt;/h3&gt;

&lt;p&gt;Before I get to my solution, I should be upfront: if you're running Istio in &lt;strong&gt;ambient mode&lt;/strong&gt; (GA since Istio 1.24), this problem doesn't apply to you. Ambient replaces the per-pod sidecar with a per-node L4 proxy (ztunnel), so there's no sidecar sitting inside your pod intercepting scrapes. Prometheus can reach your container ports directly, and mTLS is handled transparently at the node level. John Howard from the Istio team &lt;a href="https://blog.howardjohn.info/posts/securing-prometheus/" rel="noopener noreferrer"&gt;wrote about this&lt;/a&gt; — the TL;DR is "it just works."&lt;/p&gt;

&lt;p&gt;But most production Istio deployments are still running sidecar mode. Migrating to ambient is a significant undertaking, and the Istio project itself says they expect many users to stay on sidecars for years. If that's you, keep reading.&lt;/p&gt;




&lt;h3&gt;
  
  
  What actually works in sidecar mode: one sidecar, one port
&lt;/h3&gt;

&lt;p&gt;The idea is simple. Add a small sidecar container that scrapes all your other containers over &lt;code&gt;localhost&lt;/code&gt; (where mTLS doesn't apply, because it's all inside the same pod) and exposes the merged result on a single port. Istio sees one port, Prometheus scrapes one port, and you get everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────┐
│  Pod                                                 │
│                                                      │
│  ┌────────┐  localhost:8080/metrics                  │
│  │  api   ├──────────────────┐                       │
│  └────────┘                  │                       │
│                         ┌────▼──────────┐            │
│  ┌────────┐             │  aggregator   │            │
│  │ worker ├────────────►│  :9090/metrics│◄── Prometheus
│  └────────┘             └───────────────┘            │
│             localhost:9100/metrics                   │
└──────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what &lt;a href="https://github.com/kaiohenricunha/metrics-aggregator" rel="noopener noreferrer"&gt;metrics-aggregator&lt;/a&gt; does. I built it because I kept hitting this problem and none of the existing tools solved it cleanly.&lt;/p&gt;




&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Add it as a sidecar to any pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics-aggregator&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/kaiohenricunha/metrics-aggregator:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9090&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;METRICS_ENDPOINTS&lt;/span&gt;
        &lt;span class="c1"&gt;# JSON map (recommended), or comma-separated URLs&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"api":"http://localhost:8080/metrics","worker":"http://localhost:9100/metrics"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point Prometheus at port 9090:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;prometheus.io/scrape&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;prometheus.io/port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9090"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No extra service, no push gateway, no changes to your app containers.&lt;/p&gt;

&lt;p&gt;Here's what Prometheus sees after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same pod, same containers, all metrics present now&lt;/span&gt;

&lt;span class="n"&gt;http_requests_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;    &lt;span class="mi"&gt;1027&lt;/span&gt;
&lt;span class="n"&gt;http_requests_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"worker"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="mi"&gt;843&lt;/span&gt;

&lt;span class="n"&gt;go_goroutines&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;    &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;go_goroutines&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"worker"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every metric line gets an &lt;code&gt;origin_container&lt;/code&gt; label injected automatically so you can tell which container produced it. &lt;code&gt;# TYPE&lt;/code&gt; and &lt;code&gt;# HELP&lt;/code&gt; lines are deduplicated so the output is valid Prometheus exposition format.&lt;/p&gt;
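&lt;p&gt;To make that transformation concrete, here is a minimal sketch of the merge step in Python. This is not the aggregator's actual code; the function name and the parsing are illustrative, and it assumes well-formed single-line samples:&lt;/p&gt;

```python
def merge_metrics(sources):
    """Merge exposition-format payloads keyed by container name,
    tagging each sample with origin_container and deduplicating
    # TYPE / # HELP metadata lines. Illustrative sketch only."""
    seen_meta = set()
    out = []
    for container, text in sources.items():
        for line in text.splitlines():
            if not line.strip():
                continue
            if line.startswith("#"):
                # keep each TYPE/HELP line once across all sources
                if line not in seen_meta:
                    seen_meta.add(line)
                    out.append(line)
                continue
            name, _, rest = line.partition("{")
            if rest:
                # sample already has labels: prepend origin_container
                body, _, value = rest.rpartition("}")
                out.append(f'{name}{{origin_container="{container}",{body}}}{value}')
            else:
                # bare sample: add a fresh label set
                metric, _, value = line.partition(" ")
                out.append(f'{metric}{{origin_container="{container}"}} {value}')
    return "\n".join(out)
```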




&lt;h3&gt;
  
  
  How it works under the hood
&lt;/h3&gt;

&lt;p&gt;Endpoints are scraped concurrently with best-effort semantics. If one container is down, the others still report. The request only fails if every source fails.&lt;/p&gt;
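&lt;p&gt;The best-effort rule is easy to express. Again in illustrative Python rather than the project's source: run every fetch concurrently, keep the successes, and raise only when nothing succeeded:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(fetchers):
    """Best-effort concurrent scrape: fetchers maps container name to a
    zero-arg callable returning metrics text. Illustrative sketch only."""
    results, errors = {}, {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:
                # one container being down must not fail the whole scrape
                errors[name] = exc
    if not results:
        raise RuntimeError(f"all sources failed: {errors}")
    return results
```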

&lt;p&gt;The repo has the full details: self-instrumentation metrics, optional OpenTelemetry tracing, alerting rules, and a Grafana dashboard. I won't rehash all of that here.&lt;/p&gt;




&lt;h3&gt;
  
  
  Does it actually work under STRICT mTLS?
&lt;/h3&gt;

&lt;p&gt;Yes. The CI suite deploys a 4-container pod (three app containers plus &lt;code&gt;istio-proxy&lt;/code&gt;) under &lt;code&gt;PeerAuthentication&lt;/code&gt; mode &lt;code&gt;STRICT&lt;/code&gt; and asserts that Prometheus sustains &lt;code&gt;up == 1&lt;/code&gt; over 60 seconds. The scrape goes through the proxy; the internal localhost scrapes bypass it entirely.&lt;/p&gt;
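&lt;p&gt;For reference, the policy that puts a namespace into that mode is a one-screen manifest (the name and namespace below are illustrative):&lt;/p&gt;

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls    # illustrative name
  namespace: demo      # illustrative namespace
spec:
  mtls:
    mode: STRICT       # every workload connection must use Istio mTLS
```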

&lt;p&gt;I wanted this to be tested in CI, not just "it works on my cluster."&lt;/p&gt;




&lt;h3&gt;
  
  
  Supply chain security
&lt;/h3&gt;

&lt;p&gt;The image is signed with Cosign, scanned with Trivy on every release, and ships with SBOM and SLSA provenance. Releases use semantic versioning via Conventional Commits. This is infrastructure tooling that goes into your production pods, so I wanted to get this part right.&lt;/p&gt;




&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;Full manifests (plain Deployment, PodMonitor, Helm, Kustomize) are in the &lt;a href="https://github.com/kaiohenricunha/metrics-aggregator/tree/main/examples" rel="noopener noreferrer"&gt;&lt;code&gt;examples/&lt;/code&gt;&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;Quickest path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/kaiohenricunha/metrics-aggregator/main/examples/deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo is here: &lt;a href="https://github.com/kaiohenricunha/metrics-aggregator" rel="noopener noreferrer"&gt;kaiohenricunha/metrics-aggregator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're on sidecar mode with STRICT mTLS and wondering why half your metrics are missing, give it a try. And if you're planning a migration to ambient mode down the road but need something that works today, this bridges the gap. Open an issue if something doesn't work or if you have a use case I haven't thought of.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update: I wrote a follow-up post exploring the broader question of whether Istio should extend metrics merging or sunset it entirely: &lt;a href="https://medium.com/@kaiohsdc/istios-metrics-merging-was-built-for-a-simpler-world-what-should-replace-it-585b285fbc32" rel="noopener noreferrer"&gt;Istio's metrics merging was built for a simpler world. What should replace it?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>prometheus</category>
      <category>kubernetes</category>
      <category>istio</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
