<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kaio Cunha</title>
    <description>The latest articles on Forem by Kaio Cunha (@kaiohenricunha).</description>
    <link>https://forem.com/kaiohenricunha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3828257%2F134a4877-ba9a-4bbc-bf40-35b7ede7f498.jpeg</url>
      <title>Forem: Kaio Cunha</title>
      <link>https://forem.com/kaiohenricunha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kaiohenricunha"/>
    <language>en</language>
    <item>
      <title>dotbabel `/handoff`: portable context across Claude Code, Codex, Copilot CLI, and Gemini CLI</title>
      <dc:creator>Kaio Cunha</dc:creator>
      <pubDate>Wed, 06 May 2026 14:38:27 +0000</pubDate>
      <link>https://forem.com/kaiohenricunha/dotclaude-handoff-portable-context-across-claude-code-codex-copilot-cli-and-gemini-cli-3733</link>
      <guid>https://forem.com/kaiohenricunha/dotclaude-handoff-portable-context-across-claude-code-codex-copilot-cli-and-gemini-cli-3733</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Update (2026): The project has been renamed from dotclaude to dotbabel to reflect its model-agnostic positioning. v1.x setups continue to work via a one-release-window read-fallback compat layer (~/.config/dotclaude/, DOTCLAUDE_* env vars, etc.); compat shims are removed in 3.0.0. Migration guide: &lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md" rel="noopener noreferrer"&gt;https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;One skill from the dotbabel project, and how it solved cross-CLI session transfer.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It happened on a Tuesday.&lt;/p&gt;

&lt;p&gt;I was four hours into a careful refactor of a GraphQL gateway in Claude Code. The kind of session where you've walked the model through three layers of internal context, agreed on a strategy, and started touching files. The plan was tight, the momentum was real.&lt;/p&gt;

&lt;p&gt;Then I hit the context limit.&lt;/p&gt;

&lt;p&gt;Claude told me to pick it up later with &lt;code&gt;claude --resume &amp;lt;some-uuid&amp;gt;&lt;/code&gt;. Codex was already open in the next tmux pane, idle. I had three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Paste a 14k-character transcript into Codex and cross my fingers.&lt;/li&gt;
&lt;li&gt;Ask Claude to "summarize this for the next agent," then watch it omit the load-bearing details.&lt;/li&gt;
&lt;li&gt;Start over from scratch in Codex and waste the morning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of them were good. So I built &lt;code&gt;/handoff&lt;/code&gt;. It's why I keep dotbabel installed on every machine I work on.&lt;/p&gt;

&lt;h2&gt;
  
  
  A word on dotbabel
&lt;/h2&gt;

&lt;p&gt;If you read &lt;a href="https://medium.com/@methodMan/dotbabel-the-open-source-governance-layer-for-ai-assisted-development-b57880968ce9" rel="noopener noreferrer"&gt;my earlier piece on dotbabel&lt;/a&gt;, you already know the project: an MIT-licensed governance layer for Claude Code, with a portable skills library on one side and a CI-friendly validation CLI on the other. That post covered the architecture and motivation. This one zooms into a single skill from the library.&lt;/p&gt;

&lt;p&gt;The premise: skills travel with you across machines. That's the whole point of dotbabel path 1. Conversations didn't travel with them. You'd open a fresh CLI on a new machine and lose every working assumption from the last session. &lt;code&gt;/handoff&lt;/code&gt; closes that gap with three verbs and a private git repo as transport.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three verbs in sixty seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;             &lt;span class="c"&gt;# render a local session as markdown&lt;/span&gt;
dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; &amp;lt;cli&amp;gt;     &lt;span class="c"&gt;# ship a session to a private git repo&lt;/span&gt;
dotbabel handoff fetch &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;            &lt;span class="c"&gt;# grab a session from any other machine&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those three verbs are the entire surface. &lt;code&gt;pull&lt;/code&gt; is local-only: it reads a session transcript from disk and emits a &lt;code&gt;&amp;lt;handoff&amp;gt;&lt;/code&gt; block you can paste anywhere. &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; use a private git repo as transport, so you can move context across machines without standing up new infrastructure.&lt;/p&gt;

&lt;p&gt;One note on &lt;code&gt;push&lt;/code&gt; arguments. &lt;code&gt;--from &amp;lt;cli&amp;gt;&lt;/code&gt; is required only when no &lt;code&gt;&amp;lt;query&amp;gt;&lt;/code&gt; is given, since the tool needs to know whose &lt;code&gt;latest&lt;/code&gt; to ship. With an explicit &lt;code&gt;&amp;lt;query&amp;gt;&lt;/code&gt; (UUID, short UUID, alias, or &lt;code&gt;latest&lt;/code&gt;), &lt;code&gt;--from&lt;/code&gt; is optional and acts as a filter that narrows the resolver to one CLI's sessions.&lt;/p&gt;
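&lt;p&gt;As a sketch, the rule reduces to one check. Everything below is illustrative; the function name is mine, not part of the tool:&lt;/p&gt;

```shell
# Hypothetical sketch of the rule above: --from is mandatory only when no
# query is supplied; with a query it is just an optional filter.
push_args_ok() {
  local query="$1" from="$2"
  if [ -z "$query" ]; then
    [ -n "$from" ]    # no query: --from must name whose "latest" to ship
  else
    true              # query given: --from only narrows the resolver
  fi
}
```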

&lt;p&gt;I'll call the rendered output &lt;strong&gt;the digest&lt;/strong&gt; for the rest of this article. It's the thing all three verbs operate on.&lt;/p&gt;

&lt;h3&gt;
  
  
  A small note on invocation
&lt;/h3&gt;

&lt;p&gt;Claude Code is the primary host for dotbabel skills. It autoloads &lt;code&gt;~/.claude/skills/&lt;/code&gt;, so &lt;code&gt;/handoff&lt;/code&gt; is available as a native slash command from the moment the binary is installed. Inside Claude Code I just type &lt;code&gt;/handoff push --from claude&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Other CLIs aren't there yet. Codex, Copilot, and Gemini don't autoload the skill manifest, so you call the underlying binary directly via the CLI's bash escape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; gemini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same code path, same behavior, slightly more typing. Native slash-command support for Codex, Copilot, and Gemini is on the roadmap; for now, the &lt;code&gt;!&lt;/code&gt; prefix is the contract. For brevity, the rest of this article uses the bare &lt;code&gt;dotbabel handoff …&lt;/code&gt; form. Prepend &lt;code&gt;!&lt;/code&gt; if you're calling from inside a non-Claude CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @dotbabel/dotbabel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That covers the local-only path. &lt;code&gt;pull&lt;/code&gt; works the moment the binary is installed: no network, no auth, no config.&lt;/p&gt;

&lt;p&gt;For cross-machine work, you need a private git repo and one environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DOTBABEL_HANDOFF_REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;git@github.com:you/handoff-store.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or skip the manual setup. The first time you run &lt;code&gt;push&lt;/code&gt;, dotbabel detects an unset &lt;code&gt;DOTBABEL_HANDOFF_REPO&lt;/code&gt;, checks whether &lt;code&gt;gh&lt;/code&gt; is authenticated, offers to create a private repo for you, and persists the URL to &lt;code&gt;~/.config/dotbabel/handoff.env&lt;/code&gt;. The whole bootstrap is one yes-or-no prompt.&lt;/p&gt;
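&lt;p&gt;The persisted config is nothing exotic; the last step amounts to writing one env file. A minimal sketch, with the path taken from the article and the function name invented here:&lt;/p&gt;

```shell
# Persist the transport URL the way the bootstrap described above does.
# Function name and the optional path argument are illustrative.
persist_repo_url() {
  local url="$1" conf="${2:-$HOME/.config/dotbabel/handoff.env}"
  mkdir -p "$(dirname "$conf")"
  printf 'export DOTBABEL_HANDOFF_REPO=%s\n' "$url" > "$conf"
}

persist_repo_url git@github.com:you/handoff-store.git /tmp/handoff.env
```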

&lt;p&gt;Verify your setup any time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're looking for &lt;code&gt;ok&lt;/code&gt; and a non-empty &lt;code&gt;DOTBABEL_HANDOFF_REPO&lt;/code&gt;. On anything else, the doctor prints a structured remediation block telling you exactly what's wrong and how to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Walkthrough: local handoff
&lt;/h2&gt;

&lt;p&gt;Say I want to move my current Claude Code session into Codex. No transport repo needed for that case: same machine, same filesystem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff pull latest &lt;span class="nt"&gt;--from&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This finds my most recent Claude session, extracts the user prompts and the last few assistant turns, and prints a digest to stdout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;handoff&lt;/span&gt; &lt;span class="na"&gt;origin=&lt;/span&gt;&lt;span class="s"&gt;"claude"&lt;/span&gt; &lt;span class="na"&gt;session=&lt;/span&gt;&lt;span class="s"&gt;"a1b2c3d4"&lt;/span&gt; &lt;span class="na"&gt;cwd=&lt;/span&gt;&lt;span class="s"&gt;"/home/dev/projects/gateway"&lt;/span&gt; &lt;span class="na"&gt;target=&lt;/span&gt;&lt;span class="s"&gt;"claude"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Summary.**&lt;/span&gt; Session opened with: "/refactor the resolver layer to use dataloaders".
Last assistant output (truncated): "Approved. Applying the changes to resolvers/user.ts".
Full prompt log and assistant tail follow for context.

&lt;span class="gs"&gt;**User prompts (last 10, in order).**&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; /refactor the resolver layer to use dataloaders
&lt;span class="p"&gt;2.&lt;/span&gt; Show me the existing resolver shape first
&lt;span class="p"&gt;3.&lt;/span&gt; Why are we batching by tenant_id and not user_id?
…

&lt;span class="gs"&gt;**Last assistant turns (tail).**&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; The current resolver hits the DB once per request. Batching by tenant…&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Plan: introduce a DataLoader keyed on (tenant_id, user_id) and migrate…&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Approved. Applying the changes to resolvers/user.ts&lt;/span&gt;

&lt;span class="gs"&gt;**Next step.**&lt;/span&gt; Continue from the last assistant turn using the same file scope and goals summarized above.

&lt;span class="nt"&gt;&amp;lt;/handoff&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;lt;handoff&amp;gt;&lt;/code&gt; tag is deliberate: it's a machine-readable marker that lets a receiving agent detect the digest and treat it as a task specification with explicit scope.&lt;/p&gt;

&lt;p&gt;Three variants worth knowing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same digest, but written to a markdown file under docs/handoffs/&lt;/span&gt;
dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; auto

&lt;span class="c"&gt;# A terser prose summary, useful when you just want to remember what a session was about&lt;/span&gt;
dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--summary&lt;/span&gt;

&lt;span class="c"&gt;# Specific output path&lt;/span&gt;
dotbabel handoff pull &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /tmp/handoff.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resolving an ID
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt; accepts more than UUIDs. The resolver tries, in order: full UUID → short UUID (the first 8 hex chars) → the literal &lt;code&gt;latest&lt;/code&gt; → a user-assigned label alias. Aliases are case-insensitive and come from whatever the source CLI calls them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude's &lt;code&gt;customTitle&lt;/code&gt; or &lt;code&gt;aiTitle&lt;/code&gt; (set with &lt;code&gt;claude --resume "my-feature"&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Codex's &lt;code&gt;thread_name&lt;/code&gt; (set with &lt;code&gt;codex resume &amp;lt;name&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Copilot's &lt;code&gt;workspace.yaml:name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Gemini's &lt;code&gt;checkpoint-&amp;lt;tag&amp;gt;.json&lt;/code&gt; (set with &lt;code&gt;/chat save &amp;lt;tag&amp;gt;&lt;/code&gt; inside the session).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aliases are why the workflow stays out of my way. I rename my Claude session &lt;code&gt;gateway-refactor&lt;/code&gt;, walk to my desktop the next morning, run &lt;code&gt;dotbabel handoff fetch gateway-refactor&lt;/code&gt;, and it works. No UUID copy-paste, no scrolling through directory listings.&lt;/p&gt;
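&lt;p&gt;The resolution order is easy to approximate. A minimal sketch, assuming the order described above (patterns and function name are mine, not the binary's):&lt;/p&gt;

```shell
# Classify a query the way the resolver order above describes:
# full UUID, then short UUID, then the literal "latest", then alias.
resolve_kind() {
  local q="$1"
  if printf '%s' "$q" | grep -Eiq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
    echo uuid
  elif printf '%s' "$q" | grep -Eiq '^[0-9a-f]{8}$'; then
    echo short-uuid
  elif [ "$q" = "latest" ]; then
    echo latest
  else
    echo alias    # matched case-insensitively against session labels
  fi
}
```

&lt;p&gt;One consequence of this ordering, at least in the sketch: an alias that happens to be exactly 8 hex characters would resolve as a short UUID first.&lt;/p&gt;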

&lt;h2&gt;
  
  
  Walkthrough: across machines
&lt;/h2&gt;

&lt;p&gt;Now the scenario that motivated the whole skill: moving a session from my laptop to my desktop.&lt;/p&gt;

&lt;p&gt;On the laptop, before I close my coffee shop tab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; claude &lt;span class="nt"&gt;--tag&lt;/span&gt; end-of-day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is short and useful:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handoff/gateway/claude/2026-05/a1b2c3d4
git@github.com:you/handoff-store.git
handoff:v2:gateway:claude:2026-05:a1b2c3d4:laptop-mbp:end-of-day
[scrubbed 0 secrets]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line is the canonical branch name. The shape is intentional:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handoff/&amp;lt;project&amp;gt;/&amp;lt;cli&amp;gt;/&amp;lt;YYYY-MM&amp;gt;/&amp;lt;short-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's namespaced by project (derived from the session's git root), then origin CLI, then year-month, then the 8-hex short id. The structure does double duty as a collision domain. Two sessions in the same project, same CLI, same month, with the same short-id prefix would clash, and the binary's collision probe catches that before any push lands.&lt;/p&gt;
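&lt;p&gt;Composing the name is pure string assembly; a sketch (the helper function is hypothetical):&lt;/p&gt;

```shell
# Build the canonical branch name from its four parts, per the shape above.
branch_name() {
  printf 'handoff/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

branch_name gateway claude 2026-05 a1b2c3d4
# handoff/gateway/claude/2026-05/a1b2c3d4
```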

&lt;p&gt;On the desktop, an hour later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff fetch a1b2c3d4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;fetch&lt;/code&gt; clones just the one branch, reads &lt;code&gt;handoff.md&lt;/code&gt; from its tip, and prints it to stdout. Same digest I produced on the laptop. I paste it into a fresh Claude, Codex, or Gemini session, and the new agent picks up where the old one left off, with the file scope and plan intact.&lt;/p&gt;

&lt;p&gt;You can list and search before fetching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotbabel handoff list &lt;span class="nt"&gt;--remote&lt;/span&gt; &lt;span class="nt"&gt;--limit&lt;/span&gt; 10
dotbabel handoff search &lt;span class="s2"&gt;"dataloader"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list view shows location, CLI, short id, and timestamp. Search runs a substring/regex match against the digest content. Lucene this is not, but it's good enough to find a specific session by a phrase you remember from the prompt log.&lt;/p&gt;

&lt;h2&gt;
  
  
  The redaction pass
&lt;/h2&gt;

&lt;p&gt;Here's the part I wasn't willing to hand-wave. The digest is plaintext markdown going to a remote git repo. If I accidentally pasted an API key into a session three days ago, that key is in the transcript. If &lt;code&gt;push&lt;/code&gt; doesn't strip it, my secrets manager just got bypassed by a developer-experience tool.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;push&lt;/code&gt; runs the digest through a redaction script before it ever leaves the machine. The script operates on stdin, applies eight regex passes, and emits the redacted text plus a &lt;code&gt;scrubbed:&amp;lt;N&amp;gt;&lt;/code&gt; count on stderr. The eight passes cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub tokens (&lt;code&gt;ghp_…&lt;/code&gt;, &lt;code&gt;gho_…&lt;/code&gt;, &lt;code&gt;ghs_…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;OpenAI / Anthropic-style keys (&lt;code&gt;sk-…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;AWS access keys (&lt;code&gt;AKIA…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Google API keys (&lt;code&gt;AIza…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Slack tokens (&lt;code&gt;xox[baprs]-…&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;HTTP &lt;code&gt;Authorization: Bearer …&lt;/code&gt; headers&lt;/li&gt;
&lt;li&gt;Environment variable assignments matching &lt;code&gt;*TOKEN&lt;/code&gt;, &lt;code&gt;*KEY&lt;/code&gt;, &lt;code&gt;*SECRET&lt;/code&gt;, &lt;code&gt;*PASSWORD&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;PEM private-key block headers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design constraint is "fail closed." If the script can't run for any reason (missing perl, I/O error, malformed input), the push aborts with an error and nothing reaches the remote. There's no &lt;code&gt;--skip-scrub&lt;/code&gt; flag. There never will be.&lt;/p&gt;

&lt;p&gt;The skill itself reinforces this. Look at &lt;code&gt;skills/handoff/SKILL.md&lt;/code&gt; and you'll see an explicit instruction to the LLM: "if the binary cannot be executed, do not fabricate a &lt;code&gt;&amp;lt;handoff&amp;gt;&lt;/code&gt; block from raw session JSONL." The reasoning is concrete: without the binary's scrub pass, a hand-rolled digest would silently bypass redaction.&lt;/p&gt;

&lt;p&gt;Scrubbing is best-effort. It does not catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom enterprise secret formats.&lt;/li&gt;
&lt;li&gt;Secrets broken across lines (IDE copy-paste sometimes wraps).&lt;/li&gt;
&lt;li&gt;Anything you wrote in prose ("my password is correct horse battery staple").&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For sensitive sessions, my workflow is: &lt;code&gt;pull &amp;lt;id&amp;gt;&lt;/code&gt; first, eyeball the digest locally, then &lt;code&gt;push&lt;/code&gt; if it's clean. The local render and the remote push produce identical content modulo the scrub markers, so what you see is what gets uploaded.&lt;/p&gt;

&lt;h2&gt;
  
  
  The interesting edges
&lt;/h2&gt;

&lt;p&gt;A few details that turned out to matter once I started using this daily.&lt;/p&gt;

&lt;p&gt;The first is the short-id collision probe. Eight hex chars of a UUIDv4 give you ~4 billion combinations per project-CLI-month bucket, so collisions are rare without being impossible. Before any push, the binary runs a &lt;code&gt;git ls-remote&lt;/code&gt; for the target branch. If it exists and the remote &lt;code&gt;metadata.json&lt;/code&gt;'s &lt;code&gt;session_id&lt;/code&gt; matches yours, it's the same session, and the push proceeds as an update. If they don't match, the push refuses with a clear error and points at &lt;code&gt;--force-collision&lt;/code&gt; for the override. No silent clobbers.&lt;/p&gt;
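&lt;p&gt;The decision logic reduces to three outcomes. A sketch with the probe results passed in as plain arguments (the real binary derives them from &lt;code&gt;git ls-remote&lt;/code&gt; and the remote &lt;code&gt;metadata.json&lt;/code&gt;):&lt;/p&gt;

```shell
# Decide what a push should do, given the probe results described above.
probe_decision() {
  local branch_exists="$1" remote_session="$2" local_session="$3"
  if [ "$branch_exists" != "yes" ]; then
    echo push-new        # branch absent: first push of this session
  elif [ "$remote_session" = "$local_session" ]; then
    echo push-update     # same session_id: update in place
  else
    echo refuse          # true collision: require --force-collision
  fi
}
```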

&lt;p&gt;The second is connectivity caching. Both &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; run a connectivity check before each operation, then cache the result for five minutes so you don't pay the round-trip cost on a sequence of related commands. Pass &lt;code&gt;--verify&lt;/code&gt; to force a fresh probe.&lt;/p&gt;
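&lt;p&gt;One plausible shape for that cache is a timestamp marker file; the path, probe command, and structure below are illustrative, not the tool's actual implementation:&lt;/p&gt;

```shell
# Skip the remote probe when a timestamp marker is younger than 300 seconds.
connectivity_ok() {
  local cache="${1:-/tmp/dotbabel-conn-cache}"
  if [ -f "$cache" ]; then
    local age=$(( $(date +%s) - $(cat "$cache") ))
    if [ "$age" -lt 300 ]; then
      return 0           # cached probe still fresh
    fi
  fi
  # Fresh probe against the transport repo; record success with a timestamp.
  git ls-remote "$DOTBABEL_HANDOFF_REPO" >/dev/null 2>/dev/null || return 1
  date +%s > "$cache"
}
```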

&lt;p&gt;The third is the choice of git as transport. It's a substrate I already trust, with cheap branches, well-understood ACLs, and prune semantics that map naturally onto branch deletion. There's no new service to operate, no new credentials to rotate, and any private git provider works: GitHub, GitLab, Gitea, self-hosted, or a &lt;code&gt;file://&lt;/code&gt; URL pointing at a USB stick for air-gapped transfer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it isn't
&lt;/h2&gt;

&lt;p&gt;A short list of capabilities I deliberately did not build, and why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not end-to-end encrypted.&lt;/strong&gt; Transport is access-controlled by your private repo's ACL; content is plaintext on the remote. If your threat model demands encryption at rest in the transport repo, that's a feature for a future version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not fuzzy or semantic search.&lt;/strong&gt; &lt;code&gt;search&lt;/code&gt; is substring/regex only. The corpus is small enough that a smart &lt;code&gt;grep&lt;/code&gt; is faster and more predictable than a vector index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doesn't invoke the target CLI for you.&lt;/strong&gt; The skill prints; you paste. That's deliberate. Keeping the human in the transfer loop preserves auditability and avoids automating a step where wrong context is worse than no context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I should also be honest about the rough edges. While writing this article I exercised the full round-trip and surfaced three small bugs in the process: a flag silently dropped on the wrong verb, a misleading &lt;code&gt;prune&lt;/code&gt; failure report, and a default-branch trap baked into the auto-bootstrap path. They're tracked publicly as &lt;a href="https://github.com/kaiohenricunha/dotbabel/issues/178" rel="noopener noreferrer"&gt;#178&lt;/a&gt;, &lt;a href="https://github.com/kaiohenricunha/dotbabel/issues/179" rel="noopener noreferrer"&gt;#179&lt;/a&gt;, and &lt;a href="https://github.com/kaiohenricunha/dotbabel/issues/180" rel="noopener noreferrer"&gt;#180&lt;/a&gt;. None of them block daily use; all three are cosmetic or recoverable. The transport itself is solid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Install is one npm command. The first push walks you through repo setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @dotbabel/dotbabel
dotbabel handoff push &lt;span class="nt"&gt;--from&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source, issue tracker, and contribution guide live at &lt;a href="https://github.com/kaiohenricunha/dotbabel" rel="noopener noreferrer"&gt;github.com/kaiohenricunha/dotbabel&lt;/a&gt;. PRs, bug reports, and "this didn't work on my $obscure-shell" notes all welcome. The broader project tour is in &lt;a href="https://medium.com/@methodMan/dotbabel-the-open-source-governance-layer-for-ai-assisted-development-b57880968ce9" rel="noopener noreferrer"&gt;the dotbabel governance article&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>gemini</category>
      <category>githubcopilot</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>dotbabel: The Open-Source Governance Layer for AI-Assisted Development</title>
      <dc:creator>Kaio Cunha</dc:creator>
      <pubDate>Sat, 18 Apr 2026 22:58:04 +0000</pubDate>
      <link>https://forem.com/kaiohenricunha/dotclaude-the-open-source-governance-layer-for-ai-assisted-development-3177</link>
      <guid>https://forem.com/kaiohenricunha/dotclaude-the-open-source-governance-layer-for-ai-assisted-development-3177</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Update (2026): The project has been renamed from dotclaude to dotbabel to reflect its model-agnostic positioning. v1.x setups continue to work via a one-release-window read-fallback compat layer (~/.config/dotclaude/, DOTCLAUDE_* env vars, etc.); compat shims are removed in 3.0.0. Migration guide: &lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md" rel="noopener noreferrer"&gt;https://github.com/kaiohenricunha/dotbabel/blob/main/docs/upgrade-guide.md&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You finish a great Claude Code session. A solid PR-review workflow. A debugging loop that actually finds root causes. A deploy checklist you trust. You close the terminal.&lt;/p&gt;

&lt;p&gt;Next week, starting fresh, you've lost all of it. The assistant has no memory of how &lt;em&gt;you&lt;/em&gt; like to work. You re-explain the worktree convention. You re-explain the test-plan format. You re-explain why &lt;code&gt;--force-push&lt;/code&gt; on &lt;code&gt;main&lt;/code&gt; is never OK.&lt;/p&gt;

&lt;p&gt;Now scale that problem to a team. Five engineers using Claude Code, each with their own tricks, no shared floor of discipline. PRs land with different review depths. Audits have no structure. Some sessions produce hallucinated "fixes" that never touched the real code path. Specs drift from implementation and nobody notices until something breaks in prod.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kaiohenricunha/dotbabel" rel="noopener noreferrer"&gt;dotbabel&lt;/a&gt; is an MIT-licensed project for agentic CLIs (Claude Code, Codex, Gemini CLI, Copilot CLI) that solves both problems from the same codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two problems, one repo
&lt;/h2&gt;

&lt;p&gt;The project has a &lt;strong&gt;dual-persona monorepo&lt;/strong&gt; layout (&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/adr/0001-monorepo-dual-persona-layout.md" rel="noopener noreferrer"&gt;ADR-0001&lt;/a&gt;). That sounds architectural, but it maps to two very different users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The individual developer&lt;/strong&gt; who wants a portable skills library wired into every Claude Code session on their laptop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The engineering team&lt;/strong&gt; that wants a governance CLI enforcing spec-backed PRs, skill-manifest integrity, and drift detection in CI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both paths are backed by the same skills, the same slash commands, the same &lt;code&gt;CLAUDE.md&lt;/code&gt; rules. Neither path requires the other. You can use one, both, or swap from one to the other as your needs change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 1: skills &amp;amp; commands in every session
&lt;/h2&gt;

&lt;p&gt;For the individual path, the install is three lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kaiohenricunha/dotbabel.git ~/projects/dotbabel
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/dotbabel
./bootstrap.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;bootstrap.sh&lt;/code&gt; symlinks &lt;code&gt;commands/&lt;/code&gt;, &lt;code&gt;skills/&lt;/code&gt;, and &lt;code&gt;CLAUDE.md&lt;/code&gt; into &lt;code&gt;~/.claude/&lt;/code&gt;. From that point, every Claude Code session in every repo has access to the full library. The highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud &amp;amp; IaC specialists&lt;/strong&gt; — the &lt;code&gt;aws-specialist&lt;/code&gt;, &lt;code&gt;gcp-specialist&lt;/code&gt;, &lt;code&gt;azure-specialist&lt;/code&gt;, &lt;code&gt;kubernetes-specialist&lt;/code&gt;, &lt;code&gt;terraform-specialist&lt;/code&gt;, &lt;code&gt;terragrunt-specialist&lt;/code&gt;, &lt;code&gt;pulumi-specialist&lt;/code&gt;, and &lt;code&gt;crossplane-specialist&lt;/code&gt; skills auto-trigger when you mention the relevant technology. Saying "review the IAM trust policy on the prod account" is enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slash commands for real PR work&lt;/strong&gt; — &lt;code&gt;/pre-pr&lt;/code&gt; runs a simplify + security-review + full-test-suite gate before you open the PR. &lt;code&gt;/review-pr &amp;lt;N&amp;gt;&lt;/code&gt; walks 14 steps: fetch comments, validate each one, apply fixes in an isolated worktree, run the test plan, resolve threads. &lt;code&gt;/review-prs &amp;lt;N1&amp;gt; &amp;lt;N2&amp;gt; ...&lt;/code&gt; dispatches one sub-agent per PR in parallel, up to six concurrent, and aggregates results into a table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging disciplines&lt;/strong&gt; — &lt;code&gt;/ground-first &amp;lt;subject&amp;gt;&lt;/code&gt; forces a read-before-edit pass with &lt;code&gt;file:line&lt;/code&gt; citations before any change is proposed. &lt;code&gt;/fix-with-evidence &amp;lt;issue&amp;gt;&lt;/code&gt; enforces a Reproduce → Fix → Verify → PR loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis docs&lt;/strong&gt; — &lt;code&gt;/create-audit&lt;/code&gt;, &lt;code&gt;/create-inspection&lt;/code&gt;, and &lt;code&gt;/create-assessment&lt;/code&gt; produce evidence-backed markdown documents in &lt;code&gt;docs/audits/&lt;/code&gt;, &lt;code&gt;docs/inspections/&lt;/code&gt;, and &lt;code&gt;docs/assessments/&lt;/code&gt; respectively. Every claim cites a file, a line, or command output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-machine handoff&lt;/strong&gt; — &lt;code&gt;/handoff push claude latest&lt;/code&gt; scrubs secrets and uploads a digest to a private GitHub gist. On another machine: &lt;code&gt;/handoff pull latest&lt;/code&gt;. Your Windows/WSL session continues on Linux without re-explaining context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;CLAUDE.md&lt;/code&gt; file installs a global rule floor alongside the skills: no pushing to &lt;code&gt;main&lt;/code&gt; without explicit instruction, no force-pushing another session's branch, no &lt;code&gt;--no-verify&lt;/code&gt; or &lt;code&gt;--no-gpg-sign&lt;/code&gt;, full test suite before merges that touch protected paths, and a spec-coverage contract enforced at PR time.&lt;/p&gt;

&lt;p&gt;To stay current, run &lt;code&gt;./sync.sh pull&lt;/code&gt; (bootstrap path) or &lt;code&gt;dotbabel sync pull&lt;/code&gt; (npm path); either one re-bootstraps from the latest &lt;code&gt;main&lt;/code&gt;. No npm required for the bootstrap path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 2: the governance CLI
&lt;/h2&gt;

&lt;p&gt;For the team path, there's a zero-runtime-dependency npm package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @dotbabel/dotbabel
dotbabel bootstrap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That installs the same skills library but also gives you a set of validators designed for CI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-validate-specs&lt;/code&gt; — audits spec contracts, catches dependency cycles.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-check-spec-coverage&lt;/code&gt; — the PR-time gate. Any PR that touches a protected path (defined in &lt;code&gt;docs/repo-facts.json&lt;/code&gt;) must carry a &lt;code&gt;Spec ID:&lt;/code&gt; header or a &lt;code&gt;## No-spec rationale&lt;/code&gt; section. No loophole.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-check-instruction-drift&lt;/code&gt; — detects stale &lt;code&gt;CLAUDE.md&lt;/code&gt; and README entries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-detect-drift&lt;/code&gt; — flags commands that have diverged from &lt;code&gt;origin/main&lt;/code&gt; for 14+ days.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dotbabel-doctor&lt;/code&gt; — self-diagnostic across env, facts, manifest, specs, drift, hooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every bin honors &lt;code&gt;--help&lt;/code&gt;, &lt;code&gt;--version&lt;/code&gt;, &lt;code&gt;--json&lt;/code&gt;, &lt;code&gt;--verbose&lt;/code&gt;, &lt;code&gt;--no-color&lt;/code&gt;. Exit codes follow the &lt;code&gt;{0, 1, 2, 64}&lt;/code&gt; convention (&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/adr/0013-exit-code-convention.md" rel="noopener noreferrer"&gt;ADR-0013&lt;/a&gt;), with 64 matching BSD &lt;code&gt;EX_USAGE&lt;/code&gt;. Every failure surfaces as a structured &lt;code&gt;ValidationError&lt;/code&gt; with a stable &lt;code&gt;.code&lt;/code&gt; (&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/adr/0012-structured-error-contract.md" rel="noopener noreferrer"&gt;ADR-0012&lt;/a&gt;), so your CI scripts branch on classes of failure instead of grepping strings.&lt;/p&gt;
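&lt;p&gt;As a sketch of what that buys you in CI, a workflow step can branch on the exit class instead of parsing log output. The step below is hypothetical; only 0 (success) and 64 (usage error) are assumed from the convention above, and the meaning of the remaining classes lives in ADR-0013.&lt;/p&gt;

```yaml
# Hypothetical GitHub Actions step: branch on exit class, not on log text.
- name: Spec coverage gate
  run: |
    set +e
    dotbabel-check-spec-coverage --json
    code=$?
    case "$code" in
      0)  echo "spec coverage ok" ;;
      64) echo "::error::bad invocation (EX_USAGE); fix this step"; exit "$code" ;;
      *)  echo "::error::check failed (exit class $code)"; exit "$code" ;;
    esac
```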

&lt;p&gt;There's also a Node API for teams that want to build their own gates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;createHarnessContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;validateSpecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;ERROR_CODES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;EXIT_CODES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@dotbabel/dotbabel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createHarnessContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;validateSpecs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;EXIT_CODES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;VALIDATION&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Need a scaffold for a fresh repo?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx dotbabel-init &lt;span class="nt"&gt;--project-name&lt;/span&gt; my-project &lt;span class="nt"&gt;--project-type&lt;/span&gt; node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That writes &lt;code&gt;.claude/settings.json&lt;/code&gt;, the skills manifest, a destructive-git guard hook, three GitHub Actions workflows (&lt;code&gt;validate-skills&lt;/code&gt;, &lt;code&gt;detect-drift&lt;/code&gt;, &lt;code&gt;ai-review&lt;/code&gt;), and a spec stub. A green &lt;code&gt;dotbabel-doctor&lt;/code&gt; from a cold start.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick taste
&lt;/h2&gt;

&lt;p&gt;After bootstrap, pick a real repo and try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Read before you touch anything.&lt;/span&gt;
/ground-first auth token refresh race condition
&lt;span class="gh"&gt;# → grounded analysis with file:line citations, no edits proposed&lt;/span&gt;

&lt;span class="gh"&gt;# Fix a reported bug with a full evidence loop.&lt;/span&gt;
/fix-with-evidence 140
&lt;span class="gh"&gt;# → reproduces, fixes, verifies, opens a PR — all with proof&lt;/span&gt;

&lt;span class="gh"&gt;# Deep AWS IAM review.&lt;/span&gt;
/aws-specialist review IAM policies in the production account
&lt;span class="gh"&gt;# → structured report: least-privilege gaps, trust-policy findings, remediations&lt;/span&gt;

&lt;span class="gh"&gt;# Batch-triage every open Dependabot PR.&lt;/span&gt;
/dependabot-sweep
&lt;span class="gh"&gt;# → parallel sub-agents annotate risk; safe bumps merged automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every command is context-aware. It reads your repo's files, git history, CI state, and PR body. It cites evidence. It never pushes without permission.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother with governance at all
&lt;/h2&gt;

&lt;p&gt;The case for spec-driven development gets stronger the more AI you put into the loop. An assistant that writes code fast enough to outrun human review is a liability unless the &lt;em&gt;rules of the game&lt;/em&gt; are encoded somewhere machine-readable. &lt;code&gt;docs/specs/&lt;/code&gt; becomes the contract. Protected paths become the enforcement surface. A PR gate that says "touched this path → show me the Spec ID" turns AI speed into a feature instead of a foot-gun.&lt;/p&gt;

&lt;p&gt;dotbabel isn't opinionated about &lt;em&gt;which&lt;/em&gt; workflow you adopt. It's opinionated that &lt;em&gt;some&lt;/em&gt; workflow must exist — and that the same tools should serve both the person writing the code and the team shipping it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel" rel="noopener noreferrer"&gt;README&lt;/a&gt; — both install paths, full skills catalog.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/quickstart.md" rel="noopener noreferrer"&gt;docs/quickstart.md&lt;/a&gt; — install to first green validator in under 10 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel/blob/main/docs/architecture.md" rel="noopener noreferrer"&gt;docs/architecture.md&lt;/a&gt; — layer diagram and PR-time sequence.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kaiohenricunha/dotbabel/tree/main/docs/adr" rel="noopener noreferrer"&gt;docs/adr/&lt;/a&gt; — every hardening decision, with rationale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. Issues and PRs welcome.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>opensource</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why Istio's Metrics Merging Breaks in Multi-Container Pods (And How to Fix It)</title>
      <dc:creator>Kaio Cunha</dc:creator>
      <pubDate>Tue, 17 Mar 2026 00:07:31 +0000</pubDate>
      <link>https://forem.com/kaiohenricunha/why-istios-metrics-merging-breaks-in-multi-container-pods-and-how-to-fix-it-3l6f</link>
      <guid>https://forem.com/kaiohenricunha/why-istios-metrics-merging-breaks-in-multi-container-pods-and-how-to-fix-it-3l6f</guid>
      <description>&lt;h2&gt;
  
  
  If you run multi-container pods under Istio with STRICT mTLS, you're probably missing metrics
&lt;/h2&gt;

&lt;p&gt;And you might not know it. The containers are healthy. The scrape job shows no errors. But half your metrics are just... absent from Prometheus. No alert, no obvious explanation.&lt;/p&gt;

&lt;p&gt;I spent a while debugging this before I understood what was going on, so here's the full picture.&lt;/p&gt;




&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;Istio has a built-in metrics-merging feature that lets Prometheus scrape a pod through the Istio proxy without reaching each container directly. It's useful. But it has a hard limitation that the docs mention only in passing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Istio's metrics-merge only supports one port per pod.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Superorbital team wrote &lt;a href="https://superorbital.io/blog/istio-metrics-merging/" rel="noopener noreferrer"&gt;the definitive explanation&lt;/a&gt; of why this is the case. The short version: Istio's proxy forwards the scrape to a single application port. If you have three containers each exposing &lt;code&gt;/metrics&lt;/code&gt; on different ports, Istio picks one and ignores the rest.&lt;/p&gt;
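&lt;p&gt;You can see the single-port assumption in the annotation rewrite itself. With merging enabled, Istio's sidecar injector redirects the standard scrape annotations to the proxy's merged endpoint (port 15020, path &lt;code&gt;/stats/prometheus&lt;/code&gt;), and there is room for exactly one application port in that rewrite:&lt;/p&gt;

```yaml
# Pod annotations as you wrote them
prometheus.io/scrape: "true"
prometheus.io/port: "8080"    # one port; a second container's port has nowhere to go

# Pod annotations after sidecar injection (metrics merging on)
prometheus.io/scrape: "true"
prometheus.io/port: "15020"
prometheus.io/path: "/stats/prometheus"
```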

&lt;p&gt;Someone &lt;a href="https://github.com/istio/istio/issues/41276" rel="noopener noreferrer"&gt;opened a feature request&lt;/a&gt; for multi-port support back in 2022. It was labeled &lt;code&gt;lifecycle/stale&lt;/code&gt; and auto-closed. There are &lt;a href="https://github.com/istio/istio/issues/27328" rel="noopener noreferrer"&gt;several&lt;/a&gt; &lt;a href="https://github.com/istio/istio/issues/38348" rel="noopener noreferrer"&gt;other&lt;/a&gt; &lt;a href="https://github.com/istio/istio/issues/53753" rel="noopener noreferrer"&gt;issues&lt;/a&gt; from people hitting variations of this same problem. None of them were resolved.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pod with api container (:8080) and worker container (:9100)&lt;/span&gt;

&lt;span class="n"&gt;up&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;pod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"my-app-abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;   &lt;span class="err"&gt;✓&lt;/span&gt; &lt;span class="n"&gt;scraped&lt;/span&gt; &lt;span class="n"&gt;through&lt;/span&gt; &lt;span class="n"&gt;Istio&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;

&lt;span class="c"&gt;# worker metrics? absent. no error, just gone.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worker container is perfectly healthy. Its metrics just never reach Prometheus. No scrape failure gets recorded because Prometheus never even tries. It only knows about the one port Istio advertises.&lt;/p&gt;




&lt;h3&gt;
  
  
  The workarounds you'll try (and why they don't work)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;"Just scrape each container port directly."&lt;/strong&gt; Works if mTLS is in permissive mode. In &lt;code&gt;STRICT&lt;/code&gt; mode, every connection must go through the Istio proxy, which only forwards to one port. Direct port scraping gets rejected at the mTLS layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use multiple &lt;code&gt;PodMonitor&lt;/code&gt; entries pointing at different ports."&lt;/strong&gt; Same problem. The proxy is the bottleneck, not the scrape configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Push metrics to a Pushgateway."&lt;/strong&gt; Technically works, but now you've broken the pull model everything else in your stack depends on, added a component that becomes a single point of failure, and introduced staleness semantics that are genuinely confusing to debug.&lt;/p&gt;




&lt;h3&gt;
  
  
  What about ambient mode?
&lt;/h3&gt;

&lt;p&gt;Before I get to my solution, I should be upfront: if you're running Istio in &lt;strong&gt;ambient mode&lt;/strong&gt; (GA since Istio 1.24), this problem doesn't apply to you. Ambient replaces the per-pod sidecar with a per-node L4 proxy (ztunnel), so there's no sidecar sitting inside your pod intercepting scrapes. Prometheus can reach your container ports directly, and mTLS is handled transparently at the node level. John Howard from the Istio team &lt;a href="https://blog.howardjohn.info/posts/securing-prometheus/" rel="noopener noreferrer"&gt;wrote about this&lt;/a&gt; — the TL;DR is "it just works."&lt;/p&gt;

&lt;p&gt;But most production Istio deployments are still running sidecar mode. Migrating to ambient is a significant undertaking, and the Istio project itself says they expect many users to stay on sidecars for years. If that's you, keep reading.&lt;/p&gt;




&lt;h3&gt;
  
  
  What actually works in sidecar mode: one sidecar, one port
&lt;/h3&gt;

&lt;p&gt;The idea is simple. Add a small sidecar container that scrapes all your other containers over &lt;code&gt;localhost&lt;/code&gt; (where mTLS doesn't apply, because it's all inside the same pod) and exposes the merged result on a single port. Istio sees one port, Prometheus scrapes one port, and you get everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────┐
│  Pod                                                 │
│                                                      │
│  ┌────────┐  localhost:8080/metrics                  │
│  │  api   ├──────────────────┐                       │
│  └────────┘                  │                       │
│                         ┌────▼──────────┐            │
│  ┌────────┐             │  aggregator   │            │
│  │ worker ├────────────►│  :9090/metrics│◄── Prometheus
│  └────────┘             └───────────────┘            │
│             localhost:9100/metrics                   │
└──────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what &lt;a href="https://github.com/kaiohenricunha/metrics-aggregator" rel="noopener noreferrer"&gt;metrics-aggregator&lt;/a&gt; does. I built it because I kept hitting this problem and none of the existing tools solved it cleanly.&lt;/p&gt;




&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Add it as a sidecar to any pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics-aggregator&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/kaiohenricunha/metrics-aggregator:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9090&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;METRICS_ENDPOINTS&lt;/span&gt;
        &lt;span class="c1"&gt;# JSON map (recommended), or comma-separated URLs&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"api":"http://localhost:8080/metrics","worker":"http://localhost:9100/metrics"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point Prometheus at port 9090:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;prometheus.io/scrape&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;prometheus.io/port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9090"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No extra service, no push gateway, no changes to your app containers.&lt;/p&gt;

&lt;p&gt;Here's what Prometheus sees after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same pod, same containers, all metrics present now&lt;/span&gt;

&lt;span class="n"&gt;http_requests_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;    &lt;span class="mi"&gt;1027&lt;/span&gt;
&lt;span class="n"&gt;http_requests_total&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"worker"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="mi"&gt;843&lt;/span&gt;

&lt;span class="n"&gt;go_goroutines&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;    &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;go_goroutines&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;origin_container&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"worker"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every metric line gets an &lt;code&gt;origin_container&lt;/code&gt; label injected automatically so you can tell which container produced it. &lt;code&gt;# TYPE&lt;/code&gt; and &lt;code&gt;# HELP&lt;/code&gt; lines are deduplicated so the output is valid Prometheus exposition format.&lt;/p&gt;
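&lt;p&gt;To make that transformation concrete, here is a minimal sketch of the merge step in Python. This is not the aggregator's actual code; the function name and the parsing are illustrative, and it assumes well-formed single-line samples:&lt;/p&gt;

```python
def merge_metrics(sources):
    """Merge exposition-format payloads keyed by container name,
    tagging each sample with origin_container and deduplicating
    # TYPE / # HELP metadata lines. Illustrative sketch only."""
    seen_meta = set()
    out = []
    for container, text in sources.items():
        for line in text.splitlines():
            if not line.strip():
                continue
            if line.startswith("#"):
                # keep each TYPE/HELP line once across all sources
                if line not in seen_meta:
                    seen_meta.add(line)
                    out.append(line)
                continue
            name, _, rest = line.partition("{")
            if rest:
                # sample already has labels: prepend origin_container
                body, _, value = rest.rpartition("}")
                out.append(f'{name}{{origin_container="{container}",{body}}}{value}')
            else:
                # bare sample: add a fresh label set
                metric, _, value = line.partition(" ")
                out.append(f'{metric}{{origin_container="{container}"}} {value}')
    return "\n".join(out)
```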




&lt;h3&gt;
  
  
  How it works under the hood
&lt;/h3&gt;

&lt;p&gt;Endpoints are scraped concurrently with best-effort semantics. If one container is down, the others still report. The request only fails if every source fails.&lt;/p&gt;
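&lt;p&gt;The best-effort rule is easy to express. Again in illustrative Python rather than the project's source: run every fetch concurrently, keep the successes, and raise only when nothing succeeded:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(fetchers):
    """Best-effort concurrent scrape: fetchers maps container name to a
    zero-arg callable returning metrics text. Illustrative sketch only."""
    results, errors = {}, {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:
                # one container being down must not fail the whole scrape
                errors[name] = exc
    if not results:
        raise RuntimeError(f"all sources failed: {errors}")
    return results
```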

&lt;p&gt;The repo has the full details: self-instrumentation metrics, optional OpenTelemetry tracing, alerting rules, and a Grafana dashboard. I won't rehash all of that here.&lt;/p&gt;




&lt;h3&gt;
  
  
  Does it actually work under STRICT mTLS?
&lt;/h3&gt;

&lt;p&gt;Yes. The CI suite deploys a 4-container pod (three app containers plus &lt;code&gt;istio-proxy&lt;/code&gt;) under &lt;code&gt;PeerAuthentication&lt;/code&gt; mode &lt;code&gt;STRICT&lt;/code&gt; and asserts that Prometheus sustains &lt;code&gt;up == 1&lt;/code&gt; over 60 seconds. The scrape goes through the proxy; the internal localhost scrapes bypass it entirely.&lt;/p&gt;
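&lt;p&gt;For reference, the policy that puts a namespace into that mode is a one-screen manifest (the name and namespace below are illustrative):&lt;/p&gt;

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls    # illustrative name
  namespace: demo      # illustrative namespace
spec:
  mtls:
    mode: STRICT       # every workload connection must use Istio mTLS
```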

&lt;p&gt;I wanted this to be tested in CI, not just "it works on my cluster."&lt;/p&gt;




&lt;h3&gt;
  
  
  Supply chain security
&lt;/h3&gt;

&lt;p&gt;The image is signed with Cosign, scanned with Trivy on every release, and ships with SBOM and SLSA provenance. Releases use semantic versioning via Conventional Commits. This is infrastructure tooling that goes into your production pods, so I wanted to get this part right.&lt;/p&gt;




&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;Full manifests (plain Deployment, PodMonitor, Helm, Kustomize) are in the &lt;a href="https://github.com/kaiohenricunha/metrics-aggregator/tree/main/examples" rel="noopener noreferrer"&gt;&lt;code&gt;examples/&lt;/code&gt;&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;Quickest path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/kaiohenricunha/metrics-aggregator/main/examples/deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo is here: &lt;a href="https://github.com/kaiohenricunha/metrics-aggregator" rel="noopener noreferrer"&gt;kaiohenricunha/metrics-aggregator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're on sidecar mode with STRICT mTLS and wondering why half your metrics are missing, give it a try. And if you're planning a migration to ambient mode down the road but need something that works today, this bridges the gap. Open an issue if something doesn't work or if you have a use case I haven't thought of.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update: I wrote a follow-up post exploring the broader question of whether Istio should extend metrics merging or sunset it entirely: &lt;a href="https://medium.com/@kaiohsdc/istios-metrics-merging-was-built-for-a-simpler-world-what-should-replace-it-585b285fbc32" rel="noopener noreferrer"&gt;Istio's metrics merging was built for a simpler world. What should replace it?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>prometheus</category>
      <category>kubernetes</category>
      <category>istio</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
