Forem: MAXX

The YAML bug that taught me what bidirectional sync between Claude Code and Codex actually costs

MAXX — Tue, 12 May 2026 04:53:41 +0000

The Codex agent had no name

The sync ran clean. Exit code 0. The skill file I'd authored on the Claude side showed up at the matching path on the Codex side, byte-for-byte where I expected it. I opened the Codex agent picker and the entry was there, but its name was the empty string. Just a blank row.

I assumed I'd mis-named something on disk. I hadn't. The file had name: code-review at the top, in plain YAML frontmatter, exactly the way Claude writes it. The string was right there. The Codex parser was just refusing to see it.

The culprit was one line three rows down: globs: **/*.{js,ts}. Claude's YAML loader is lenient. It reads the value as a string and moves on. Codex uses a strict YAML 1.2 parser, which reads the leading * as an alias indicator, fails to parse the scalar, and silently drops the entire frontmatter block. Every field, including name, goes empty.

The fix is one substitution. The file had an inline helper that decided whether to quote a frontmatter scalar:

function serializeFrontmatterScalar(value) {
  const text = String(value);
  if (/[:#"\n]/.test(text)) return JSON.stringify(text);
  return text;
}

Four characters, and only three of them (:, #, and ") are among the nineteen YAML 1.2 c-indicators; the fourth is a newline. The other sixteen indicators, including the * from my glob, sail through unquoted and the strict parser breaks. The replacement is a single shared module at bin/util/yaml-scalar.mjs and a one-line call site. The relevant rule in CLAUDE.md now reads, in parentheses, (this has actually happened). I added that parenthetical when I wrote the rule, because by then it had.
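To make the gap concrete, here is a small sketch that runs the old helper over a handful of values. The helper is the one quoted above; the sample values are illustrative and not taken from the repo.

```javascript
// The old helper, exactly as quoted above.
function serializeFrontmatterScalar(value) {
  const text = String(value);
  if (/[:#"\n]/.test(text)) return JSON.stringify(text);
  return text;
}

// Illustrative sample values (assumptions, not from the repo).
const samples = ["code-review", "note: draft", "**/*.{js,ts}", "&shared", "@scope/pkg"];

for (const sample of samples) {
  // Values starting with *, & or @ come back bare, so a strict parser
  // will not read them as plain strings.
  console.log(`${sample}  ->  ${serializeFrontmatterScalar(sample)}`);
}
```

Only "note: draft" gets quoted, because it contains a colon. The glob and the other indicator-led values are written out unchanged.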

This isn't a YAML quirk story. It's what bidirectional sync between two parsers actually costs, and the cost shows up in the most boring place possible.

What "bidirectional" actually means

I keep having to explain this to people who hear "config sync" and picture a one-shot migration wizard. That isn't the shape of the problem.

Both tools are running. I edit a Claude skill on Monday, I edit a Codex MCP server config on Tuesday, and by Wednesday the two surfaces have drifted in independent directions. Neither side is the source of truth. Each side needs to keep being valid on its own parser. The job isn't "export from A, import to B." The job is "keep A and B in agreement, in both directions, on a config surface that overlaps but doesn't match."

That surface is wider than people think. Instructions (CLAUDE.md and AGENTS.md). Skills with frontmatter. MCP servers. Permissions. Hooks. Same concepts, different file paths, different vocabularies (Read and Bash on one side, spawn_agent and codex exec on the other). The CLI I built treats this as a diff problem first and a translation problem second:

ai-config-sync status
ai-config-sync sync --dry-run
ai-config-sync sync --apply

No wizard. No magic. You look at the diff, you decide.
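One way to picture "diff first, no source of truth" is a three-way comparison per config entry. This is a sketch under assumptions, not the CLI's actual logic: classifyEntry, its status strings, and the idea of a recorded last-synced base value are all hypothetical names introduced here.

```javascript
// Hypothetical three-way classification for a single config entry.
// `base` is the content recorded at the last successful sync (assumption),
// `claude` and `codex` are the current contents after translation into a
// common form. `undefined` means the entry is absent on that side.
function classifyEntry(base, claude, codex) {
  if (claude === codex) return "in-sync";
  const claudeChanged = claude !== base;
  const codexChanged = codex !== base;
  // Both sides moved away from the base in different ways: a human decides.
  if (claudeChanged && codexChanged) return "conflict";
  return claudeChanged ? "claude-to-codex" : "codex-to-claude";
}

console.log(classifyEntry("v1", "v2", "v1")); // claude-to-codex
console.log(classifyEntry("v1", "v1", "v3")); // codex-to-claude
console.log(classifyEntry("v1", "v2", "v3")); // conflict
```

The point of the base value is that neither side wins by default: the direction comes from which side changed since the last agreement, and a change on both sides is surfaced instead of resolved silently.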

Two parsers, one frontmatter

Here is the rule I wrote into CLAUDE.md after the bug, copied verbatim from the project doc:

Forbidden to write your own quote/escape logic. Forbidden to judge indicators directly with regex.
Reason: guarantee Claude (lenient YAML) ↔ Codex (strict YAML 1.2) round-trip. If even one site uses its own quoting, the strict parser fails to parse the entire frontmatter and fields like name go missing (this has actually happened).

That rule reads strict because it has to. The whole reason the bug existed was that the file had grown a one-off serializeFrontmatterScalar function tucked next to a frontmatter writer, and nobody (including me) noticed it didn't cover the full c-indicator set. The next time someone, possibly an AI assistant editing the file, reaches for the same convenience, the rule needs to stop them.

The shared utility lives at bin/util/yaml-scalar.mjs. Forty-two lines. Two exported functions. Here are the three regexes that do the work:

const RESERVED_INDICATOR_PREFIX = /^[-?:,[\]{}#&*!|>'"%@`]/;
const YAML_BOOL_NULL = /^(?:null|Null|NULL|~|true|True|TRUE|false|False|FALSE|yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF|y|Y|n|N)$/;
const YAML_TIMESTAMP = /^\d{4}-\d{2}-\d{2}(?:[Tt ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}(?::?\d{2})?)?)?$/;

The first one is the c-indicator set: nineteen characters, every one of them a structural sentinel in YAML 1.2. That's the regex the old four-character version was a draft of. The second covers YAML 1.1 boolean and null coercion, including the single-letter forms y, Y, n, N that parsers following YAML 1.1 rules will quietly turn into true/false if you leave them bare. The third catches ISO timestamps, because 2014-12-31 as an unquoted scalar becomes a Date object in some parsers and a string in others, and that disagreement is its own class of bug.

The trigger frontmatter for the original bug was this:

---
name: code-review
globs: **/*.{js,ts}
---

Claude reads it. Codex sees **, treats the leading * as an alias indicator, and discards the block. After the fix, the writer wraps **/*.{js,ts} in double quotes and both parsers agree the value is a string.

The split between the two exported function names matters to me. yamlScalarRequiresQuoting answers a yes/no question. serializeYamlScalar does the wrap. Callers can query without serializing, which is how tests/yaml-scalar.test.mjs gets to assert directly on the predicate. One of the test cases there is the https://x case: a colon followed by a slash (not whitespace) stays plain-safe per YAML 1.2, and the utility correctly does not over-quote it. That's the kind of edge the four-character regex never knew it was missing.
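The predicate/serializer split can be sketched roughly as follows. The three regexes and the two function names are the ones quoted above; how they are combined, the extra BREAKS_PLAIN_SCALAR check, and the use of JSON.stringify for quoting are assumptions made for illustration, not the contents of the real forty-two-line module.

```javascript
// The three patterns quoted from bin/util/yaml-scalar.mjs.
const RESERVED_INDICATOR_PREFIX = /^[-?:,[\]{}#&*!|>'"%@`]/;
const YAML_BOOL_NULL = /^(?:null|Null|NULL|~|true|True|TRUE|false|False|FALSE|yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF|y|Y|n|N)$/;
const YAML_TIMESTAMP = /^\d{4}-\d{2}-\d{2}(?:[Tt ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}(?::?\d{2})?)?)?$/;

// Assumption: cases that end a plain scalar mid-string. A colon followed by
// whitespace (or at the end) starts a mapping value, whitespace followed by #
// starts a comment, and a line break ends the scalar.
const BREAKS_PLAIN_SCALAR = /:(?:\s|$)|\s#|[\r\n]/;

// Answers the yes/no question without producing any output text.
function yamlScalarRequiresQuoting(value) {
  const text = String(value);
  return (
    text === "" ||
    text !== text.trim() ||
    RESERVED_INDICATOR_PREFIX.test(text) ||
    YAML_BOOL_NULL.test(text) ||
    YAML_TIMESTAMP.test(text) ||
    BREAKS_PLAIN_SCALAR.test(text)
  );
}

// Wraps the value only when the predicate says so. JSON string syntax is
// used here because it is also valid YAML double-quoted syntax.
function serializeYamlScalar(value) {
  const text = String(value);
  return yamlScalarRequiresQuoting(text) ? JSON.stringify(text) : text;
}

console.log(serializeYamlScalar("**/*.{js,ts}")); // "**/*.{js,ts}" (with quotes)
console.log(serializeYamlScalar("https://x"));    // https://x (left plain)
console.log(serializeYamlScalar("code-review"));  // code-review (left plain)
```

Keeping the predicate separate means a test can assert on yamlScalarRequiresQuoting("https://x") directly, without parsing serialized output to find out whether quotes were added.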

One 8,803-line file

Yes, 8,803 lines in one file is a smell by every standard rubric I'd apply to someone else's code. If a junior eng handed me this in review I would ask, with feeling, why.

Here's what's in it: a diff engine, a plan/apply loop, the paraphrase engine that rewrites Claude-strict tokens into Codex-strict tokens and back, a TOML patcher for ~/.codex/config.toml, a connect installer that wires the host-launcher into either side's plugin path, a reference generator, an agent mapper, a rule loader. All cross-cutting. All sharing state. All built on top of zero external runtime dependencies, which is the constraint that holds the rest of the structure.

That constraint is load-bearing. The full import surface of the main file is node:fs, node:child_process, node:crypto, node:os, node:path, node:readline, node:url, plus the one internal util. There's no bundler in the build. npm run build:dist copies files and injects a thin launcher; it doesn't transpile or bundle anything. Splitting the monolith into ten files means either (a) shipping ten files in the npm package and managing the import graph by hand across them, or (b) adding a bundler, which adds a devDependency and a build step that has to keep producing byte-identical output. The single file makes install trivially flat, the audit surface trivially small, and the diff against any prior version trivially readable.

That's the frame. Now back to why two parsers disagreeing is the part that actually bites.

bin/util/yaml-scalar.mjs is the first extraction. Forty-two lines, two functions, the smallest possible first step out of the monolith. CLAUDE.md says it directly: Splitting bin/ai-config-sync.mjs is on hold. That's not "we'll get to it later." That's "the next extraction has to earn its own file, the way the YAML one did, by being a real shared contract."

The honest cost: parts of the TOML and rule parser are regex-based and hand-rolled. The architecture notes in .claude/docs/repo-analysis/ already flag this. Regex parsing of mostly-structured input works until it doesn't, and when it doesn't, the failure mode looks exactly like the **/*.{js,ts} story above. That's the price of "zero deps" applied to parsing, and I'm paying it knowingly. If I had to guess where the next yaml-scalar.mjs-shaped extraction comes from, it's the TOML side, and it'll be because some real config in the wild produces a parse mismatch I didn't anticipate. The trigger for the YAML extraction was a real bug, not a refactoring mood. I expect the next one to arrive the same way.

I might be wrong about all of this. There's a version of the project where the monolith gets split into a dozen files now, before any more cross-cutting state piles up, and that version is probably easier to onboard contributors to. But it would buy that ease by adding a bundler or a multi-file ship, and neither of those changes makes the YAML bug go away. The bug was in the contract between two parsers, not in the file layout.

Where this lands

The project is v0.1.0. Two known debts are named explicitly in the repo's own architecture notes: the monolith hasn't been split, and the Codex plugin installer has been corrected several times because the plugin spec on that side is still moving. Neither one is solved. Both are the kind of debt you accept when you're trying to ship a usable CLI against two tools that are themselves changing under you.

What the YAML fix actually showed me is that the hard part of bidirectional sync is not the diffing. The diffing is the easy part. The hard part is making serialized output survive both parsers without either one quietly eating a field. bin/util/yaml-scalar.mjs isn't a refactor of old code. It's the first shared contract between the two parse environments, written down as forty-two lines that both sides have to agree on. Every future extraction from the monolith will probably look like that one: small, ugly, named for the bug that made it necessary.

The first extraction took 42 lines. The next one will probably take longer.

https://github.com/slash9494/ai-config-sync-manager

Stop hand-syncing Claude Code and Codex configs

MAXX — Fri, 08 May 2026 09:18:57 +0000

If you've spent any time in Claude Code or Codex, your config tree is probably substantial:

~/.claude/{settings.json, agents/, skills/, .mcp.json, CLAUDE.md}
~/.codex/{config.toml, AGENTS.md} plus ~/.agents/skills/

Two situations get painful fast.

Switching tools. You curated skills, agents, MCP servers, and permissions in one. Now you want to try the other. Migrating by hand is slow and easy to get wrong, and the two hosts don't share shapes. Permissions are allow/deny/ask lists in Claude, but sandbox_mode plus web_search plus prefix_rule in Codex. Agents are YAML frontmatter in Claude, TOML fields in Codex.

Using both. Same skills, same agents, same MCP. Two trees. They drift. Adding anything means two edits, and you forget which side is canonical.

ai-config-sync-manager bridges the two. It diffs both sides, plans changes, applies them with a backup.

The translation is host-aware:

Claude tool permissions to Codex sandbox modes (Bash stays out of read-only sandboxes by default)
Agent frontmatter (YAML) to Codex agent fields (TOML), including model alias mapping
MCP servers both ways, including remote MCP with bearer-token env vars
Skills as folders, not loose files
Vocabulary mismatches that can't auto-translate get paraphrase overrides instead of silent corruption

Quick start:

npx ai-config-sync connect       # register the plugin in both hosts
npx ai-config-sync status        # see what's out of sync
npx ai-config-sync sync          # dry-run plan
npx ai-config-sync sync --apply  # apply with backup

What you get:

Zero runtime deps, single Node ESM CLI
6 sync areas: instructions, skills, agents, mcp, permissions, hooks
Backups capped at 30 (FIFO), state-versioned for safe rollback
Plugin for both hosts so the CLI runs from inside either tool
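The "capped at 30 (FIFO)" backup rule amounts to deleting the oldest entries once the cap is exceeded. A minimal sketch, assuming backup names sort oldest-first (for example timestamped directory names); backupsToPrune and the name format are hypothetical, not the CLI's real code.

```javascript
// Hypothetical helper: given backup names that sort oldest-first when
// compared as strings, return the names to delete so at most `cap` remain.
function backupsToPrune(names, cap = 30) {
  const sorted = [...names].sort();
  // Everything before the last `cap` entries is surplus; oldest go first.
  return sorted.slice(0, Math.max(0, sorted.length - cap));
}

// Illustrative input: 32 zero-padded names, so the two oldest get pruned.
const names = Array.from({ length: 32 }, (_, i) => `backup-${String(i).padStart(3, "0")}`);
console.log(backupsToPrune(names)); // [ 'backup-000', 'backup-001' ]
```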

Repo: https://github.com/slash9494/ai-config-sync-manager

Just hit 0.1.0. Issues and migration stories welcome.

The Unfiltered Log of Shipping Open-Source v2 with AI Agents

MAXX — Sat, 18 Apr 2026 06:37:51 +0000

Six weeks, 146 commits, and every hallucination along the way

Sometime in 2024 I opened a GitHub notification from react-modern-audio-player, read the issue, started typing a reply, and closed the tab. I don't remember the issue. I remember the tab closing. That was the pattern for three years: someone would file a bug or ask about accessibility, and I would care about it for about ninety seconds before the weight of everything the library needed crushed the motivation to start.

The last real commit was v1.4.0-rc.2, February 2023. I had a full-time job, the library worked well enough, and nobody was furious enough to fork it. So it sat.

What changed

On March 1, 2026, I made the first commit of what became v2. The catalyst wasn't inspiration. It was Claude Code.

Not because Claude wrote the library for me. Because it lowered the activation energy enough that I could start. When the gap between "I should fix this" and "here is a concrete first step" shrinks from two hours of re-reading your own code to fifteen minutes of conversation, starting stops feeling impossible. That's the only honest way I can describe what happened.

The overhaul ran about six weeks. 146 commits, roughly 20 PRs numbered #31 through #53, a complete rewrite of the test infrastructure, the CSS layer, the bundle composition, and the accessibility surface. My primary tools were Claude (Opus and Sonnet) for code generation, refactoring, and review, and CodeRabbit for automated PR review. CodeRabbit runs a multi-model ensemble under the hood, selecting different frontier models per review task. I also consulted Gemini and GPT occasionally for documentation references and terminology checks, though they weren't part of the daily workflow.

Early on, I ran an experiment that changed how I worked with all of them.

Four models, four confessions

On 2026-03-18, I gave the same PR review task to four models side by side: Claude Sonnet, Gemini 3 Flash, GPT 5.3, and CodeRabbit. Every model got something wrong. Here is each one admitting it in its own words:

Primary (daily drivers):

Model	Role	Self-admission
Claude Sonnet	Config file review	"I explained existing file content as if valid without verifying."
CodeRabbit (multi-model)	PR blocker	"My original blocker was wrong — your config is correct."

Reference (doc checks, terminology):

Model	Role	Self-admission
Gemini 3 Flash	Doc consistency	"I mixed past data with current documentation."
GPT 5.3	Bundle config	"I confused 'condition bundle' vs 'rule array' concept."

On any given question, roughly two of four got it right. But the two that got it right rotated. Claude would nail a state management question and hallucinate about a Vite config. That experiment killed my trust in any single model's output. From that point on, my rule was: if Claude and CodeRabbit agree and the official docs confirm, proceed. Otherwise, run the code and find out. The validation stack wasn't overhead. It was the actual product of the overhaul.

What AI caught that I missed

The initial analysis of v1.4.0-rc.2 produced a scorecard I already half-knew but had never written down:

Category	Score	Note
Functionality	9/10	Feature-complete
Reliability	4/10	0% test coverage
Performance	6/10	Degrades at scale
Accessibility	1/10	No WCAG support
Maintainability	5/10	Technical debt
Bundle Size	5/10	~380 KB, 6 runtime deps

Overall production-readiness: 5/10. I had shipped a feature-complete component that was nearly unusable for screen-reader users and had no safety net against regressions.

PR #41 addressed the accessibility gaps across player components. PR #42 split the React context and added memoization to fix a re-render storm I'd been ignoring. PR #43 replaced direct DOM mutations with React state. These weren't AI ideas, exactly. They were problems I already knew about, surfaced and organized by models that could scan the whole codebase in seconds instead of the hours it would have taken me to re-orient after three years away.

The validation stack

Here is what I actually ran before tagging v2.0.0:

CodeRabbit on every PR, configured to block merge on unresolved findings
32 test files (up from zero): unit tests via Vitest, integration with React Testing Library, end-to-end via Playwright
axe-core for automated accessibility checks
Manual VoiceOver testing on Safari
A docs-first workflow where I wrote the README change before the implementation
Cross-validation: no architectural decision accepted unless Claude and CodeRabbit agreed and official documentation confirmed

That last one sounds paranoid. It saved me from shipping hallucinated configs at least three times.

Is it more robust now?

Metric	v1.4.0 (Before)	v2.1.0 (After)
Tests	0 files	32 files (unit + integration + e2e)
Bundle	~380 KB, 6 runtime deps	~79 KB unminified, 1 dep (wavesurfer.js)
Accessibility	1/10, no ARIA	ARIA + keyboard + VoiceOver tested
Re-renders	All consumers on any state change	Split context + memoization
DOM control	Direct manipulation outside React	React state driven
Public API	None	`useAudioPlayer()` hook

The bundle number works out to roughly 80% smaller (~380 KB down to ~79 KB), but I haven't verified it against gzipped production builds, so treat it as approximate.

I'm not going to claim a specific new accessibility score because I haven't run a formal audit against v2.1.0 yet. The keyboard interface covers play, pause, seek, and volume, and VoiceOver navigation works. Whether that's a 7 or an 8, I honestly don't know.

So yes, more robust. Also still a small library maintained by one person with a day job. The tests exist now, which means regressions will be caught. That is the win. Not "production-grade enterprise audio solution." Just: maintained.

"Just vibe-code your own player"

I've seen the argument. Why depend on a library at all when you can prompt an AI to generate a custom audio player in an afternoon? I've thought about it more than I'd like to admit, because if the answer is "no reason," then six weeks of my life just evaporated.

Here is what I think. A generated audio player works on the demo. Then you discover that Safari fires canplaythrough differently than Chrome, and your loading state breaks on iOS. Then you add wavesurfer.js for waveform rendering and find out its lifecycle hooks need careful cleanup or you leak memory on every track change. Then a user wants shuffle, repeat, and drag-to-reorder in the playlist, and suddenly you're maintaining a state machine. Then someone files an accessibility issue and you realize that aria attributes alone don't make a screen reader experience. Then you deploy on Next.js App Router and learn that half your hooks assume a browser environment.

Each one of these is a week. Not because any single problem is hard, but because they compound, and the generated version hasn't paid that tax yet. The question isn't whether you can build a player from scratch. Of course you can. The question is whether you want to maintain one from scratch, because that's what the six weeks actually bought: not a player, but the accumulated scar tissue of integration problems already solved, tested, and documented. Whether you'd rather spend six weeks or use a library that already did, I'll leave to you.

What I'm left with

react-modern-audio-player v2.1.0 shipped on April 14, 2026. It is still a small library. It is maintained now, which is more than I could say for three years. If you use it and something breaks, file an issue and I'll see it. I won't close the tab this time.

I don't know if AI-assisted maintenance scales to larger projects or longer timelines. I know it worked for this one, this time, with constant supervision. That's a narrower claim than I wanted to make, but it's the one the evidence supports.

P.S. This article was drafted with AI assistance (Claude) and then edited by hand. All metrics, commit references, and timeline claims were verified against the actual git history and project documentation.

Repo: https://github.com/slash9494/react-modern-audio-player