<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: MAXX</title>
    <description>The latest articles on Forem by MAXX (@maxx_l).</description>
    <link>https://forem.com/maxx_l</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885573%2F35e9db91-5c2a-40b8-8784-4167959c804b.png</url>
      <title>Forem: MAXX</title>
      <link>https://forem.com/maxx_l</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/maxx_l"/>
    <language>en</language>
    <item>
      <title>The YAML bug that taught me what bidirectional sync between Claude Code and Codex actually costs</title>
      <dc:creator>MAXX</dc:creator>
      <pubDate>Tue, 12 May 2026 04:53:41 +0000</pubDate>
      <link>https://forem.com/maxx_l/the-yaml-bug-that-taught-me-what-bidirectional-sync-between-claude-code-and-codex-actually-costs-kng</link>
      <guid>https://forem.com/maxx_l/the-yaml-bug-that-taught-me-what-bidirectional-sync-between-claude-code-and-codex-actually-costs-kng</guid>
      <description>&lt;h2&gt;
  
  
  The Codex agent had no name
&lt;/h2&gt;

&lt;p&gt;The sync ran clean. Exit code 0. The skill file I'd authored on the Claude side showed up at the matching path on the Codex side, byte-for-byte where I expected it. I opened the Codex agent picker and the entry was there, but its name was the empty string. Just a blank row.&lt;/p&gt;

&lt;p&gt;I assumed I'd mis-named something on disk. I hadn't. The file had &lt;code&gt;name: code-review&lt;/code&gt; at the top, in plain YAML frontmatter, exactly the way Claude writes it. The string was right there. The Codex parser was just refusing to see it.&lt;/p&gt;

&lt;p&gt;The culprit was one line three rows down: &lt;code&gt;globs: **/*.{js,ts}&lt;/code&gt;. Claude's YAML loader is lenient. It reads the value as a string and moves on. Codex uses a strict YAML 1.2 parser, which reads the leading &lt;code&gt;*&lt;/code&gt; as the start of an alias, fails to parse the scalar, and silently drops the entire frontmatter block. Every field, including &lt;code&gt;name&lt;/code&gt;, goes empty.&lt;/p&gt;

&lt;p&gt;The fix is one substitution. The file had an inline helper that decided whether to quote a frontmatter scalar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;serializeFrontmatterScalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;:#"&lt;/span&gt;&lt;span class="se"&gt;\n]&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four characters. Three of them (&lt;code&gt;:&lt;/code&gt;, &lt;code&gt;#&lt;/code&gt;, &lt;code&gt;"&lt;/code&gt;) are among the nineteen YAML 1.2 c-indicators; the fourth is the newline. The other sixteen indicators, including the &lt;code&gt;*&lt;/code&gt; from my glob, sail through unquoted and the strict parser breaks. The replacement is a single shared module at &lt;code&gt;bin/util/yaml-scalar.mjs&lt;/code&gt; and a one-line call site. The relevant rule in &lt;code&gt;CLAUDE.md&lt;/code&gt; now ends with a parenthetical: &lt;strong&gt;(this has actually happened)&lt;/strong&gt;. I added that parenthetical when I wrote the rule, because by then it had.&lt;/p&gt;

&lt;p&gt;This isn't a YAML quirk story. It's what bidirectional sync between two parsers actually costs, and the cost shows up in the most boring place possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "bidirectional" actually means
&lt;/h2&gt;

&lt;p&gt;I keep having to explain this to people who hear "config sync" and picture a one-shot migration wizard. That isn't the shape of the problem.&lt;/p&gt;

&lt;p&gt;Both tools are running. I edit a Claude skill on Monday, I edit a Codex MCP server config on Tuesday, and by Wednesday the two surfaces have drifted in independent directions. Neither side is the source of truth. Each side needs to keep being valid on its own parser. The job isn't "export from A, import to B." The job is "keep A and B in agreement, in both directions, on a config surface that overlaps but doesn't match."&lt;/p&gt;

&lt;p&gt;That surface is wider than people think. Instructions (&lt;code&gt;CLAUDE.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt;). Skills with frontmatter. MCP servers. Permissions. Hooks. Same concepts, different file paths, different vocabularies (&lt;code&gt;Read&lt;/code&gt; and &lt;code&gt;Bash&lt;/code&gt; on one side, &lt;code&gt;spawn_agent&lt;/code&gt; and &lt;code&gt;codex exec&lt;/code&gt; on the other). The CLI I built treats this as a diff problem first and a translation problem second:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ai-config-sync status
ai-config-sync &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
ai-config-sync &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--apply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No wizard. No magic. You look at the diff, you decide.&lt;/p&gt;
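
&lt;p&gt;To make "in both directions" concrete, here is a minimal sketch of how one config entry could be classified before anything is written. It is not the CLI's actual diff engine: &lt;code&gt;classifyDrift&lt;/code&gt; is a hypothetical name, and it assumes both sides have already been normalized into a comparable string and that the content from the last successful sync is available as &lt;code&gt;base&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical sketch, not the project's real code.
// base:   normalized content recorded at the last successful sync (assumed)
// claude: normalized content currently on the Claude side
// codex:  normalized content currently on the Codex side
function classifyDrift(base, claude, codex) {
  if (claude === codex) return "in-sync";        // nothing to do
  if (claude === base) return "codex-changed";   // only Codex moved: copy Codex to Claude
  if (codex === base) return "claude-changed";   // only Claude moved: copy Claude to Codex
  return "conflict";                             // both moved: a human decides
}

console.log(classifyDrift("v1", "v2", "v1")); // claude-changed
console.log(classifyDrift("v1", "v2", "v3")); // conflict
```

&lt;p&gt;The "conflict" branch is the reason a dry run comes before apply: when both sides moved, neither is the source of truth and no automatic choice is safe.&lt;/p&gt;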

&lt;h2&gt;
  
  
  Two parsers, one frontmatter
&lt;/h2&gt;

&lt;p&gt;Here is the rule I wrote into &lt;code&gt;CLAUDE.md&lt;/code&gt; after the bug, copied verbatim from the project doc:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Forbidden&lt;/strong&gt; to write your own quote/escape logic. &lt;strong&gt;Forbidden&lt;/strong&gt; to judge indicators directly with regex.&lt;br&gt;
Reason: guarantee Claude (lenient YAML) ↔ Codex (strict YAML 1.2) round-trip. If even one site uses its own quoting, the strict parser fails to parse the entire frontmatter and fields like &lt;code&gt;name&lt;/code&gt; go missing (this has actually happened).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That rule reads strict because it has to. The whole reason the bug existed was that the file had grown a one-off &lt;code&gt;serializeFrontmatterScalar&lt;/code&gt; function tucked next to a frontmatter writer, and nobody (including me) noticed it didn't cover the full c-indicator set. The next time someone, possibly an AI assistant editing the file, reaches for the same convenience, the rule needs to stop them.&lt;/p&gt;

&lt;p&gt;The shared utility lives at &lt;code&gt;bin/util/yaml-scalar.mjs&lt;/code&gt;. Forty-two lines. Two exported functions. Here are the three regexes that do the work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;RESERVED_INDICATOR_PREFIX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;-?:,[&lt;/span&gt;&lt;span class="se"&gt;\]&lt;/span&gt;&lt;span class="sr"&gt;{}#&amp;amp;*!|&amp;gt;'"%@`&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;YAML_BOOL_NULL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;null|Null|NULL|~|true|True|TRUE|false|False|FALSE|yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF|y|Y|n|N&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;YAML_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\d{4}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{2}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{2}(?:[&lt;/span&gt;&lt;span class="sr"&gt;Tt &lt;/span&gt;&lt;span class="se"&gt;]\d{2}&lt;/span&gt;&lt;span class="sr"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\d{2}&lt;/span&gt;&lt;span class="sr"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\d{2}(?:\.\d&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)?(?:&lt;/span&gt;&lt;span class="sr"&gt;Z|&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;+-&lt;/span&gt;&lt;span class="se"&gt;]\d{2}(?:&lt;/span&gt;&lt;span class="sr"&gt;:&lt;/span&gt;&lt;span class="se"&gt;?\d{2})?)?)?&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first one is the c-indicator set: nineteen characters, every one of them a structural sentinel in YAML 1.2. That's the regex the old four-character version was a draft of. The second covers YAML 1.1 boolean and null coercion, including the single-letter forms &lt;code&gt;y&lt;/code&gt;, &lt;code&gt;Y&lt;/code&gt;, &lt;code&gt;n&lt;/code&gt;, &lt;code&gt;N&lt;/code&gt; that parsers following YAML 1.1 rules will quietly turn into &lt;code&gt;true&lt;/code&gt;/&lt;code&gt;false&lt;/code&gt; if you leave them bare. The third catches ISO timestamps, because &lt;code&gt;2014-12-31&lt;/code&gt; as an unquoted scalar becomes a Date object in some parsers and a string in others, and that disagreement is its own class of bug.&lt;/p&gt;

&lt;p&gt;The trigger frontmatter for the original bug was this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;code-review&lt;/span&gt;
&lt;span class="na"&gt;globs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="err"&gt;**&lt;/span&gt;&lt;span class="s"&gt;/*.{js,ts}&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude reads it. Codex sees &lt;code&gt;**&lt;/code&gt;, reads the leading &lt;code&gt;*&lt;/code&gt; as an alias indicator, and discards the block. After the fix, the writer wraps &lt;code&gt;**/*.{js,ts}&lt;/code&gt; in double quotes and both parsers agree the value is a string.&lt;/p&gt;

&lt;p&gt;The split between the two exported function names matters to me. &lt;code&gt;yamlScalarRequiresQuoting&lt;/code&gt; answers a yes/no question. &lt;code&gt;serializeYamlScalar&lt;/code&gt; does the wrap. Callers can query without serializing, which is how &lt;code&gt;tests/yaml-scalar.test.mjs&lt;/code&gt; gets to assert directly on the predicate. One of the test cases there is the &lt;code&gt;https://x&lt;/code&gt; case: a colon followed by a slash (not whitespace) stays plain-safe per YAML 1.2, and the utility correctly does not over-quote it. That's the kind of edge the four-character regex never knew it was missing.&lt;/p&gt;
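
&lt;p&gt;A minimal sketch of how that predicate/serializer split could look. Only the three regexes and the two function names come from the real module. The &lt;code&gt;UNSAFE_INSIDE&lt;/code&gt; pattern, the empty-string and surrounding-whitespace checks, and the use of &lt;code&gt;JSON.stringify&lt;/code&gt; for the double-quoted form are assumptions made for illustration, not the actual forty-two lines. The real file is an ES module and would export both functions.&lt;/p&gt;

```javascript
// Hypothetical reconstruction: regexes and function names are from the
// article, everything else is an illustrative assumption.
// \x26 is the ampersand, written as an escape.
const RESERVED_INDICATOR_PREFIX = /^[-?:,[\]{}#\x26*!|>'"%@`]/;
const YAML_BOOL_NULL = /^(?:null|Null|NULL|~|true|True|TRUE|false|False|FALSE|yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF|y|Y|n|N)$/;
const YAML_TIMESTAMP = /^\d{4}-\d{2}-\d{2}(?:[Tt ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}(?::?\d{2})?)?)?$/;
// Assumed extra rule: a colon followed by whitespace (or ending the value),
// a " #" comment start, or a newline is unsafe inside a plain scalar.
const UNSAFE_INSIDE = /:\s|:$|\s#|\n/;

// Predicate: answers the yes/no question without serializing anything.
function yamlScalarRequiresQuoting(value) {
  const text = String(value);
  if (text === "") return true;           // a bare empty value would read as null
  if (text !== text.trim()) return true;  // leading/trailing whitespace would be lost
  return (
    RESERVED_INDICATOR_PREFIX.test(text) ||
    YAML_BOOL_NULL.test(text) ||
    YAML_TIMESTAMP.test(text) ||
    UNSAFE_INSIDE.test(text)
  );
}

// Serializer: wraps in double quotes only when the predicate says so.
function serializeYamlScalar(value) {
  const text = String(value);
  return yamlScalarRequiresQuoting(text) ? JSON.stringify(text) : text;
}

console.log(serializeYamlScalar("**/*.{js,ts}")); // "**/*.{js,ts}" (quoted)
console.log(serializeYamlScalar("code-review"));  // code-review (plain)
console.log(serializeYamlScalar("https://x"));    // https://x (plain)
```

&lt;p&gt;Keeping the predicate separate means a test can assert on the quoting decision itself, without having to parse the serialized output back.&lt;/p&gt;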

&lt;h2&gt;
  
  
  One 8,803-line file
&lt;/h2&gt;

&lt;p&gt;Yes, 8,803 lines in one file is a smell by every standard rubric I'd apply to someone else's code. If a junior eng handed me this in review I would ask, with feeling, why.&lt;/p&gt;

&lt;p&gt;Here's what's in it: a diff engine, a plan/apply loop, the paraphrase engine that rewrites Claude-strict tokens into Codex-strict tokens and back, a TOML patcher for &lt;code&gt;~/.codex/config.toml&lt;/code&gt;, a connect installer that wires the host-launcher into either side's plugin path, a reference generator, an agent mapper, a rule loader. All cross-cutting. All sharing state. All built on top of zero external runtime dependencies, which is the constraint that holds the rest of the structure.&lt;/p&gt;

&lt;p&gt;That constraint is load-bearing. The full import surface of the main file is &lt;code&gt;node:fs&lt;/code&gt;, &lt;code&gt;node:child_process&lt;/code&gt;, &lt;code&gt;node:crypto&lt;/code&gt;, &lt;code&gt;node:os&lt;/code&gt;, &lt;code&gt;node:path&lt;/code&gt;, &lt;code&gt;node:readline&lt;/code&gt;, &lt;code&gt;node:url&lt;/code&gt;, plus the one internal util. There's no bundler in the build. &lt;code&gt;npm run build:dist&lt;/code&gt; copies files and injects a thin launcher; it doesn't transpile or bundle anything. Splitting the monolith into ten files means either (a) shipping ten files in the npm package and managing the import graph by hand across them, or (b) adding a bundler, which adds a devDependency and a build step that has to keep producing byte-identical output. The single file makes install trivially flat, the audit surface trivially small, and the diff against any prior version trivially readable.&lt;/p&gt;

&lt;p&gt;That's the frame. Now back to why two parsers disagreeing is the part that actually bites.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bin/util/yaml-scalar.mjs&lt;/code&gt; is the first extraction. Forty-two lines, two functions, the smallest possible first step out of the monolith. &lt;code&gt;CLAUDE.md&lt;/code&gt; says it directly: &lt;code&gt;Splitting bin/ai-config-sync.mjs is on hold.&lt;/code&gt; That's not "we'll get to it later." That's "the next extraction has to earn its own file, the way the YAML one did, by being a real shared contract."&lt;/p&gt;

&lt;p&gt;The honest cost: parts of the TOML and rule parser are regex-based and hand-rolled. The architecture notes in &lt;code&gt;.claude/docs/repo-analysis/&lt;/code&gt; already flag this. Regex parsing of mostly-structured input works until it doesn't, and when it doesn't, the failure mode looks exactly like the &lt;code&gt;**/*.{js,ts}&lt;/code&gt; story above. That's the price of "zero deps" applied to parsing, and I'm paying it knowingly. If I had to guess where the next &lt;code&gt;yaml-scalar.mjs&lt;/code&gt;-shaped extraction comes from, it's the TOML side, and it'll be because some real config in the wild produces a parse mismatch I didn't anticipate. The trigger for the YAML extraction was a real bug, not a refactoring mood. I expect the next one to arrive the same way.&lt;/p&gt;

&lt;p&gt;I might be wrong about all of this. There's a version of the project where the monolith gets split into a dozen files now, before any more cross-cutting state piles up, and that version is probably easier to onboard contributors to. But it would buy that ease by adding a bundler or a multi-file ship, and neither of those changes makes the YAML bug go away. The bug was in the contract between two parsers, not in the file layout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this lands
&lt;/h2&gt;

&lt;p&gt;The project is v0.1.0. Two known debts are named explicitly in the repo's own architecture notes: the monolith hasn't been split, and the Codex plugin installer has been corrected several times because the plugin spec on that side is still moving. Neither one is solved. Both are the kind of debt you accept when you're trying to ship a usable CLI against two tools that are themselves changing under you.&lt;/p&gt;

&lt;p&gt;What the YAML fix actually showed me is that the hard part of bidirectional sync is not the diffing. The diffing is the easy part. The hard part is making serialized output survive both parsers without either one quietly eating a field. &lt;code&gt;bin/util/yaml-scalar.mjs&lt;/code&gt; isn't a refactor of old code. It's the first shared contract between the two parse environments, written down as forty-two lines that both sides have to agree on. Every future extraction from the monolith will probably look like that one: small, ugly, named for the bug that made it necessary.&lt;/p&gt;

&lt;p&gt;The first extraction took 42 lines. The next one will probably take longer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/slash9494/ai-config-sync-manager" rel="noopener noreferrer"&gt;https://github.com/slash9494/ai-config-sync-manager&lt;/a&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>openai</category>
      <category>opensource</category>
      <category>cli</category>
    </item>
    <item>
      <title>Stop hand-syncing Claude Code and Codex configs</title>
      <dc:creator>MAXX</dc:creator>
      <pubDate>Fri, 08 May 2026 09:18:57 +0000</pubDate>
      <link>https://forem.com/maxx_l/stop-hand-syncing-claude-code-and-codex-configs-enk</link>
      <guid>https://forem.com/maxx_l/stop-hand-syncing-claude-code-and-codex-configs-enk</guid>
      <description>&lt;p&gt;If you've spent any time in Claude Code or Codex, your config tree is probably substantial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~/.claude/{settings.json, agents/, skills/, .mcp.json, CLAUDE.md}&lt;/li&gt;
&lt;li&gt;~/.codex/{config.toml, AGENTS.md} plus ~/.agents/skills/&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two situations get painful fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Switching tools.&lt;/strong&gt; You curated skills, agents, MCP servers, and permissions in one. Now you want to try the other. Migrating by hand is slow and easy to get wrong, and the two hosts don't share shapes. Permissions are allow/deny/ask lists in Claude, but sandbox_mode plus web_search plus prefix_rule in Codex. Agents are YAML frontmatter in Claude, TOML fields in Codex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using both.&lt;/strong&gt; Same skills, same agents, same MCP. Two trees. They drift. Adding anything means two edits, and you forget which side is canonical.&lt;/p&gt;

&lt;p&gt;ai-config-sync-manager bridges the two. It diffs both sides, plans changes, applies them with a backup.&lt;/p&gt;

&lt;p&gt;The translation is host-aware:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude tool permissions to Codex sandbox modes (Bash stays out of read-only sandboxes by default)&lt;/li&gt;
&lt;li&gt;Agent frontmatter (YAML) to Codex agent fields (TOML), including model alias mapping&lt;/li&gt;
&lt;li&gt;MCP servers both ways, including remote MCP with bearer-token env vars&lt;/li&gt;
&lt;li&gt;Skills as folders, not loose files&lt;/li&gt;
&lt;li&gt;Vocabulary mismatches that can't auto-translate get paraphrase overrides instead of silent corruption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quick start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx ai-config-sync connect       &lt;span class="c"&gt;# register the plugin in both hosts&lt;/span&gt;
npx ai-config-sync status        &lt;span class="c"&gt;# see what's out of sync&lt;/span&gt;
npx ai-config-sync &lt;span class="nb"&gt;sync&lt;/span&gt;          &lt;span class="c"&gt;# dry-run plan&lt;/span&gt;
npx ai-config-sync &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--apply&lt;/span&gt;  &lt;span class="c"&gt;# apply with backup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero runtime deps, single Node ESM CLI&lt;/li&gt;
&lt;li&gt;6 sync areas: instructions, skills, agents, mcp, permissions, hooks&lt;/li&gt;
&lt;li&gt;Backups capped at 30 (FIFO), state-versioned for safe rollback&lt;/li&gt;
&lt;li&gt;Plugin for both hosts so the CLI runs from inside either tool&lt;/li&gt;
&lt;/ul&gt;
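
&lt;p&gt;The backup cap can be pictured with a short sketch. This is an illustration rather than the CLI's implementation: &lt;code&gt;planBackupPrune&lt;/code&gt; is a hypothetical name, and it assumes backup names sort oldest-first (for example, timestamp-style names), which may not match the real layout.&lt;/p&gt;

```javascript
// Hypothetical sketch of a FIFO cap; not the project's real code.
const MAX_BACKUPS = 30;

// Given backup names that sort oldest-first (an assumption), decide which
// ones to delete so that at most `max` remain.
function planBackupPrune(backupNames, max = MAX_BACKUPS) {
  const oldestFirst = [...backupNames].sort();
  const excess = Math.max(0, oldestFirst.length - max);
  return {
    remove: oldestFirst.slice(0, excess), // first in, first out
    keep: oldestFirst.slice(excess),
  };
}

// 32 backups in, the two oldest are planned for removal.
const names = Array.from({ length: 32 }, (_, i) => "backup-" + String(i + 1).padStart(3, "0"));
console.log(planBackupPrune(names).remove); // [ 'backup-001', 'backup-002' ]
```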

&lt;p&gt;Repo: &lt;a href="https://github.com/slash9494/ai-config-sync-manager" rel="noopener noreferrer"&gt;https://github.com/slash9494/ai-config-sync-manager&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just hit 0.1.0. Issues and migration stories welcome.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>openai</category>
      <category>cli</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Unfiltered Log of Shipping Open-Source v2 with AI Agents</title>
      <dc:creator>MAXX</dc:creator>
      <pubDate>Sat, 18 Apr 2026 06:37:51 +0000</pubDate>
      <link>https://forem.com/maxx_l/the-unfiltered-log-of-shipping-open-source-v2-with-ai-agents-3i0o</link>
      <guid>https://forem.com/maxx_l/the-unfiltered-log-of-shipping-open-source-v2-with-ai-agents-3i0o</guid>
      <description>&lt;p&gt;&lt;em&gt;Six weeks, 146 commits, and every hallucination along the way&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Sometime in 2024 I opened a GitHub notification from &lt;a href="https://github.com/slash9494/react-modern-audio-player" rel="noopener noreferrer"&gt;&lt;code&gt;react-modern-audio-player&lt;/code&gt;&lt;/a&gt;, read the issue, started typing a reply, and closed the tab. I don't remember the issue. I remember the tab closing. That was the pattern for three years: someone would file a bug or ask about accessibility, and I would care about it for about ninety seconds before the weight of everything the library needed crushed the motivation to start.&lt;/p&gt;

&lt;p&gt;The last real commit was v1.4.0-rc.2, February 2023. I had a full-time job, the library worked well enough, and nobody was furious enough to fork it. So it sat.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;p&gt;On March 1, 2026, I made the first commit of what became v2. The catalyst wasn't inspiration. It was Claude Code.&lt;/p&gt;

&lt;p&gt;Not because Claude wrote the library for me. Because it lowered the activation energy enough that I could start. When the gap between "I should fix this" and "here is a concrete first step" shrinks from two hours of re-reading your own code to fifteen minutes of conversation, starting stops feeling impossible. That's the only honest way I can describe what happened.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8cojuqrp9cb1zuxuq6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8cojuqrp9cb1zuxuq6h.png" alt="react-modern-audio-player waveform with progress" width="800" height="51"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The overhaul ran about six weeks. 146 commits, roughly 20 PRs numbered #31 through #53, a complete rewrite of the test infrastructure, the CSS layer, the bundle composition, and the accessibility surface. My primary tools were Claude (Opus and Sonnet) for code generation, refactoring, and review, and CodeRabbit for automated PR review. CodeRabbit runs a multi-model ensemble under the hood, selecting different frontier models per review task. I also consulted Gemini and GPT occasionally for documentation references and terminology checks, though they weren't part of the daily workflow.&lt;/p&gt;

&lt;p&gt;Early on, I ran an experiment that changed how I worked with all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four models, four confessions
&lt;/h2&gt;

&lt;p&gt;On 2026-03-18, I gave the same PR review task to four models side by side: Claude Sonnet, Gemini 3 Flash, GPT 5.3, and CodeRabbit. Every model got something wrong. Here is each one admitting it, in their own words:&lt;/p&gt;

&lt;p&gt;Primary (daily drivers):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Self-admission&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;Config file review&lt;/td&gt;
&lt;td&gt;"I explained existing file content as if valid without verifying."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeRabbit (multi-model)&lt;/td&gt;
&lt;td&gt;PR blocker&lt;/td&gt;
&lt;td&gt;"My original blocker was wrong — your config is correct."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Reference (doc checks, terminology):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Self-admission&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;Doc consistency&lt;/td&gt;
&lt;td&gt;"I mixed past data with current documentation."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT 5.3&lt;/td&gt;
&lt;td&gt;Bundle config&lt;/td&gt;
&lt;td&gt;"I confused 'condition bundle' vs 'rule array' concept."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy6xawjkw8hnpcaij18j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy6xawjkw8hnpcaij18j.png" alt="CodeRabbit admitting its blocker was wrong after user pushback (PR #32)" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On any given question, roughly two of four got it right. But the two that got it right rotated. Claude would nail a state management question and hallucinate about a Vite config. That experiment killed my trust in any single model's output. From that point on, my rule was: if Claude and CodeRabbit agree and the official docs confirm, proceed. Otherwise, run the code and find out. The validation stack wasn't overhead. It was the actual product of the overhaul.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI caught that I missed
&lt;/h2&gt;

&lt;p&gt;The initial analysis of v1.4.0-rc.2 produced a scorecard I already half-knew but had never written down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Functionality&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;Feature-complete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;4/10&lt;/td&gt;
&lt;td&gt;0% test coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;6/10&lt;/td&gt;
&lt;td&gt;Degrades at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessibility&lt;/td&gt;
&lt;td&gt;1/10&lt;/td&gt;
&lt;td&gt;No WCAG support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintainability&lt;/td&gt;
&lt;td&gt;5/10&lt;/td&gt;
&lt;td&gt;Technical debt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundle Size&lt;/td&gt;
&lt;td&gt;5/10&lt;/td&gt;
&lt;td&gt;~380 KB, 6 runtime deps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Overall production-readiness: 5/10. I had shipped a feature-complete component that was nearly unusable for screen-reader users and had no safety net against regressions.&lt;/p&gt;

&lt;p&gt;PR #41 addressed the accessibility gaps across player components. PR #42 split the React context and added memoization to fix a re-render storm I'd been ignoring. PR #43 replaced direct DOM mutations with React state. These weren't AI ideas, exactly. They were problems I already knew about, surfaced and organized by models that could scan the whole codebase in seconds instead of the hours it would have taken me to re-orient after three years away.&lt;/p&gt;

&lt;h2&gt;
  
  
  The validation stack
&lt;/h2&gt;

&lt;p&gt;Here is what I actually ran before tagging v2.0.0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CodeRabbit on every PR, configured to block merge on unresolved findings&lt;/li&gt;
&lt;li&gt;32 test files (up from zero): unit tests via Vitest, integration with React Testing Library, end-to-end via Playwright&lt;/li&gt;
&lt;li&gt;axe-core for automated accessibility checks&lt;/li&gt;
&lt;li&gt;Manual VoiceOver testing on Safari&lt;/li&gt;
&lt;li&gt;A docs-first workflow where I wrote the README change before the implementation&lt;/li&gt;
&lt;li&gt;Cross-validation: no architectural decision accepted unless Claude and CodeRabbit agreed and official documentation confirmed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one sounds paranoid. It saved me from shipping hallucinated configs at least three times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is it more robust now?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;v1.4.0 (Before)&lt;/th&gt;
&lt;th&gt;v2.1.0 (After)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tests&lt;/td&gt;
&lt;td&gt;0 files&lt;/td&gt;
&lt;td&gt;32 files (unit + integration + e2e)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundle&lt;/td&gt;
&lt;td&gt;~380 KB, 6 runtime deps&lt;/td&gt;
&lt;td&gt;~79 KB unminified, 1 dep (wavesurfer.js)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessibility&lt;/td&gt;
&lt;td&gt;1/10, no ARIA&lt;/td&gt;
&lt;td&gt;ARIA + keyboard + VoiceOver tested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Re-renders&lt;/td&gt;
&lt;td&gt;All consumers on any state change&lt;/td&gt;
&lt;td&gt;Split context + memoization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DOM control&lt;/td&gt;
&lt;td&gt;Direct manipulation outside React&lt;/td&gt;
&lt;td&gt;React state driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public API&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;useAudioPlayer()&lt;/code&gt; hook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ai09jh63p54d78ggh6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ai09jh63p54d78ggh6x.png" alt="Bundle size: 380 KB to 79 KB" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By these numbers the bundle is roughly 80% smaller, but I haven't verified that against gzipped production builds, so treat it as approximate.&lt;/p&gt;

&lt;p&gt;I'm not going to claim a specific new accessibility score because I haven't run a formal audit against v2.1.0 yet. The keyboard interface covers play, pause, seek, and volume, and VoiceOver navigation works. Whether that's a 7 or an 8, I honestly don't know.&lt;/p&gt;

&lt;p&gt;So yes, more robust. Also still a small library maintained by one person with a day job. The tests exist now, which means regressions will be caught. That is the win. Not "production-grade enterprise audio solution." Just: maintained.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Just vibe-code your own player"
&lt;/h2&gt;

&lt;p&gt;I've seen the argument. Why depend on a library at all when you can prompt an AI to generate a custom audio player in an afternoon? I've thought about it more than I'd like to admit, because if the answer is "no reason," then six weeks of my life just evaporated.&lt;/p&gt;

&lt;p&gt;Here is what I think. A generated audio player works on the demo. Then you discover that Safari fires &lt;code&gt;canplaythrough&lt;/code&gt; differently than Chrome, and your loading state breaks on iOS. Then you add wavesurfer.js for waveform rendering and find out its lifecycle hooks need careful cleanup or you leak memory on every track change. Then a user wants shuffle, repeat, and drag-to-reorder in the playlist, and suddenly you're maintaining a state machine. Then someone files an accessibility issue and you realize that aria attributes alone don't make a screen reader experience. Then you deploy on Next.js App Router and learn that half your hooks assume a browser environment.&lt;/p&gt;

&lt;p&gt;Each one of these is a week. Not because any single problem is hard, but because they compound, and the generated version hasn't paid that tax yet. The question isn't whether you can build a player from scratch. Of course you can. The question is whether you want to maintain one from scratch, because that's what the six weeks actually bought: not a player, but the accumulated scar tissue of integration problems already solved, tested, and documented. Whether you'd rather spend six weeks or use a library that already did, I'll leave to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm left with
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/slash9494/react-modern-audio-player" rel="noopener noreferrer"&gt;&lt;code&gt;react-modern-audio-player&lt;/code&gt;&lt;/a&gt; v2.1.0 shipped on April 14, 2026. It is still a small library. It is maintained now, which is more than I could say for three years. If you use it and something breaks, file an issue and I'll see it. I won't close the tab this time.&lt;/p&gt;

&lt;p&gt;I don't know if AI-assisted maintenance scales to larger projects or longer timelines. I know it worked for this one, this time, with constant supervision. That's a narrower claim than I wanted to make, but it's the one the evidence supports.&lt;/p&gt;




&lt;p&gt;P.S. This article was drafted with AI assistance (Claude) and then edited by hand. All metrics, commit references, and timeline claims were verified against the actual git history and project documentation.&lt;/p&gt;




&lt;p&gt;Repo: &lt;a href="https://github.com/slash9494/react-modern-audio-player" rel="noopener noreferrer"&gt;https://github.com/slash9494/react-modern-audio-player&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>react</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
