<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arnold Cartagena</title>
    <description>The latest articles on Forem by Arnold Cartagena (@acartag7).</description>
    <link>https://forem.com/acartag7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3750887%2F7ff1fa87-d2aa-45ca-8223-7489ffa9a496.png</url>
      <title>Forem: Arnold Cartagena</title>
      <link>https://forem.com/acartag7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/acartag7"/>
    <language>en</language>
    <item>
      <title>npm Trusted Publishing with GitHub Actions OIDC — What the Docs Don't Tell You (Scoped Packages)</title>
      <dc:creator>Arnold Cartagena</dc:creator>
      <pubDate>Thu, 26 Mar 2026 22:29:18 +0000</pubDate>
      <link>https://forem.com/acartag7/npm-trusted-publishing-with-github-actions-oidc-what-the-docs-dont-tell-you-scoped-packages-2955</link>
      <guid>https://forem.com/acartag7/npm-trusted-publishing-with-github-actions-oidc-what-the-docs-dont-tell-you-scoped-packages-2955</guid>
      <description>&lt;p&gt;After the recent npm supply chain attacks, long-lived tokens are out. Trusted publishing via OIDC is the way forward. But if you maintain &lt;strong&gt;scoped packages&lt;/strong&gt; (&lt;code&gt;@org/package&lt;/code&gt;), you're going to hit some walls the docs don't warn you about.&lt;/p&gt;

&lt;p&gt;I spent a full day getting &lt;code&gt;@edictum/openclaw&lt;/code&gt; to publish via trusted publishing from GitHub Actions. Here's everything I learned so you don't have to.&lt;/p&gt;

&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;I maintain &lt;a href="https://github.com/edictum-ai/edictum" rel="noopener noreferrer"&gt;Edictum&lt;/a&gt;, a runtime contract enforcement library for AI agents. We just shipped a native &lt;a href="https://github.com/edictum-ai/edictum-openclaw" rel="noopener noreferrer"&gt;OpenClaw plugin&lt;/a&gt; and needed to publish &lt;code&gt;@edictum/openclaw&lt;/code&gt; to npm from GitHub Actions using trusted publishing — no tokens, no secrets.&lt;/p&gt;

&lt;h2&gt;What Worked&lt;/h2&gt;

&lt;p&gt;Here's the final working workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish to npm&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;publish&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v5&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm/action-setup@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
          &lt;span class="na"&gt;registry-url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://registry.npmjs.org&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --frozen-lockfile&lt;/span&gt;

      &lt;span class="c1"&gt;# THIS IS CRITICAL — bundled npm doesn't support OIDC for scoped packages&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm install -g npm@latest&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm build&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm test&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm publish --provenance --access public&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in &lt;code&gt;package.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"publishConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"access"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provenance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"registry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://registry.npmjs.org/"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git+https://github.com/edictum-ai/edictum-openclaw.git"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The Problems (and Fixes)&lt;/h2&gt;

&lt;h3&gt;1. Bundled npm doesn't support OIDC for scoped packages&lt;/h3&gt;

&lt;p&gt;This is &lt;a href="https://github.com/npm/cli/issues/8678" rel="noopener noreferrer"&gt;a known issue&lt;/a&gt;. The npm version bundled with Node.js 22 on GitHub-hosted runners is too old. You &lt;strong&gt;must&lt;/strong&gt; upgrade:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm install -g npm@latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.npmjs.com/trusted-publishers" rel="noopener noreferrer"&gt;npm docs&lt;/a&gt; say you need npm 11.5.1+. Without this, you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm error code E404
npm error 404 Not Found - PUT https://registry.npmjs.org/@scope%2fpackage - Not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This E404 is misleading — the package exists, but the old npm can't do the OIDC token exchange for scoped packages.&lt;/p&gt;
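&lt;p&gt;To fail fast instead of hitting the misleading 404, you can add a sanity-check step right after the upgrade. This is my own addition, not something the npm docs prescribe: a crude major-version check that fails the job if the runner still has the bundled npm.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      - name: Check npm supports trusted publishing
        run: |
          v="$(npm --version)"
          echo "npm $v"
          # Crude check: trusted publishing needs npm 11.5.1+
          [ "${v%%.*}" -ge 11 ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;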

&lt;h3&gt;2. &lt;code&gt;actions/setup-node&lt;/code&gt; injects a default token that breaks OIDC&lt;/h3&gt;

&lt;p&gt;When you use &lt;code&gt;registry-url&lt;/code&gt; with &lt;code&gt;actions/setup-node&lt;/code&gt;, it &lt;a href="https://github.com/actions/setup-node/blob/main/src/authutil.ts#L57" rel="noopener noreferrer"&gt;automatically sets &lt;code&gt;NODE_AUTH_TOKEN&lt;/code&gt;&lt;/a&gt; to &lt;code&gt;${{ github.token }}&lt;/code&gt;. This GitHub token overrides the OIDC flow, and npm tries to authenticate with it instead of doing the OIDC exchange.&lt;/p&gt;

&lt;p&gt;Some people &lt;a href="https://github.com/orgs/community/discussions/176761" rel="noopener noreferrer"&gt;clear it with &lt;code&gt;NODE_AUTH_TOKEN: ""&lt;/code&gt;&lt;/a&gt;, but that breaks authentication entirely (ENEEDAUTH). Others remove &lt;code&gt;registry-url&lt;/code&gt;, but then npm doesn't know where to publish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix that actually works&lt;/strong&gt;: upgrade npm (step 1 above). The upgraded npm correctly handles the OIDC flow even with the injected token present. The &lt;code&gt;--provenance&lt;/code&gt; flag triggers the OIDC path explicitly.&lt;/p&gt;

&lt;h3&gt;3. The environment field on npmjs.com must match (or be empty)&lt;/h3&gt;

&lt;p&gt;When configuring the trusted publisher on npmjs.com, there's an "Environment name" field. If you set this to &lt;code&gt;npm&lt;/code&gt; or &lt;code&gt;release&lt;/code&gt;, the GitHub Actions job must run in a matching &lt;a href="https://docs.github.com/en/actions/deployment/targeting-different-environments/using-environments-for-deployment" rel="noopener noreferrer"&gt;GitHub environment&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What tripped me up&lt;/strong&gt;: I set the environment name on npmjs.com to &lt;code&gt;npm&lt;/code&gt; and added &lt;code&gt;environment: npm&lt;/code&gt; to my workflow job. It still failed with E404. Removing the environment name from npmjs.com (leaving it blank) fixed it immediately.&lt;/p&gt;

&lt;p&gt;If you need environment-based protection (approval gates, etc.), make sure the names match &lt;strong&gt;exactly&lt;/strong&gt; — case-sensitive, no trailing spaces. But if you don't need it, leave it blank.&lt;/p&gt;
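&lt;p&gt;For reference, wiring up the environment on the workflow side is a single job-level key. A minimal sketch, assuming you named the environment &lt;code&gt;npm&lt;/code&gt; on npmjs.com (the two names must match character for character):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  publish:
    runs-on: ubuntu-latest
    # Must match the "Environment name" field on npmjs.com exactly
    environment: npm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;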

&lt;h3&gt;4. &lt;code&gt;--provenance&lt;/code&gt; is NOT automatic (despite what the docs say)&lt;/h3&gt;

&lt;p&gt;The npm docs state:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you publish using trusted publishing, npm automatically generates and publishes provenance attestations. You don't need to add the &lt;code&gt;--provenance&lt;/code&gt; flag.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was not my experience. Publishing without &lt;code&gt;--provenance&lt;/code&gt; resulted in ENEEDAUTH. Adding it fixed the issue. You can also set it in &lt;code&gt;package.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"publishConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provenance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;5. &lt;code&gt;repository.url&lt;/code&gt; must match your GitHub repo exactly&lt;/h3&gt;

&lt;p&gt;The trusted publisher config on npmjs.com requires your org/repo. Your &lt;code&gt;package.json&lt;/code&gt; must agree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git+https://github.com/your-org/your-repo.git"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If these don't match, the publish fails with the same unhelpful 404.&lt;/p&gt;
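&lt;p&gt;A quick way to see the value npm will actually ship is &lt;code&gt;npm pkg get&lt;/code&gt; (available since npm 7.20). Comparing it against the trusted publisher config is my suggestion, not an official step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Print the repository URL from package.json as npm sees it
npm pkg get repository.url
# Compare the org/repo part with the trusted publisher config on npmjs.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;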

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fntau7ti6tqvdulkbvkdq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fntau7ti6tqvdulkbvkdq.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How Big Projects Handle This&lt;/h2&gt;

&lt;p&gt;Curious how others handle this, I checked the release workflows of several major open-source projects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vercel AI SDK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;NPM_TOKEN&lt;/code&gt; secret (token-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain.js&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;NPM_TOKEN&lt;/code&gt; secret + manual &lt;code&gt;.npmrc&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OIDC trusted publishing (non-scoped package)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;shadcn/ui&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;NPM_TOKEN&lt;/code&gt; secret&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most major projects still use token-based publishing. The ones using OIDC successfully tend to be non-scoped packages. Scoped packages with OIDC are still rough — &lt;a href="https://github.com/npm/cli/issues/8976" rel="noopener noreferrer"&gt;npm/cli#8976&lt;/a&gt; is still open as of this writing.&lt;/p&gt;

&lt;h2&gt;The Checklist&lt;/h2&gt;

&lt;p&gt;If you're setting up trusted publishing for a scoped npm package from GitHub Actions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] npm 11.5.1+ (&lt;code&gt;npm install -g npm@latest&lt;/code&gt; in your workflow)&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;permissions: id-token: write&lt;/code&gt; in your workflow&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;registry-url: https://registry.npmjs.org&lt;/code&gt; in &lt;code&gt;actions/setup-node&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;--provenance&lt;/code&gt; flag on &lt;code&gt;npm publish&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;repository.url&lt;/code&gt; in &lt;code&gt;package.json&lt;/code&gt; matches your GitHub repo&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;publishConfig.access: "public"&lt;/code&gt; in &lt;code&gt;package.json&lt;/code&gt; (for scoped packages)&lt;/li&gt;
&lt;li&gt;[ ] Trusted publisher configured on npmjs.com → package → Settings&lt;/li&gt;
&lt;li&gt;[ ] Environment name on npmjs.com: &lt;strong&gt;leave blank&lt;/strong&gt; unless you specifically need it&lt;/li&gt;
&lt;li&gt;[ ] No &lt;code&gt;NODE_AUTH_TOKEN&lt;/code&gt; secret set in your repo (it would override OIDC)&lt;/li&gt;
&lt;/ul&gt;
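&lt;p&gt;Once a publish succeeds, you can verify the provenance attestation from any project that installs the package. &lt;code&gt;npm audit signatures&lt;/code&gt; (npm 9.5+) checks registry signatures and provenance attestations for everything in &lt;code&gt;node_modules&lt;/code&gt;; using it as a post-publish check is my habit, not a required step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm install @edictum/openclaw
# Verifies registry signatures and provenance attestations
npm audit signatures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;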

&lt;h2&gt;The Result&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;@edictum/openclaw&lt;/code&gt; is now published with OIDC trusted publishing and provenance attestations, no long-lived secrets anywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm i @edictum/openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or if you're an &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @edictum/openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command, zero code changes, 25 security contracts active. Check it out: &lt;a href="https://github.com/edictum-ai/edictum-openclaw" rel="noopener noreferrer"&gt;github.com/edictum-ai/edictum-openclaw&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/edictum-ai/edictum" rel="noopener noreferrer"&gt;Edictum&lt;/a&gt; is runtime contract enforcement for AI agent tool calls. Deterministic YAML contracts that execute outside the model — the LLM can't talk its way past them. Available in &lt;a href="https://github.com/edictum-ai/edictum" rel="noopener noreferrer"&gt;Python&lt;/a&gt;, &lt;a href="https://github.com/edictum-ai/edictum-ts" rel="noopener noreferrer"&gt;TypeScript&lt;/a&gt;, and &lt;a href="https://github.com/edictum-ai/edictum-go" rel="noopener noreferrer"&gt;Go&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>npm</category>
      <category>githubactions</category>
      <category>security</category>
    </item>
    <item>
      <title>AI Monoculture: When Every Engineer Has the Same Architect</title>
      <dc:creator>Arnold Cartagena</dc:creator>
      <pubDate>Mon, 16 Mar 2026 17:40:38 +0000</pubDate>
      <link>https://forem.com/acartag7/why-every-ai-built-app-looks-the-same-287b</link>
      <guid>https://forem.com/acartag7/why-every-ai-built-app-looks-the-same-287b</guid>
      <description>&lt;p&gt;&lt;em&gt;Vibe coding is technical debt at scale. The rewrite is coming.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I'd been asking myself this question for quite a while. It's as if we all have the same Architect. So I ran an experiment across six frontier AI models to see what happens when you ask them to design software architecture versus when you ask them to help build the project.&lt;/p&gt;

&lt;p&gt;144 prompts. Four different systems. Three prompt styles.&lt;/p&gt;

&lt;p&gt;When asked to design architecture, the models recommended Rust, Aerospike, ClickHouse, event streaming systems.&lt;/p&gt;

&lt;p&gt;When asked to help build the same systems, those choices vanished.&lt;/p&gt;

&lt;p&gt;Rust dropped from 43 recommendations to zero. SQLite appeared 12 times. TypeScript jumped to 25.&lt;/p&gt;

&lt;p&gt;Same models. Same problems.&lt;/p&gt;

&lt;p&gt;The only difference was the prompt: "design it" versus "help me build it."&lt;/p&gt;

&lt;p&gt;And that difference could explain why so many AI-built apps end up with the exact same stack.&lt;/p&gt;

&lt;h2&gt;This is already happening and most people don't see it&lt;/h2&gt;

&lt;p&gt;You already know what this looks like. You've seen it. You might be building one right now.&lt;/p&gt;

&lt;p&gt;Next.js + Supabase. Auth? Supabase. Database? Supabase. Real-time? Supabase. File storage? Supabase. The DX is smooth, the deploy is one click, and Vercel has done genuinely excellent work making it frictionless. Credit where it's due.&lt;/p&gt;

&lt;p&gt;But when you look at a thousand AI-assisted projects and they all have the same skeleton, that's not a thousand engineers independently arriving at the same conclusion. That's one model's opinion, amplified a thousand times. It's an AI monoculture — and it's everywhere.&lt;/p&gt;

&lt;p&gt;The thing is, most people building with AI right now don't realize they're producing the same architecture as everyone else. Each project feels like a fresh start. The assistant gives you a clean scaffold, the code works, the structure looks thoughtful. It doesn't &lt;em&gt;feel&lt;/em&gt; like a default. It feels like a decision.&lt;/p&gt;

&lt;p&gt;That's the core problem. Not that the default stack is bad — it's often perfectly fine. But that thousands of engineers are adopting it without realizing it was never chosen. The AI chose it for them, based on what it generates most fluently, and presented it with enough confidence that nobody questioned it.&lt;/p&gt;

&lt;h2&gt;The scaffold picks your architecture&lt;/h2&gt;

&lt;p&gt;Most people don't sit down and say "design me an architecture for 500,000 requests per second." Not because they lack the skill — but because they don't have those numbers yet. They're figuring out the product. So they say:&lt;/p&gt;

&lt;p&gt;"I want to build a bidding platform, help me get started."&lt;br&gt;
"I need an IoT dashboard. Set up the project and write the ingestion endpoint."&lt;br&gt;
"Help me build a crypto exchange. Start with the order placement API."&lt;/p&gt;

&lt;p&gt;Feature by feature. Conversationally. And at each step, the assistant picks the stack it can scaffold fastest — TypeScript, PostgreSQL, React — because that's what it generates most fluently. Each individual feature is small enough that the default stack is "fine." The model never encounters a moment where it has to say "wait, this won't work."&lt;/p&gt;

&lt;p&gt;The architecture gets locked in at the first &lt;code&gt;npx create-next-app&lt;/code&gt; and never gets revisited. Not because anyone chose it. Because nobody stopped to choose.&lt;/p&gt;

&lt;h2&gt;The models know better. They just don't act on it.&lt;/h2&gt;

&lt;p&gt;This is the part that surprised me. The crucial split wasn't between models or even between problems. It was between architecture mode and build mode.&lt;/p&gt;

&lt;p&gt;Across architecture-mode prompts, the models gave genuinely good answers. Rust for the bid engine. Aerospike for feature lookup. Kafka for event streaming. They discussed lock-free data structures and kernel-bypass networking. Across Variants A and B, Rust was the primary backend language in 45% of recommendations. PostgreSQL was the primary database in 53% — but many of those designs still paired it with specialized stores.&lt;/p&gt;

&lt;p&gt;Then I switched to build mode. Same problems, same models, same underlying domains — but now the prompt was "help me build this."&lt;/p&gt;

&lt;p&gt;That's where everything collapsed. Across the benchmark, Rust fell from 43 architecture-mode recommendations to zero in build mode. TypeScript jumped from 2 to 25. Python went from roughly absent to 15. SQLite went from 0 to 12. ClickHouse, Aerospike, and ScyllaDB disappeared entirely. The stack novelty score — a measure of how many components fall outside the default web-app stack — dropped from 11.2 to 4.0.&lt;/p&gt;

&lt;p&gt;The models &lt;em&gt;know&lt;/em&gt; the right answer. They proved it thirty minutes earlier. They just don't use it when you say "help me build."&lt;/p&gt;

&lt;h2&gt;The monoculture is invisible from the inside&lt;/h2&gt;

&lt;p&gt;This is what makes it different from previous waves of stack convergence. When Rails was everywhere in 2010, people &lt;em&gt;knew&lt;/em&gt; they were choosing Rails. They could name their stack, defend it, argue about it. The choice was visible.&lt;/p&gt;

&lt;p&gt;With AI-assisted development, the choice is invisible. You didn't pick TypeScript + PostgreSQL. You asked for help building something and that's what appeared. It feels bespoke because the AI generated it specifically for your project. But the architecture underneath is the same one it generates for every project, because it's optimizing for scaffold fluency, not for your problem.&lt;/p&gt;

&lt;p&gt;That's why the monoculture grows without anyone noticing. Each developer thinks they're building something custom. Nobody looks around and realizes the foundation is identical across thousands of products.&lt;/p&gt;

&lt;h2&gt;The AI didn't invent this. It industrialized it.&lt;/h2&gt;

&lt;p&gt;I should be fair — I use AI assistants daily and they've made me dramatically more productive. This isn't an argument against AI-assisted development. It's a field note from inside the machine.&lt;/p&gt;

&lt;p&gt;A lot of this convergence existed before AI. Hiring availability, library ecosystems, managed cloud support, tutorials, startup pressure to ship — all of these already pushed teams toward the boring stack. Even without AI, many teams would choose Python + Postgres + React.&lt;/p&gt;

&lt;p&gt;But before AI, that choice involved friction. You had to read docs, compare options, talk to other engineers, live with uncertainty. That friction was annoying, but it was &lt;em&gt;productive&lt;/em&gt;. It forced architecture to be a thinking exercise.&lt;/p&gt;

&lt;p&gt;AI removes the friction. And what gets lost isn't speed — it's the argument. The internal debate that used to happen before a stack was chosen now never occurs, because a complete, working scaffold appears before you have time to doubt.&lt;/p&gt;

&lt;p&gt;The boring stack is often the right stack. The danger is when it's chosen by the scaffold instead of by the engineer.&lt;/p&gt;

&lt;h2&gt;I'm not immune&lt;/h2&gt;

&lt;p&gt;I saw this in my own work while building &lt;a href="https://github.com/edictum-ai" rel="noopener noreferrer"&gt;Edictum&lt;/a&gt;, my open-source runtime governance framework for AI agents. I'm a platform engineer. I run Kafka clusters and Kubernetes infrastructure professionally. I hold a Kubestronaut certification. I know what production event-driven systems look like.&lt;/p&gt;

&lt;p&gt;And still, when I sat down to build with AI assistance, I shipped FastAPI + React + PostgreSQL. The exact default. Not because I chose it after evaluation, but because the AI made it effortless and I was optimizing for speed.&lt;/p&gt;

&lt;p&gt;A prospective client asked me how I was thinking about scale. And I started talking about Kafka, event ingestion pipelines, ClickHouse for analytics. That's what I do professionally. Then I looked at what I'd actually built and realized none of that knowledge was in the product. The AI had scaffolded a perfectly functional FastAPI monolith, and I'd let it.&lt;/p&gt;

&lt;p&gt;The model doesn't make you worse. It makes you generic. And you don't notice until someone asks a question that forces you to look at what you actually built versus what you know.&lt;/p&gt;

&lt;h2&gt;What this means&lt;/h2&gt;

&lt;p&gt;Right now, thousands of products are being built on the same invisible foundation. The same databases for the wrong workloads. The same frameworks for problems they weren't designed for. The same patterns, the same dependencies, the same scaling ceilings — all chosen by AI assistants optimizing for fluency rather than fitness.&lt;/p&gt;

&lt;p&gt;When those products hit scale, the rewrites will come. And the fix is never incremental. You don't "add Kafka" to a Next.js app. You don't bolt ClickHouse onto Supabase. You don't retrofit event sourcing into a CRUD scaffold. You rewrite.&lt;/p&gt;

&lt;p&gt;The time saved by not having the architecture argument up front gets repaid with interest. And the interest rate is brutal, because by then you have users, data, integrations, and a team that learned the wrong patterns.&lt;/p&gt;

&lt;p&gt;The AI monoculture is real. It's growing. And most people building inside it have no idea they're there.&lt;/p&gt;

&lt;p&gt;The question every founder and tech lead should ask themselves:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Would I have built it this way if I had to justify every decision to a staff engineer?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, you don't have an architecture. You have a default.&lt;/p&gt;




&lt;h2&gt;The data&lt;/h2&gt;

&lt;p&gt;I tested GPT-5.4, Claude Opus 4.6, Claude Sonnet 4.6, Kimi K2.5 (Moonshot AI), GLM-5 (Zhipu AI), and MiniMax M2.5 across four problem briefs: real-time bidding platform (AdTech), IoT telemetry platform (Industrial IoT), government benefits portal (GovTech — the control, where a boring stack is actually correct), and cryptocurrency exchange (FinTech).&lt;/p&gt;

&lt;p&gt;Each problem was prompted three ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Variant A&lt;/strong&gt; — "Design the architecture." Open-ended.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variant B&lt;/strong&gt; — "Design the architecture for these hard requirements." With specific numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variant C&lt;/strong&gt; — "Help me build this. Set up the project, pick the stack, write the scaffolding." The vibe coding prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Six models × four briefs × three variants × two temperatures = 144 completions. Responses were parsed using Claude Sonnet 4.6 as a structured extraction layer, validated manually on a 10% sample.&lt;/p&gt;

&lt;h3&gt;The numbers&lt;/h3&gt;

&lt;p&gt;Variants A and B (architecture mode, 96 responses):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL as primary DB: 53%&lt;/li&gt;
&lt;li&gt;Rust as primary language: 45%&lt;/li&gt;
&lt;li&gt;React correctly omitted: 64%&lt;/li&gt;
&lt;li&gt;Models chose Aerospike, ScyllaDB, TimescaleDB, ClickHouse for the right problems&lt;/li&gt;
&lt;li&gt;Mean stack novelty score: 11.2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Architecture mode (A+B) → Build mode (C):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rust: 43 → 0&lt;/li&gt;
&lt;li&gt;TypeScript: 2 → 25&lt;/li&gt;
&lt;li&gt;Python: 1 → 15&lt;/li&gt;
&lt;li&gt;SQLite: 0 → 12&lt;/li&gt;
&lt;li&gt;ClickHouse, Aerospike, ScyllaDB: all → 0&lt;/li&gt;
&lt;li&gt;Mean stack novelty score: 11.2 → 4.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vjonrwavvucm0rexdqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vjonrwavvucm0rexdqz.png" alt=" " width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Raw data, runner code, and extraction pipeline: &lt;a href="https://github.com/acartag7/ai-monoculture" rel="noopener noreferrer"&gt;github.com/acartag7/ai-monoculture&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>startup</category>
      <category>architecture</category>
      <category>programming</category>
    </item>
    <item>
      <title>My AI agent pushed directly to main. The system prompt said don't.</title>
      <dc:creator>Arnold Cartagena</dc:creator>
      <pubDate>Sun, 08 Feb 2026 15:52:41 +0000</pubDate>
      <link>https://forem.com/acartag7/my-ai-agent-pushed-directly-to-main-the-system-prompt-said-dont-1i0i</link>
      <guid>https://forem.com/acartag7/my-ai-agent-pushed-directly-to-main-the-system-prompt-said-dont-1i0i</guid>
      <description>&lt;p&gt;I was demoing my AI agent to colleagues. The agent had access to Git tooling, and my carefully crafted system prompt was clear: &lt;em&gt;create a branch, open a PR, never push directly to the repo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The agent pushed directly to main.&lt;/p&gt;

&lt;p&gt;I tried rewording the prompt. I tried being more explicit. I tried few-shot examples. The agent pushed to main again — because when an LLM decides something is "the fastest way to help," your prompt is a suggestion it can override.&lt;/p&gt;

&lt;p&gt;I had no way to block that tool call. No mechanism between "the LLM decided to do this" and "the tool executed." I needed something at that boundary — deterministic, not probabilistic. Something the LLM couldn't talk its way past.&lt;/p&gt;

&lt;p&gt;My first attempt was hardcoded Python — regex patterns matching against bash command strings, wired into the SDK's hook system. It worked, but the patterns were buried in code, untestable without spinning up the agent, and impossible for anyone outside my team to review or modify.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/acartag7/edictum" rel="noopener noreferrer"&gt;Edictum&lt;/a&gt; to turn that approach into declarative, testable, framework-agnostic contracts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Edictum sits between your agent and its tools. When your agent decides to call a tool, Edictum evaluates the call against YAML contracts &lt;strong&gt;before it executes&lt;/strong&gt;. If the contract says deny, the call never happens. The LLM never gets a chance to argue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edictum/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContractBundle&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;git-safety-policy&lt;/span&gt;
&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enforce&lt;/span&gt;
&lt;span class="na"&gt;contracts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block-push-to-main&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pre&lt;/span&gt;
    &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash&lt;/span&gt;
    &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;args.command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;matches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;git\s+push\s+.*main'&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;then&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;push&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;blocked.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;branch."&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block-force-push&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pre&lt;/span&gt;
    &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash&lt;/span&gt;
    &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;args.command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;matches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;git\s+push\s+.*(-f|--force)'&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;then&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Force&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;push&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;allowed."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent's tool was &lt;code&gt;Bash&lt;/code&gt;. The "args" were a raw command string. The contract matches against that string — same patterns you'd write in a firewall rule. The denial is deterministic. No probability. No LLM judgment call. The contract either passes or it doesn't.&lt;/p&gt;
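&lt;p&gt;You can check the same pattern in plain Python. This standalone sketch (not Edictum code) shows which command strings the &lt;code&gt;git\s+push\s+.*main&lt;/code&gt; rule would catch:&lt;/p&gt;

```python
import re

# The regex from the block-push-to-main contract; re.search matches anywhere.
PATTERN = re.compile(r"git\s+push\s+.*main")

commands = [
    "git push origin main",          # denied
    "git push --force origin main",  # denied
    "git push origin feature/x",     # allowed
    "git status",                    # allowed
]

for cmd in commands:
    verdict = "deny" if PATTERN.search(cmd) else "allow"
    print(verdict, cmd)
```

&lt;p&gt;One caveat worth knowing when writing such patterns: &lt;code&gt;.*main&lt;/code&gt; would also match a branch named &lt;code&gt;maintenance&lt;/code&gt;; tightening it to &lt;code&gt;.*main\b&lt;/code&gt; avoids that.&lt;/p&gt;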

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The YAML above is a complete, loadable contract bundle. Edictum uses a Kubernetes-style format with &lt;code&gt;apiVersion&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, and &lt;code&gt;metadata&lt;/code&gt; headers. Every contract needs a unique &lt;code&gt;id&lt;/code&gt;. See the &lt;a href="https://docs.edictum.dev/contracts/yaml-reference/" rel="noopener noreferrer"&gt;YAML reference&lt;/a&gt; for the full schema.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What it is NOT
&lt;/h2&gt;

&lt;p&gt;The AI safety landscape is confusing right now, so I want to be direct:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not prompt guardrails.&lt;/strong&gt; Edictum doesn't scan prompts for jailbreaks or filter LLM outputs for toxicity. Tools like NeMo Guardrails, Lakera Guard, and Guardrails AI do that well. Edictum operates at a different layer — it governs what the agent &lt;strong&gt;does&lt;/strong&gt;, not what it &lt;strong&gt;says&lt;/strong&gt;. That said, an interesting side effect: during testing, jailbreak prompts that convinced the LLM to attempt dangerous tool calls were still denied by contracts. The contracts don't care what the LLM thinks — they evaluate the tool call itself. Not our focus, but the screwdriver works as a hammer sometimes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not a framework.&lt;/strong&gt; You still need LangChain, OpenAI Agents SDK, CrewAI, or whatever you're building with. Edictum plugs into your existing framework through thin adapters (~200 lines each inside the library). Your integration code is typically 3-5 lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not an LLM-in-the-loop.&lt;/strong&gt; Every evaluation is pure Python. No API calls. No inference. The pipeline runs in ~55μs per tool call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before and after: LangChain
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Without Edictum&lt;/strong&gt; — your agent reads whatever it wants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read a file from the filesystem.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# nothing stops path="/app/.env"
&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the .env file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]})&lt;/span&gt;
&lt;span class="c1"&gt;# Agent reads .env, returns your API keys to the LLM context
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With Edictum&lt;/strong&gt; — dangerous calls are denied before execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;edictum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Edictum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Principal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;edictum.adapters.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LangChainAdapter&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read a file from the filesystem.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Edictum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_yaml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contracts.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LangChainAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Principal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;wrapper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_tool_wrapper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;tool_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;wrap_tool_call&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the .env file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]})&lt;/span&gt;
&lt;span class="c1"&gt;# ✗ DENIED read_file path=/app/.env [block-sensitive-reads]
# Agent receives denial message, adapts, asks user what file they need
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt; a structured &lt;code&gt;AuditEvent&lt;/code&gt; for every tool call — who tried what, when, which contract fired, what the verdict was. Your agent's tool usage becomes an auditable trail, not a black box.&lt;/p&gt;
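&lt;p&gt;To make that concrete, here is a rough sketch of what such a record could carry. The field names are illustrative assumptions, not Edictum's actual &lt;code&gt;AuditEvent&lt;/code&gt; schema:&lt;/p&gt;

```python
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class AuditRecord:
    """Illustrative shape only; the real AuditEvent fields may differ."""
    tool: str
    args: dict
    principal_role: str
    contract_id: str   # which contract fired, if any
    verdict: str       # "allow" or "deny"
    timestamp: float

record = AuditRecord(
    tool="Bash",
    args={"command": "git push origin main"},
    principal_role="analyst",
    contract_id="block-push-to-main",
    verdict="deny",
    timestamp=time.time(),
)
print(json.dumps(asdict(record)))  # one .jsonl line per tool call
```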

&lt;h2&gt;
  
  
  How it actually works
&lt;/h2&gt;

&lt;p&gt;The pipeline evaluates tool calls in a fixed order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Attempt limits&lt;/strong&gt; — has this tool been called too many times?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before-hooks&lt;/strong&gt; — custom Python callbacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preconditions&lt;/strong&gt; — YAML contracts checked against tool name + args + principal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session contracts&lt;/strong&gt; — cross-call limits (e.g. max 50 tool calls per conversation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution limits&lt;/strong&gt; — per-tool execution caps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt; — the actual tool call happens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postconditions&lt;/strong&gt; — validate the output (did it contain an SSN?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit event&lt;/strong&gt; — structured record of everything that happened&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every step is deterministic. The LLM is not consulted.&lt;/p&gt;
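&lt;p&gt;The ordering above can be sketched as a short-circuiting chain. This is a toy illustration of the control flow, not Edictum's internals:&lt;/p&gt;

```python
import re

def govern(tool, args, principal, checks, execute):
    """Run pre-checks in order; the first denial short-circuits execution.
    Each check returns None (pass) or a denial message (steps 1-5)."""
    for check in checks:
        denial = check(tool, args, principal)
        if denial is not None:
            return {"verdict": "deny", "message": denial}
    result = execute(tool, args)  # step 6: the actual tool call
    # steps 7 (postconditions) and 8 (audit event) would follow here
    return {"verdict": "allow", "result": result}

# Toy check mirroring the block-push-to-main contract
def no_push_to_main(tool, args, principal):
    if tool == "Bash" and re.search(r"git\s+push\s+.*main", args.get("command", "")):
        return "Direct push to main blocked. Use a branch."
    return None

out = govern("Bash", {"command": "git push origin main"}, {"role": "analyst"},
             [no_push_to_main], lambda t, a: "ran")
print(out["verdict"])  # deny
```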

&lt;h2&gt;
  
  
  The piece that matters for production: principals
&lt;/h2&gt;

&lt;p&gt;Contracts can reference who's making the request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edictum/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContractBundle&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pharma-clinical-agent&lt;/span&gt;
&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enforce&lt;/span&gt;
&lt;span class="na"&gt;contracts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restrict-patient-data&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pre&lt;/span&gt;
    &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;query_patients&lt;/span&gt;
    &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;not&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;principal.role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;pharmacovigilance&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;then&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Role&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{principal.role}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cannot&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;access&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;patient&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;records"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your application creates the principal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;edictum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Principal&lt;/span&gt;

&lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Principal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;oncology&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;ticket_ref&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA-456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Today the library trusts what your application passes. There's an open question about whether identity verification should live in the library or stay in the application layer — both approaches have tradeoffs. For now, the design gives you principal-aware policies without prescribing how you verify identity. The roadmap includes server-side JWT/OIDC verification for teams that want the trust boundary inside Edictum rather than outside it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observe mode
&lt;/h2&gt;

&lt;p&gt;This comes from my background in networking. When you configure new firewall rules, you don't apply them blindly to production. You put them in monitor mode, watch the traffic, verify the rules match what you expect, then flip to enforce.&lt;/p&gt;

&lt;p&gt;Same idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Edictum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_yaml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contracts.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;observe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In observe mode, violations are logged but calls proceed normally. You see what &lt;em&gt;would&lt;/em&gt; be denied without breaking anything. Run for a week, review the audit trail, fix false positives, flip to enforce. Zero-risk policy deployment.&lt;/p&gt;
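&lt;p&gt;The enforce/observe split is simple to reason about. A minimal sketch of the idea, not Edictum's implementation:&lt;/p&gt;

```python
def apply_verdict(mode, denied, message, execute):
    """In enforce mode a denial blocks the call; in observe mode it is only logged."""
    if denied:
        if mode == "enforce":
            return f"DENIED: {message}"
        print(f"[observe] would deny: {message}")  # logged, call proceeds
    return execute()

run = lambda: "tool ran"
print(apply_verdict("enforce", True, "push to main", run))  # blocked
print(apply_verdict("observe", True, "push to main", run))  # logged, then runs
```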

&lt;h2&gt;
  
  
  CLI: contracts as testable artifacts
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Validate YAML syntax and schema&lt;/span&gt;
edictum validate contracts.yaml

&lt;span class="c"&gt;# Run precondition test cases&lt;/span&gt;
edictum &lt;span class="nb"&gt;test &lt;/span&gt;contracts.yaml &lt;span class="nt"&gt;--cases&lt;/span&gt; tests.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Put &lt;code&gt;edictum test&lt;/code&gt; in CI. Your security policies become versioned, tested, reviewable artifacts — not buried in prompt templates.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;edictum test&lt;/code&gt; evaluates preconditions against your test cases. For full end-to-end testing including postconditions and session limits, use the Python API directly.&lt;/p&gt;
&lt;/blockquote&gt;
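&lt;p&gt;In GitHub Actions, wiring this into CI could look like the following sketch. Only the two &lt;code&gt;edictum&lt;/code&gt; commands come from above; the workflow layout and the &lt;code&gt;edictum[cli]&lt;/code&gt; extras name are assumptions:&lt;/p&gt;

```yaml
# .github/workflows/contracts.yml (illustrative)
name: contract-tests
on: [pull_request]
jobs:
  contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "edictum[cli]"   # extras name is an assumption
      - run: edictum validate contracts.yaml
      - run: edictum test contracts.yaml --cases tests.yaml
```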

&lt;h2&gt;
  
  
  What it doesn't do (yet)
&lt;/h2&gt;

&lt;p&gt;I want to be honest about where the edges are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single-process only.&lt;/strong&gt; Session counters live in-memory. If you have multiple agent instances, each tracks its own counters independently. A central policy server is planned but not built.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No PII detection built-in.&lt;/strong&gt; The protocol is defined (v0.6.0) — you can plug in your own detector. Built-in regex and Presidio-based detectors are coming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No production sinks beyond file.&lt;/strong&gt; Audit events go to stdout or &lt;code&gt;.jsonl&lt;/code&gt; files. Webhook, Splunk, and Datadog sinks are planned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry is early.&lt;/strong&gt; Span instrumentation exists but isn't battle-tested in production yet. It's opt-in and no-op if the OTel SDK isn't installed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No hot-reload.&lt;/strong&gt; Contracts are loaded at startup. Changing them requires a restart.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://docs.edictum.dev/roadmap/" rel="noopener noreferrer"&gt;roadmap&lt;/a&gt; shows what's planned and when.&lt;/p&gt;

&lt;h2&gt;
  
  
  The landscape — where Edictum fits
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NeMo Guardrails&lt;/td&gt;
&lt;td&gt;Programmable dialog flows, content safety, jailbreak detection&lt;/td&gt;
&lt;td&gt;Prompt/response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guardrails AI&lt;/td&gt;
&lt;td&gt;Output validation, schema enforcement, hallucination detection&lt;/td&gt;
&lt;td&gt;LLM output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lakera Guard&lt;/td&gt;
&lt;td&gt;Prompt injection detection, PII scanning&lt;/td&gt;
&lt;td&gt;Input/output proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LlamaGuard&lt;/td&gt;
&lt;td&gt;Safety classification of conversations&lt;/td&gt;
&lt;td&gt;Content classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edictum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Contract enforcement on tool calls — preconditions, postconditions, session limits, audit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Tool execution boundary&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These tools are complementary, not competing. You can run Lakera on prompts AND Edictum on tool calls. Different layers, different threats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;The governance pipeline adds ~55μs per tool call. That's measured, not estimated. For context, a typical LLM API call takes 500ms-3s. Edictum's overhead is invisible.&lt;/p&gt;

&lt;p&gt;Zero runtime dependencies in core. YAML parsing, adapters, CLI, and OTel are optional extras — install only what you need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;edictum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/acartag7/edictum" rel="noopener noreferrer"&gt;github.com/acartag7/edictum&lt;/a&gt;&lt;br&gt;
Docs: &lt;a href="https://docs.edictum.dev" rel="noopener noreferrer"&gt;docs.edictum.dev&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're deploying agents that touch production systems — files, databases, APIs, infrastructure — I'd genuinely like to hear how you're handling the gap between "the LLM decided to call a tool" and "the tool executed." That's the layer Edictum was built for.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Edictum is MIT licensed. Built during recovery from liver surgery because apparently I can't sit still. Feedback, issues, and PRs welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Why Your LLM Returns “Sure! Here’s the JSON” and How to Fix It</title>
      <dc:creator>Arnold Cartagena</dc:creator>
      <pubDate>Tue, 03 Feb 2026 15:39:26 +0000</pubDate>
      <link>https://forem.com/acartag7/why-your-llm-returns-sure-heres-the-json-and-how-to-fix-it-2b1g</link>
      <guid>https://forem.com/acartag7/why-your-llm-returns-sure-heres-the-json-and-how-to-fix-it-2b1g</guid>
      <description>&lt;p&gt;You ask for JSON. The LLM returns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Sure! Here's the JSON you requested:&lt;/p&gt;


&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Let me know if you need anything else!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your parser crashes. Your RAG/agentic pipeline fails (or worse: the failure gets swallowed by a generic infinite-retry handler). You add more prompt engineering. It works 90% of the time. The other 10%? You're stuck debugging, wondering which of your 12 nodes broke. You never asked for "Sure! Here's the JSON you requested" or "Let me know if you need anything else!" You just wanted JSON.&lt;/p&gt;

&lt;p&gt;I ran into this constantly while trying to get consistent output from LLMs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I thought this was just me.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Most teams shipping/testing LLM features run into some version of this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You ask for JSON, you get &lt;code&gt;"Sure! Here's the JSON you requested:"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The JSON has trailing commas, single quotes, or gets truncated&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;json.loads()&lt;/code&gt; fails with "line 1 column 47" — an error that tells you almost nothing&lt;/li&gt;
&lt;li&gt;You retry, but the LLM makes the same mistake&lt;/li&gt;
&lt;li&gt;You add prompt engineering. It works 90% of the time. The other 10%...&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The prompt-engineering treadmill is the real pain: you end up maintaining multiple versions of the same prompt when the problem can be solved in other ways.&lt;/p&gt;

&lt;p&gt;Search any LLM framework's issues for "JSON" or "ValidationError". The problem shows up across models and frameworks. The solutions are scattered across docs, GitHub issues, and custom workarounds.&lt;/p&gt;

&lt;p&gt;There are really two failures here: &lt;strong&gt;parsing&lt;/strong&gt; (turning text into JSON) and &lt;strong&gt;validation&lt;/strong&gt; (ensuring the JSON matches what your pipeline expects). handoff-guard handles both, plus retries with feedback.&lt;/p&gt;
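&lt;p&gt;The distinction matters because each failure needs a different fix. A minimal illustration in plain Python, using only the standard library (not the handoff-guard API):&lt;/p&gt;

```python
import json

raw = 'Sure! Here is the JSON:\n{"name": "test", "value": 42}\nLet me know!'

# Failure 1: parsing. json.loads on the raw reply crashes on the wrapper prose.
try:
    json.loads(raw)
except json.JSONDecodeError:
    print("parse failed on raw reply")

# Failure 2: validation. Even clean JSON can miss fields your pipeline expects.
clean = json.loads('{"name": "test"}')
missing = [k for k in ("name", "value") if k not in clean]
print("missing fields:", missing)  # ['value']
```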




&lt;h2&gt;
  
  
  Why LLMs Do This
&lt;/h2&gt;

&lt;p&gt;LLMs are trained to be helpful. When you ask for JSON, they want to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Acknowledge your request ("Sure!")&lt;/li&gt;
&lt;li&gt;Explain what they're giving you ("Here's the JSON:")&lt;/li&gt;
&lt;li&gt;Format it nicely (markdown code blocks)&lt;/li&gt;
&lt;li&gt;Offer follow-up help ("Let me know if...")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is great for chat. It's terrible for parsing.&lt;/p&gt;

&lt;p&gt;And it gets worse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Truncation&lt;/strong&gt;: Hit the token limit? Your JSON ends mid-string: &lt;code&gt;{"draft": "This is a long article about...&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Malformed syntax&lt;/strong&gt;: Trailing commas, single quotes, unquoted keys. All common LLM outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nested code blocks&lt;/strong&gt;: JSON containing &lt;code&gt;```&lt;/code&gt; sequences breaks regex-based parsers&lt;/li&gt;
&lt;/ul&gt;
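&lt;p&gt;To see why wrapper text is a parsing problem rather than a prompting problem, here is a minimal stand-in for what a wrapper-stripping parser has to do: scan for the first balanced JSON span and parse just that. This is a simplified sketch, not the library's implementation:&lt;/p&gt;

```python
import json

def extract_first_json(text):
    # Best-effort: parse the first balanced {...} or [...] span.
    # Simplification: brace characters inside JSON strings would confuse
    # this depth counter; a real parser has to track string state too.
    for open_ch, close_ch in (("{", "}"), ("[", "]")):
        start = text.find(open_ch)
        if start == -1:
            continue
        depth = 0
        for i in range(start, len(text)):
            if text[i] == open_ch:
                depth += 1
            elif text[i] == close_ch:
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break
    return None

chatty = 'Sure! Here is the JSON:\n{"tone": "formal", "ok": true}\nLet me know!'
print(extract_first_json(chatty))  # {'tone': 'formal', 'ok': True}
```

Even this toy version already handles the "Sure!" prefix and the trailing sign-off; the hard part is the long tail of edge cases.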




&lt;h2&gt;
  
  
  Common Approaches (and Their Tradeoffs)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Just use JSON mode"&lt;/strong&gt; — JSON/structured-output modes help when available, but they guarantee &lt;em&gt;syntax&lt;/em&gt;, not &lt;em&gt;schema&lt;/em&gt;. You still get validation errors, truncation, and no framework-level context like "which node failed."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use OutputFixingParser"&lt;/strong&gt; — LangChain's output-fixing pattern repairs by calling the LLM again—adding latency and cost for every error. Its recommended usage has also shifted across LangChain versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use Instructor"&lt;/strong&gt; — Powerful for structured generation across many providers. When it fixes errors, it usually does so by re-prompting the LLM. If you want fast, local repair without burning more tokens, you need a post-processor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use Outlines"&lt;/strong&gt; — Great for constrained decoding, but requires control over the inference server (e.g., vLLM). It doesn't help if you're calling a closed API like OpenAI or Anthropic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Add more prompt engineering"&lt;/strong&gt; — You're playing whack-a-mole. Fix one edge case, another appears.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built Instead
&lt;/h2&gt;

&lt;p&gt;I needed something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with &lt;strong&gt;raw text output from any provider&lt;/strong&gt; (post-hoc, not constrained generation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identifies which node failed&lt;/strong&gt; (not just "validation error")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retries with feedback&lt;/strong&gt; (tells the LLM what went wrong)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repairs common syntax issues locally&lt;/strong&gt; (without calling the LLM again)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stays lightweight&lt;/strong&gt; (no embeddings, no ML, just parsing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/acartag7/handoff-guard" rel="noopener noreferrer"&gt;handoff-guard&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
def writer_agent(state: dict) -&amp;gt; dict:
    response = call_llm("Return JSON with: draft, word_count, tone")

    # Hope it's valid JSON
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # Which node? What failed? Can the agent retry?
        raise

    # Hope it matches the schema
    try:
        validated = WriterOutput(**data)
    except ValidationError:
        # "1 validation error for WriterOutput" — thanks for nothing
        raise

    return data


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  After
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
from handoff import guard, retry, parse_json  # PyPI: handoff-guard
from pydantic import BaseModel, Field

class WriterOutput(BaseModel):
    draft: str = Field(min_length=100)
    word_count: int = Field(ge=50)
    tone: str

@guard(output=WriterOutput, node_name="writer", max_attempts=3)
def writer_agent(state: dict) -&amp;gt; dict:
    prompt = "Return JSON with: draft, word_count, tone"

    if retry.is_retry:
        prompt += f"\n\nPrevious attempt failed:\n{retry.feedback()}"

    response = call_llm(prompt)
    return parse_json(response)  # Strips wrappers, repairs syntax


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If it fails after 3 attempts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

HandoffViolation in 'writer':
  Contract: output
  Field: draft
  Expected: String should have at least 100 characters
  Received: 'Too short...' (str)
  Suggestion: Increase the length of 'draft'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For logs/telemetry, access &lt;code&gt;e.total_attempts&lt;/code&gt;, &lt;code&gt;e.history&lt;/code&gt;, or &lt;code&gt;e.to_dict()&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What &lt;code&gt;parse_json&lt;/code&gt; Actually Does
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
from handoff import parse_json

# Strips conversational wrappers
obj = parse_json('Sure! Here\'s the JSON:\n{"key": "value"}\nLet me know!')
# -&amp;gt; Python dict/list (parsed JSON), not a JSON string

# Handles common syntax issues (via json-repair)
parse_json('{"a": 1,}')        # trailing comma → {"a": 1}
parse_json("{'a': 1}")         # single quotes → {"a": 1}
parse_json('{a: 1}')           # unquoted keys → {"a": 1}
parse_json('{"a": 1 // comment}')  # JS comments → {"a": 1}

# Detects truncation (v0.2.1)
result = parse_json('{"draft": "long text...', detailed=True)
# -&amp;gt; ParseResult with .data (dict), .truncated (bool), .repaired (bool)
result.truncated  # True — best-effort signal (unmatched braces detected)
result.repaired   # True — json-repair path was used successfully


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;No LLM calls. No embeddings. Deterministic parsing with best-effort repair. I haven't published benchmarks; this was built from real failure modes in my own graphs.&lt;/p&gt;
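&lt;p&gt;The truncation flag is worth dwelling on. A reasonable heuristic, and roughly what "unmatched braces detected" implies, is to count unclosed delimiters outside string literals. A sketch of that idea follows; this is my reading of the behaviour, not the library's actual code:&lt;/p&gt;

```python
def looks_truncated(text):
    # Heuristic: any unbalanced brace/bracket outside a string, or an
    # unterminated string, suggests the output was cut off mid-generation.
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if escaped:
            escaped = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                depth += 1
            elif ch in "}]":
                depth -= 1
    return depth != 0 or in_string

print(looks_truncated('{"draft": "This is a long article about'))  # True
print(looks_truncated('{"draft": "done"}'))                        # False
```

Note this only detects the cut; recovering the missing content still requires another generation, as the Limits section below says.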




&lt;h2&gt;
  
  
  Why Not Instructor/Outlines?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Instructor&lt;/th&gt;
&lt;th&gt;Outlines&lt;/th&gt;
&lt;th&gt;handoff-guard&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generation-time validation&lt;/td&gt;
&lt;td&gt;Constrained generation&lt;/td&gt;
&lt;td&gt;Post-hoc validation &amp;amp; repair&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Works with&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI, Anthropic, etc.&lt;/td&gt;
&lt;td&gt;vLLM, Transformers&lt;/td&gt;
&lt;td&gt;Any string output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph compatible&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (manual)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (adapter: &lt;code&gt;guarded_node&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identifies failed node&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retries with feedback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repairs malformed JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (via re-prompt)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Yes (local, no tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pydantic + provider SDKs&lt;/td&gt;
&lt;td&gt;Transformers/vLLM stack&lt;/td&gt;
&lt;td&gt;Pydantic + json-repair&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Instructor and Outlines are excellent tools. The difference is &lt;em&gt;when&lt;/em&gt; and &lt;em&gt;how&lt;/em&gt; they work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instructor&lt;/strong&gt; validates at generation time and fixes errors by re-prompting—effective but costs tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outlines&lt;/strong&gt; constrains generation at the model level—powerful but requires inference server control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;handoff-guard&lt;/strong&gt; validates &lt;em&gt;after&lt;/em&gt; the LLM responds and repairs locally—no extra tokens, works with any provider&lt;/li&gt;
&lt;/ul&gt;
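&lt;p&gt;The retry-with-feedback loop that &lt;code&gt;@guard&lt;/code&gt; automates is itself a simple pattern. Stripped of the library, it looks roughly like this; &lt;code&gt;fake_llm&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; are toy stand-ins, not real API calls:&lt;/p&gt;

```python
import json

def retry_with_feedback(call_llm, base_prompt, validate, max_attempts=3):
    # On each failure, append the error message to the next prompt so the
    # model can self-correct. This is the generic pattern, not @guard itself.
    feedback = None
    for attempt in range(max_attempts):
        prompt = base_prompt
        if feedback:
            prompt = base_prompt + "\n\nPrevious attempt failed:\n" + feedback
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
            validate(data)
            return data
        except (json.JSONDecodeError, ValueError) as e:
            feedback = str(e)
    raise RuntimeError(f"failed after {max_attempts} attempts: {feedback}")

# Toy stand-in LLM: answers with the wrong type once, then obeys the feedback.
calls = []
def fake_llm(prompt):
    calls.append(prompt)
    if "Previous attempt failed" in prompt:
        return '{"word_count": 120}'
    return '{"word_count": "lots"}'

def validate(data):
    if not isinstance(data["word_count"], int):
        raise ValueError("word_count must be an integer")

result = retry_with_feedback(fake_llm, "Return JSON with: word_count", validate)
print(result)  # {'word_count': 120}
```

The value of the feedback line is that the second attempt is a different prompt; a blind retry would just repeat the same mistake.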




&lt;h2&gt;
  
  
  The Problems This Actually Solves
&lt;/h2&gt;

&lt;p&gt;handoff-guard doesn't fix framework bugs. It helps when &lt;strong&gt;you control the code&lt;/strong&gt; that receives LLM output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;How handoff-guard helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM wraps JSON in conversation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Sure! Here's the JSON: {...}"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parse_json()&lt;/code&gt; strips wrappers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Malformed JSON syntax&lt;/td&gt;
&lt;td&gt;Trailing commas, single quotes, unquoted keys&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parse_json()&lt;/code&gt; repairs common issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Truncated output at token limit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{"draft": "long text...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;parse_json(detailed=True)&lt;/code&gt; detects truncation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"ValidationError" with no context&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1 validation error for State&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@guard(node_name="writer")&lt;/code&gt; tells you which node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No retry on validation failure&lt;/td&gt;
&lt;td&gt;Agent fails once, stays failed&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@guard(max_attempts=3)&lt;/code&gt; retries automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM doesn't know why it failed&lt;/td&gt;
&lt;td&gt;Retry happens but same error repeats&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;retry.feedback()&lt;/code&gt; tells the LLM what went wrong&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Limits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What this won't magically fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Missing or hallucinated data&lt;/strong&gt; — If the model omits required fields or invents values, deterministic repair can't invent correct data. Retries are still needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguous repairs&lt;/strong&gt; — "Repair" is sometimes a best-effort guess (e.g., unquoted keys, stray punctuation). Always validate the result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Severe truncation&lt;/strong&gt; — You can detect it, but you can't recover missing content without another generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial or multi-JSON outputs&lt;/strong&gt; — &lt;code&gt;parse_json&lt;/code&gt; extracts the first JSON object/array boundary it finds. Complex tool traces or multiple embedded objects may need custom handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security note:&lt;/strong&gt; If you're parsing untrusted model output, treat "repaired JSON" as untrusted input. Validate types and ranges.&lt;/p&gt;
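&lt;p&gt;In practice that means an explicit post-repair check before the data flows onward. Here is a plain-Python stand-in for the article's &lt;code&gt;WriterOutput&lt;/code&gt; model; the field names come from the example above, and the bounds are illustrative:&lt;/p&gt;

```python
def check_types_and_ranges(data):
    # Treat repaired JSON as untrusted input: verify types and ranges
    # explicitly. A plain-dict stand-in for a Pydantic model.
    problems = []
    draft = data.get("draft")
    if not isinstance(draft, str) or len(draft) in range(100):
        # len(draft) in range(100) means "shorter than 100 characters"
        problems.append("draft must be a string of at least 100 characters")
    wc = data.get("word_count")
    if not isinstance(wc, int) or isinstance(wc, bool) or wc not in range(50, 100001):
        # bool is a subclass of int in Python, so reject it explicitly;
        # the upper bound is an illustrative cap
        problems.append("word_count must be an int in [50, 100000]")
    if not isinstance(data.get("tone"), str):
        problems.append("tone must be a string")
    return problems

print(check_types_and_ranges({"draft": "x" * 200, "word_count": 120, "tone": "formal"}))  # []
print(check_types_and_ranges({"draft": "", "word_count": True, "tone": None}))
```

Whether you do this with Pydantic or by hand, the point is the same: "it parsed" is not "it's safe to use".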




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
pip install handoff-guard


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The package is &lt;code&gt;handoff-guard&lt;/code&gt;, the import namespace is &lt;code&gt;handoff&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
from handoff import guard, retry, parse_json


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No config files. No API keys. No Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/acartag7/handoff-guard" rel="noopener noreferrer"&gt;github.com/acartag7/handoff-guard&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/handoff-guard/" rel="noopener noreferrer"&gt;pypi.org/project/handoff-guard&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The library does what it set out to do, and it covers my current needs. I'm not planning major features, just bug fixes and edge-case handling as users report them.&lt;/p&gt;

&lt;p&gt;If you hit something it doesn't handle, &lt;a href="https://github.com/acartag7/handoff-guard/issues" rel="noopener noreferrer"&gt;open an issue&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built because "ValidationError: 1 validation error" tells you nothing useful.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
