<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Scotty G</title>
    <description>The latest articles on Forem by Scotty G (@sg-prime).</description>
    <link>https://forem.com/sg-prime</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883499%2Fbe47a7d6-f913-48b0-a4bc-f6eeec73df8b.png</url>
      <title>Forem: Scotty G</title>
      <link>https://forem.com/sg-prime</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sg-prime"/>
    <language>en</language>
    <item>
      <title>Testing AI Agents Like Code: the `oa test` Harness</title>
      <dc:creator>Scotty G</dc:creator>
      <pubDate>Thu, 23 Apr 2026 00:39:21 +0000</pubDate>
      <link>https://forem.com/sg-prime/testing-ai-agents-like-code-the-oa-test-harness-oh</link>
      <guid>https://forem.com/sg-prime/testing-ai-agents-like-code-the-oa-test-harness-oh</guid>
      <description>&lt;p&gt;You wouldn't ship code without tests. But most AI agents ship with nothing — a handful of manual prompts in a notebook, a screenshot of "it worked once," and a prayer that production inputs don't look too different from the test ones.&lt;/p&gt;

&lt;p&gt;OAS 1.4 ships &lt;code&gt;oa test&lt;/code&gt;, a test harness that runs eval cases against real models, asserts on output shape and content, and emits CI-friendly JSON. Your agents get tested like code, because they &lt;em&gt;are&lt;/em&gt; code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a test file looks like
&lt;/h2&gt;

&lt;p&gt;Tests live alongside the spec. One YAML file per agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .agents/summariser.test.yaml&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./summariser.yaml&lt;/span&gt;

&lt;span class="na"&gt;cases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarises short documents&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;document&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sky&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;blue.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;grass&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;green.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Water&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;wet."&lt;/span&gt;
    &lt;span class="na"&gt;expect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;output.summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;string&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;min_length&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;10&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;handles empty facts gracefully&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;document&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="na"&gt;expect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;output.summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;contains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;content"&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;smoke test only&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;document&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;..."&lt;/span&gt;
    &lt;span class="c1"&gt;# no expect block — passes if the model returns anything valid&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three cases, one file. Each case targets a task in the spec, provides the input, and optionally asserts on the output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The assertion vocabulary
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;oa test&lt;/code&gt; supports a small, practical set of assertions, enough to catch real bugs without turning tests into a DSL.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Assertion&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Checks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contains&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ contains: "welcome" }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Substring match (case-insensitive by default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;equals&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ equals: "greeting" }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exact value equality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ type: array }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Value type: &lt;code&gt;string&lt;/code&gt;, &lt;code&gt;number&lt;/code&gt;, &lt;code&gt;boolean&lt;/code&gt;, &lt;code&gt;object&lt;/code&gt;, &lt;code&gt;array&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;min_length&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ min_length: 1 }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lower bound on length for strings or arrays&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_length&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ max_length: 500 }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Upper bound on length for strings or arrays&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can combine them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;expect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;output.items&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;array&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;min_length&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;max_length&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;10&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
  &lt;span class="s"&gt;output.items[0].id&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;string&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;output.summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;contains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sky"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;case_sensitive&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;false&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paths support dotted access and array indexing (&lt;code&gt;output.items[0].id&lt;/code&gt;). The parser is deliberately simple; if you need richer assertions, add a post-processing step in CI rather than extending the harness.&lt;/p&gt;
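
&lt;p&gt;To make these semantics concrete, here is a minimal Python sketch of how a path such as &lt;code&gt;output.items[0].id&lt;/code&gt; could be resolved and checked against the assertions in the table. It is not the harness's actual implementation: &lt;code&gt;resolve_path&lt;/code&gt;, &lt;code&gt;check&lt;/code&gt;, and the &lt;code&gt;TYPES&lt;/code&gt; mapping are hypothetical names, and the handling of booleans and of non-string values under &lt;code&gt;contains&lt;/code&gt; is an assumption.&lt;/p&gt;

```python
import re

# Hypothetical mapping from assertion type names to Python types.
TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def resolve_path(result, path):
    """Walk a dotted path with optional [index] parts, e.g. 'output.items[0].id'."""
    value = result
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        value = value[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            value = value[int(index)]
    return value


def check(value, assertion):
    """Return a list of failure messages; an empty list means the value passed."""
    failures = []
    if "type" in assertion:
        expected = TYPES[assertion["type"]]
        # bool is a subclass of int in Python, so exclude it from "number".
        is_bool = isinstance(value, bool)
        ok = isinstance(value, expected) and (assertion["type"] == "boolean" or not is_bool)
        if not ok:
            failures.append(f"expected type {assertion['type']}")
    if "equals" in assertion and value != assertion["equals"]:
        failures.append(f"expected {assertion['equals']!r}, got {value!r}")
    if "contains" in assertion:
        # Assumes the value is a string; case-insensitive unless stated otherwise.
        needle, haystack = assertion["contains"], value
        if not assertion.get("case_sensitive", False):
            needle, haystack = needle.lower(), haystack.lower()
        if needle not in haystack:
            failures.append(f"expected to contain {assertion['contains']!r}")
    if "min_length" in assertion and assertion["min_length"] > len(value):
        failures.append(f"length {len(value)} is below {assertion['min_length']}")
    if "max_length" in assertion and len(value) > assertion["max_length"]:
        failures.append(f"length {len(value)} is above {assertion['max_length']}")
    return failures
```

&lt;p&gt;Combining assertions then amounts to collecting every failure message for a path, which is roughly what the combined &lt;code&gt;expect&lt;/code&gt; block above expresses.&lt;/p&gt;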

&lt;h2&gt;
  
  
  Running the tests
&lt;/h2&gt;

&lt;p&gt;From the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oa &lt;span class="nb"&gt;test&lt;/span&gt; .agents/summariser.test.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get human-readable output — green ticks, red crosses, which case failed and why.&lt;/p&gt;

&lt;p&gt;For CI, flip to JSON mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oa &lt;span class="nb"&gt;test&lt;/span&gt; .agents/summariser.test.yaml &lt;span class="nt"&gt;--quiet&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"spec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".agents/summariser.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarises short documents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;842&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"handles empty facts gracefully"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"output.summary: expected to contain 'no content', got 'The document is empty'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"smoke test only"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;654&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pipe this into whatever CI system you use. The exit code is non-zero on any failure, so &lt;code&gt;oa test&lt;/code&gt; plays nicely with standard test-runner conventions.&lt;/p&gt;
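
&lt;p&gt;As one illustration of consuming that report, the Python sketch below pulls the failed cases out of a parsed report and prints a one-line summary. The field names come from the sample JSON above; &lt;code&gt;failed_cases&lt;/code&gt; is a hypothetical helper, and reading the report from standard input in CI assumes the JSON is written to stdout.&lt;/p&gt;

```python
import json


def failed_cases(report):
    """Return one summary line per failed case in a parsed report."""
    lines = []
    for case in report.get("cases", []):
        if not case.get("passed", False):
            reason = case.get("reason", "no reason given")
            lines.append(f"FAIL {case['name']}: {reason}")
    return lines


# A trimmed copy of the sample report above. In CI you would parse the real
# output instead, e.g. json.load(sys.stdin) if the JSON arrives on stdout
# (an assumption about the CLI, not something confirmed here).
raw = """
{
  "total": 2, "passed": 1, "failed": 1,
  "cases": [
    {"name": "summarises short documents", "passed": true, "duration_ms": 842},
    {"name": "handles empty facts gracefully", "passed": false,
     "reason": "output.summary: expected to contain 'no content', got 'The document is empty'",
     "duration_ms": 512}
  ]
}
"""
report = json.loads(raw)
for line in failed_cases(report):
    print(line)
print(f"{report['passed']}/{report['total']} cases passed")
```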

&lt;h2&gt;
  
  
  Testing in CI
&lt;/h2&gt;

&lt;p&gt;Drop it into a GitHub Actions workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/test-agents.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test agents&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.11"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pipx install open-agent-spec&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run agent tests&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.OPENAI_API_KEY }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;for test in .agents/*.test.yaml; do&lt;/span&gt;
            &lt;span class="s"&gt;oa test "$test" --quiet&lt;/span&gt;
          &lt;span class="s"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents now have the same test discipline as the rest of your codebase. Break a prompt? The test case catches it before merge. Swap models? Run the suite and see what drifted.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually test
&lt;/h2&gt;

&lt;p&gt;Model outputs are non-deterministic, so your tests need to assert on &lt;strong&gt;shape and invariants&lt;/strong&gt;, not exact strings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do test:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output schema conformance:&lt;/strong&gt; required fields present, types correct&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structural invariants:&lt;/strong&gt; "the summary is always under 500 chars," "the category is always one of these enum values"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refusal handling:&lt;/strong&gt; empty or adversarial inputs don't crash the pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool interaction:&lt;/strong&gt; tool-using agents produce the expected tool calls for known inputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delegated spec integration:&lt;/strong&gt; a spec pulling &lt;code&gt;oa://prime-vector/summariser&lt;/code&gt; still works after the registry updates&lt;/li&gt;
&lt;/ul&gt;
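
&lt;p&gt;The assertion table has no enum check, so an invariant like "the category is always one of these enum values" is a natural fit for a small post-processing step. The Python sketch below shows one way that could look; the &lt;code&gt;summary&lt;/code&gt; and &lt;code&gt;category&lt;/code&gt; field names, the allowed values, and &lt;code&gt;invariant_violations&lt;/code&gt; itself are all hypothetical.&lt;/p&gt;

```python
ALLOWED_CATEGORIES = {"billing", "support", "sales"}  # hypothetical enum values


def invariant_violations(output):
    """Check structural invariants on one agent output; return a list of violations."""
    violations = []
    summary = output.get("summary")
    if not isinstance(summary, str):
        violations.append("summary must be a string")
    elif len(summary) >= 500:
        violations.append(f"summary has {len(summary)} chars, expected under 500")
    if output.get("category") not in ALLOWED_CATEGORIES:
        violations.append(f"category {output.get('category')!r} is not an allowed value")
    return violations


print(invariant_violations({"summary": "Short and valid.", "category": "billing"}))
print(invariant_violations({"summary": "x" * 600, "category": "other"}))
```

&lt;p&gt;Neither check depends on the exact wording the model produced, which is the point: the assertions stay stable across runs and across model swaps.&lt;/p&gt;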

&lt;h3&gt;
  
  
  Don't test:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exact phrasing:&lt;/strong&gt; "the response should be 'Hello, Alice!'" is brittle and wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative output quality:&lt;/strong&gt; that's a human eval problem, not a test-suite problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token counts or latency:&lt;/strong&gt; monitor these in production, don't gate PRs on them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Test invariants, not novelty. That's where agent tests earn their keep.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;Agents-as-code only works if the agents are actually &lt;em&gt;code-like&lt;/em&gt;. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Version-controlled:&lt;/strong&gt; ✅ YAML in your repo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewable:&lt;/strong&gt; ✅ prompts and schemas in a PR diff&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reusable:&lt;/strong&gt; ✅ spec delegation and the &lt;a href="https://forem.com/sg-prime/composable-agent-specs-spec-delegation-and-the-oas-registry-8h2"&gt;OAS registry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testable:&lt;/strong&gt; ✅ &lt;code&gt;oa test&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;oa test&lt;/code&gt; was the last piece missing. With it, agents get the same discipline as any other component of your system: change them, test them, merge them, deploy them.&lt;/p&gt;

&lt;p&gt;Define what your agents do. Let the runtime be someone else's problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;open-agent-spec

&lt;span class="c"&gt;# Add a test file next to your spec&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .agents/example.test.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
spec: ./example.yaml
cases:
  - name: greets by name
    task: greet
    input: { name: "CI" }
    expect:
      output.response: { contains: "CI" }
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Run it&lt;/span&gt;
oa &lt;span class="nb"&gt;test&lt;/span&gt; .agents/example.test.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command. One YAML file. Your agents now have a test suite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/prime-vector/open-agent-spec/blob/main/spec/open-agent-spec-1.4.md" rel="noopener noreferrer"&gt;Formal specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/prime-vector/open-agent-spec/blob/main/docs/REFERENCE.md" rel="noopener noreferrer"&gt;Reference docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/prime-vector/open-agent-spec" rel="noopener noreferrer"&gt;GitHub: prime-vector/open-agent-spec&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Also in this series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="//./01-multi-agent-pipelines-yaml.md"&gt;The Sidecar Agent: Add AI to Any Project Without a Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//./02-zero-sdk-agents.md"&gt;Why Your AI Agent Sidecar Shouldn't Have SDK Dependencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//./03-formal-spec-conformance.md"&gt;We Published a Formal Spec for Our Agent Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//./04-composable-specs-registry.md"&gt;Composable Agent Specs: Spec Delegation and the OAS Registry&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Open Agent Spec is MIT-licensed and maintained by &lt;a href="https://www.primevector.com.au" rel="noopener noreferrer"&gt;Prime Vector&lt;/a&gt;. If you're running agents in CI, we'd love to hear what broke — issues welcome on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>devops</category>
      <category>testing</category>
    </item>
    <item>
      <title>Composable Agent Specs: Spec Delegation and the OAS Registry</title>
      <dc:creator>Scotty G</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:31:19 +0000</pubDate>
      <link>https://forem.com/sg-prime/composable-agent-specs-spec-delegation-and-the-oas-registry-8h2</link>
      <guid>https://forem.com/sg-prime/composable-agent-specs-spec-delegation-and-the-oas-registry-8h2</guid>
      <description>&lt;p&gt;Most agent frameworks solve reuse the way libraries do: write a class, import it, hope the abstractions line up. That works inside one codebase. Between teams or across organisations? It breaks down fast.&lt;/p&gt;

&lt;p&gt;Open Agent Spec 1.4 takes a different approach. One agent spec can &lt;strong&gt;delegate&lt;/strong&gt; a task to another spec, loaded from a local path, a URL, or a public registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oa://prime-vector/summariser@1.0.0&lt;/span&gt;   &lt;span class="c1"&gt;# version-pinned, from registry&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line. Version-pinned. No copy-paste. Your pipeline gets a battle-tested summariser without importing anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why spec composition matters
&lt;/h2&gt;

&lt;p&gt;An agent that fits in one YAML file is easy. But as soon as you build a second agent that needs the same summarisation step, the same sentiment classifier, or the same document-extraction task, you start copy-pasting.&lt;/p&gt;

&lt;p&gt;The usual fix is to wrap the shared logic in a Python function. But now the &lt;em&gt;agent definition&lt;/em&gt; is split across YAML and Python, the function drifts from the spec over time, and you're back to the problem OAS was meant to solve: agent logic spread across framework abstractions you can't review in a PR diff.&lt;/p&gt;

&lt;p&gt;Spec composition keeps the contract in YAML.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three ways to delegate
&lt;/h2&gt;

&lt;p&gt;A delegated task declares &lt;code&gt;spec:&lt;/code&gt; + &lt;code&gt;task:&lt;/code&gt; instead of its own prompts and output schema. The runtime loads the referenced spec, runs the target task, and surfaces the result transparently.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Local file
&lt;/h3&gt;

&lt;p&gt;For intra-repo reuse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./specs/summariser.yaml&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The path resolves relative to the calling spec's directory. Useful when you have a shared specs directory in a monorepo.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. HTTP/HTTPS URL
&lt;/h3&gt;

&lt;p&gt;For cross-repo reuse without a registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://raw.githubusercontent.com/your-org/agents/main/summariser.yaml&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any spec reachable over HTTP works. Good for internal GitHub-hosted specs behind SSO.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Registry reference (&lt;code&gt;oa://&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;For public, versioned reuse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oa://prime-vector/summariser@1.0.0&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;oa://namespace/name&lt;/code&gt; expands to &lt;code&gt;https://openagentspec.dev/registry/namespace/name/latest/spec.yaml&lt;/code&gt;. With &lt;code&gt;@version&lt;/code&gt;, it pins to an exact version. Drop the version for "latest" semantics — useful in development, risky in production.&lt;/p&gt;
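
&lt;p&gt;The expansion rule can be sketched in a few lines of Python. The unversioned URL follows the pattern given above; the assumption that a pinned version takes the place of the &lt;code&gt;latest&lt;/code&gt; path segment is a guess about the registry layout, and &lt;code&gt;expand_reference&lt;/code&gt; is a hypothetical helper rather than part of the OAS tooling.&lt;/p&gt;

```python
import re

REGISTRY_BASE = "https://openagentspec.dev/registry"
REFERENCE = re.compile(r"oa://([\w.-]+)/([\w.-]+?)(?:@([\w.+-]+))?")


def expand_reference(ref):
    """Expand 'oa://namespace/name[@version]' into a registry URL."""
    match = REFERENCE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a registry reference: {ref!r}")
    namespace, name, version = match.groups()
    # Assumption: a pinned version takes the place of the 'latest' path segment.
    return f"{REGISTRY_BASE}/{namespace}/{name}/{version or 'latest'}/spec.yaml"


print(expand_reference("oa://prime-vector/summariser"))
print(expand_reference("oa://prime-vector/summariser@1.0.0"))
```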

&lt;h2&gt;
  
  
  What lives on the registry
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://openagentspec.dev/registry" rel="noopener noreferrer"&gt;OAS registry&lt;/a&gt; is open for community contributions. A handful of Prime Vector-authored specs are already published as a baseline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reference&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oa://prime-vector/summariser&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Document summarisation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oa://prime-vector/classifier&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-class document classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oa://prime-vector/sentiment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sentiment analysis with confidence scores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oa://prime-vector/keyword-extractor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Key phrase extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oa://prime-vector/code-reviewer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code review task for PR diffs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oa://prime-vector/memory-retriever&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Retrieve relevant context from a memory store&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can compose these into your own pipelines without writing the prompts yourself. Each is a single YAML file with a typed output schema — so when you call it, you know exactly what you're going to get back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzih04gvpe9rcrxihc8h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzih04gvpe9rcrxihc8h.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Composing delegated tasks with &lt;code&gt;depends_on&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Delegation stacks with &lt;code&gt;depends_on&lt;/code&gt;. You can chain multiple delegated tasks together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;open_agent_spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.4.0"&lt;/span&gt;

&lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;document-processor&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;summarise,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;document"&lt;/span&gt;

&lt;span class="na"&gt;intelligence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o&lt;/span&gt;

&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;extract_keywords&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oa://prime-vector/keyword-extractor@1.0.0&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;extract&lt;/span&gt;

  &lt;span class="na"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oa://prime-vector/summariser@1.0.0&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;extract_keywords&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;classify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oa://prime-vector/classifier@1.0.0&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;classify&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three delegated tasks, potentially from three different authors, in one pipeline. The runtime handles the load → run → merge flow for each task, and the result envelope tells you exactly which spec and task it delegated to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"classify"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"delegated_to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"oa://prime-vector/classifier@1.0.0#classify"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"technical-documentation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
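
&lt;p&gt;To make the ordering implied by &lt;code&gt;depends_on&lt;/code&gt; concrete, here is a minimal Python sketch. It assumes the tasks block has already been parsed into a plain dictionary (the &lt;code&gt;TASKS&lt;/code&gt; name and the &lt;code&gt;execution_order&lt;/code&gt; helper are hypothetical) and uses the standard library's &lt;code&gt;graphlib&lt;/code&gt; to derive a valid run order. It illustrates the idea only; the article does not describe how the OAS runtime actually schedules tasks.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory view of the tasks block above:
# each task name maps to the list of tasks it depends on.
TASKS = {
    "extract_keywords": [],
    "summarise": ["extract_keywords"],
    "classify": ["summarise"],
}


def execution_order(tasks):
    """Return task names so that every task comes after its dependencies."""
    # TopologicalSorter takes a mapping of node to predecessors,
    # which is the same shape as depends_on.
    return list(TopologicalSorter(tasks).static_order())


print(execution_order(TASKS))
```

&lt;p&gt;For the linear chain above, the only valid order is extract_keywords, then summarise, then classify.&lt;/p&gt;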



&lt;h2&gt;
  
  
  Safety: cycle detection
&lt;/h2&gt;

&lt;p&gt;One risk with delegation is a cycle: A delegates to B, which delegates back to A. Left unchecked, that is an infinite loop waiting to happen.&lt;/p&gt;

&lt;p&gt;The runtime detects this before any model call is made and raises &lt;code&gt;DELEGATION_CYCLE_ERROR&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Circular spec delegation detected: './summariser.yaml' is already in the delegation stack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DELEGATION_CYCLE_ERROR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"delegation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarise"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;a href="https://github.com/prime-vector/open-agent-spec/blob/main/spec/open-agent-spec-1.4.md" rel="noopener noreferrer"&gt;specified in the formal OAS standard&lt;/a&gt; as a MUST requirement — any conforming runtime has to detect cycles before spending tokens.&lt;/p&gt;
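
&lt;p&gt;A check like this can be sketched as a depth-first walk that carries the current delegation stack. The Python below is an illustration under assumptions, not the runtime's code: the &lt;code&gt;DELEGATIONS&lt;/code&gt; map, the &lt;code&gt;./pipeline.yaml&lt;/code&gt; and &lt;code&gt;./reviewer.yaml&lt;/code&gt; file names, and the &lt;code&gt;check_delegations&lt;/code&gt; function are all hypothetical. Only the error message wording is taken from the example above.&lt;/p&gt;

```python
class DelegationCycleError(Exception):
    """Illustrative stand-in for the runtime's DELEGATION_CYCLE_ERROR."""


# Hypothetical map: a spec reference to the spec references its tasks delegate to.
DELEGATIONS = {
    "./pipeline.yaml": ["./summariser.yaml"],
    "./summariser.yaml": ["./reviewer.yaml"],
    "./reviewer.yaml": ["./summariser.yaml"],  # delegates back: a cycle
}


def check_delegations(spec, delegations, stack=()):
    """Walk the delegation graph depth-first and fail on the first cycle."""
    if spec in stack:
        raise DelegationCycleError(
            f"Circular spec delegation detected: '{spec}' "
            "is already in the delegation stack"
        )
    for target in delegations.get(spec, []):
        # Extend the stack only along the current path, so two tasks that
        # delegate to the same spec are not mistaken for a cycle.
        check_delegations(target, delegations, stack + (spec,))


try:
    check_delegations("./pipeline.yaml", DELEGATIONS)
except DelegationCycleError as exc:
    print(exc)
```

&lt;p&gt;Because the walk only reads spec references, it can run before any model call is made, which is the property the standard requires.&lt;/p&gt;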

&lt;h2&gt;
  
  
  Version pinning: why you should care
&lt;/h2&gt;

&lt;p&gt;The registry supports both floating (&lt;code&gt;oa://.../summariser&lt;/code&gt;) and pinned (&lt;code&gt;oa://.../summariser@1.0.0&lt;/code&gt;) references.&lt;/p&gt;

&lt;p&gt;In production, pin your versions. Same reason you pin npm or PyPI packages. A spec author can update prompts, tighten schemas, or change defaults in ways that look like minor improvements but subtly alter your pipeline's output.&lt;/p&gt;

&lt;p&gt;Semantic Versioning applies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patch bumps&lt;/strong&gt; (&lt;code&gt;1.0.0&lt;/code&gt; → &lt;code&gt;1.0.1&lt;/code&gt;) — prompt tweaks, typo fixes, no schema changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minor bumps&lt;/strong&gt; (&lt;code&gt;1.0.0&lt;/code&gt; → &lt;code&gt;1.1.0&lt;/code&gt;) — added optional output fields, expanded input acceptance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Major bumps&lt;/strong&gt; (&lt;code&gt;1.0.0&lt;/code&gt; → &lt;code&gt;2.0.0&lt;/code&gt;) — breaking schema changes, prompt overhauls, model swaps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For CI agents or anything that fans out to production data, pin. Always.&lt;/p&gt;
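
&lt;p&gt;One way to enforce pinning is a small lint step in CI that flags floating references. The Python sketch below assumes the reference shape seen in this article (&lt;code&gt;oa://namespace/name&lt;/code&gt; with an optional &lt;code&gt;@MAJOR.MINOR.PATCH&lt;/code&gt; suffix) and that the tasks block has already been loaded into a dictionary. The &lt;code&gt;PINNED&lt;/code&gt; pattern and the &lt;code&gt;floating_refs&lt;/code&gt; helper are hypothetical and not part of the &lt;code&gt;oa&lt;/code&gt; CLI.&lt;/p&gt;

```python
import re

# Assumed reference shape, inferred from the examples in this article:
# oa://namespace/name with an optional @MAJOR.MINOR.PATCH suffix.
PINNED = re.compile(r"^oa://[\w.-]+/[\w.-]+@\d+\.\d+\.\d+$")


def floating_refs(tasks):
    """Return the names of delegated tasks whose registry reference is not pinned."""
    return [
        name
        for name, task in tasks.items()
        if task.get("spec", "").startswith("oa://")
        and not PINNED.match(task["spec"])
    ]


tasks = {
    "summarise": {"spec": "oa://prime-vector/summariser@1.0.0", "task": "summarise"},
    "classify": {"spec": "oa://prime-vector/classifier", "task": "classify"},
}
print(floating_refs(tasks))
```

&lt;p&gt;Here only the classify task is reported, because its reference carries no version. A CI job could fail the build whenever the returned list is non-empty.&lt;/p&gt;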

&lt;h2&gt;
  
  
  Publishing your own
&lt;/h2&gt;

&lt;p&gt;The registry is open for contributions. If you build a well-scoped agent spec that others could reuse — a good test generator, a bug-triager, a changelog writer — open a PR to add it.&lt;/p&gt;

&lt;p&gt;Specs that make good registry citizens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single-responsibility&lt;/strong&gt; — do one thing, not five&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stable I/O schema&lt;/strong&gt; — typed inputs, typed outputs, documented fields&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engine-agnostic&lt;/strong&gt; — work across OpenAI, Claude, Grok, or local models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal prompts&lt;/strong&gt; — the less cleverness, the less breakage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;Add a delegated task to any existing spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summarise&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oa://prime-vector/summariser@1.0.0&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;summarise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oa run &lt;span class="nt"&gt;--spec&lt;/span&gt; .agents/my-pipeline.yaml &lt;span class="nt"&gt;--task&lt;/span&gt; summarise &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--input&lt;/span&gt; &lt;span class="s1"&gt;'{"document":"..."}'&lt;/span&gt; &lt;span class="nt"&gt;--quiet&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime resolves the registry reference, loads the spec, runs the delegated task, and returns the result. No extra install. No framework to adopt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openagentspec.dev/registry" rel="noopener noreferrer"&gt;OAS Registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/prime-vector/open-agent-spec/blob/main/spec/open-agent-spec-1.4.md" rel="noopener noreferrer"&gt;Formal specification — Section 7.3, Spec Delegation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/prime-vector/open-agent-spec" rel="noopener noreferrer"&gt;GitHub: prime-vector/open-agent-spec&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Also in this series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="//./01-multi-agent-pipelines-yaml.md"&gt;The Sidecar Agent: Add AI to Any Project Without a Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//./02-zero-sdk-agents.md"&gt;Why Your AI Agent Sidecar Shouldn't Have SDK Dependencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//./03-formal-spec-conformance.md"&gt;We Published a Formal Spec for Our Agent Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//./05-testing-agents-oa-test.md"&gt;Testing AI Agents Like Code: the &lt;code&gt;oa test&lt;/code&gt; Harness&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Open Agent Spec is MIT-licensed and maintained by &lt;a href="https://www.primevector.com.au" rel="noopener noreferrer"&gt;Prime Vector&lt;/a&gt;. The registry is open — we'd love to see what you build.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
