<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: scarab systems</title>
    <description>The latest articles on Forem by scarab systems (@scarab-systems).</description>
    <link>https://forem.com/scarab-systems</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3950086%2F965f3e72-b17d-49d7-a92a-d14eb00e3383.png</url>
      <title>Forem: scarab systems</title>
      <link>https://forem.com/scarab-systems</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/scarab-systems"/>
    <language>en</language>
    <item>
      <title>I’m Building Around the Gap Between AI Output and Repo Truth</title>
      <dc:creator>scarab systems</dc:creator>
      <pubDate>Mon, 25 May 2026 11:02:10 +0000</pubDate>
      <link>https://forem.com/scarab-systems/im-building-around-the-gap-between-ai-output-and-repo-truth-mn6</link>
      <guid>https://forem.com/scarab-systems/im-building-around-the-gap-between-ai-output-and-repo-truth-mn6</guid>
      <description>&lt;p&gt;I’ve been thinking a lot about a failure mode in AI-assisted development that does not fit neatly into “bad code” or “bad prompting.”&lt;/p&gt;

&lt;p&gt;The agent can be useful.&lt;br&gt;
The output can look good.&lt;br&gt;
The tests can even exist.&lt;/p&gt;

&lt;p&gt;And the repo can still become less trustworthy.&lt;br&gt;
That is the uncomfortable part.&lt;/p&gt;

&lt;p&gt;Most AI coding conversations still orbit the agent itself:&lt;br&gt;
How do we improve its context?&lt;br&gt;
How do we write better instructions?&lt;br&gt;
How do we make it remember project rules?&lt;br&gt;
How do we coordinate multiple agents?&lt;br&gt;
How do we make it plan longer?&lt;/p&gt;

&lt;p&gt;Those are real problems.&lt;br&gt;
But I keep seeing another layer.&lt;br&gt;
The agent can have good context and still leave behind a repo that is hard to trust.&lt;/p&gt;

&lt;p&gt;Because the repo is not only code.&lt;/p&gt;

&lt;p&gt;It is the accumulated state of the project: structure, assumptions, tests, docs, runtime expectations, old scaffolding, partial implementation, cleanup debt, and claims about what is finished.&lt;br&gt;
That state needs supervision.&lt;br&gt;
Not in the sense of replacing the developer.&lt;br&gt;
Not in the sense of replacing the AI agent.&lt;br&gt;
And not in the sense of letting another AI simply judge whether the first AI did a good job.&lt;br&gt;
I mean something more grounded than that.&lt;/p&gt;

&lt;p&gt;A repo needs a local way to separate:&lt;br&gt;
implemented from claimed, verified from assumed, scaffolded from real, current from stale, organized from merely arranged, safe cleanup from risky cleanup, and “looks done” from actually done.&lt;/p&gt;

&lt;p&gt;That is the area I’ve been building in.&lt;br&gt;
The product I’m finishing is called Scarab Diagnostic Suite.&lt;br&gt;
I’m not ready to do the full launch post yet, but the core idea is simple:&lt;br&gt;
AI can build fast. Scarab helps keep the repo true.&lt;br&gt;
Scarab is not a code generator.&lt;br&gt;
It is not a prompt pack.&lt;br&gt;
It is not trying to become the AI agent.&lt;br&gt;
It is a CLI-installed diagnostic and supervision suite for AI-assisted repositories.&lt;/p&gt;

&lt;p&gt;The design philosophy is that AI agents need a stable operating environment around them. The agent can drift, lose context, overstate progress, or build on assumptions that are no longer true. The repo needs a separate layer that checks, records, warns, blocks, and guides the next step.&lt;/p&gt;

&lt;p&gt;That means the important question changes.&lt;br&gt;
Instead of only asking:&lt;br&gt;
Can the agent do the task?&lt;br&gt;
We also ask:&lt;br&gt;
Can the repo still prove what happened?&lt;/p&gt;

&lt;p&gt;That shift has completely changed how I think about AI coding.&lt;br&gt;
I don’t think the future is just bigger context windows or more autonomous agents.&lt;br&gt;
Those things will happen.&lt;br&gt;
But the more autonomy we give agents, the more important it becomes to have something stable around them.&lt;br&gt;
Something that does not simply trust the agent’s confidence.&lt;br&gt;
Something that can say:&lt;br&gt;
This is safe.&lt;br&gt;
This is incomplete.&lt;br&gt;
This is stale.&lt;br&gt;
This needs review.&lt;br&gt;
This should not be cleaned automatically.&lt;br&gt;
This requires a stronger baseline before deeper diagnostics should be trusted.&lt;br&gt;
That is the kind of product space I think is about to matter a lot.&lt;br&gt;
The AI coding agent may be the worker.&lt;br&gt;
But the repo still needs a way to maintain truth.&lt;br&gt;
That is the layer I’m building around.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>programming</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
