I’m Building Around the Gap Between AI Output and Repo Truth

scarab systems — Mon, 25 May 2026 11:02:10 +0000

I’ve been thinking a lot about a failure mode in AI-assisted development that does not fit neatly into “bad code” or “bad prompting.”

The agent can be useful.
The output can look good.
The tests can even exist.

And the repo can still become less trustworthy.
That is the uncomfortable part.

Most AI coding conversations still orbit the agent itself:
How do we improve its context? 
How do we write better instructions? 
How do we make it remember project rules? 
How do we coordinate multiple agents? 
How do we make it plan longer?

Those are real problems.
But I keep running into another layer of the problem.
An agent can have good context and still leave behind a repo that is hard to trust.

Because the repo is not only code.

It is the accumulated state of the project: structure, assumptions, tests, docs, runtime expectations, old scaffolding, partial implementation, cleanup debt, and claims about what is finished.
That state needs supervision.
Not in the sense of replacing the developer.
Not in the sense of replacing the AI agent.
And not in the sense of letting another AI simply judge whether the first AI did a good job.
I mean something more grounded than that.

A repo needs a local way to separate:
implemented from claimed,
verified from assumed,
scaffolded from real,
current from stale,
organized from merely arranged,
safe cleanup from risky cleanup,
and “looks done” from actually done.
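Here is a minimal sketch of one narrow version of the first distinction. It compares files an agent claims are finished against what those files actually contain. This is not Scarab's implementation. The function names, the placeholder patterns, and the three labels are hypothetical. A plain text scan like this can only catch obvious scaffolding. A file with no markers is labeled "unverified" rather than "done", because the absence of placeholders proves nothing by itself.

```python
import re
from pathlib import Path

# Hypothetical placeholder patterns; a real tool would need far richer signals
# (tests, runtime checks, history) than a regex scan.
SCAFFOLD_PATTERNS = [
    re.compile(r"\bTODO\b"),
    re.compile(r"\bFIXME\b"),
    re.compile(r"raise NotImplementedError"),
    re.compile(r"^\s*pass\s*$", re.MULTILINE),
]


def scaffold_markers(source: str) -> list[str]:
    """Return the placeholder patterns found in a file's source text."""
    return [p.pattern for p in SCAFFOLD_PATTERNS if p.search(source)]


def audit_claims(claimed_done: list[str], root: Path) -> dict[str, str]:
    """Label each file an agent claimed as finished (labels are hypothetical).

    "missing"    - claimed, but the file does not exist
    "scaffolded" - exists, but still contains placeholder markers
    "unverified" - no markers found; still needs tests or review to count as done
    """
    report = {}
    for rel in claimed_done:
        path = root / rel
        if not path.is_file():
            report[rel] = "missing"
        elif scaffold_markers(path.read_text(encoding="utf-8")):
            report[rel] = "scaffolded"
        else:
            report[rel] = "unverified"
    return report


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "auth.py").write_text("def login():\n    raise NotImplementedError\n")
        (root / "utils.py").write_text("def add(a, b):\n    return a + b\n")
        print(audit_claims(["auth.py", "utils.py", "billing.py"], root))
        # → {'auth.py': 'scaffolded', 'utils.py': 'unverified', 'billing.py': 'missing'}
```

The important design choice is that the check never upgrades a claim to "done" on its own. It can only downgrade a claim or leave it open for a stronger form of verification.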

That is the area I’ve been building in.
The product I’m finishing is called Scarab Diagnostic Suite.
I’m not ready to do the full launch post yet, but the core idea is simple:
AI can build fast. Scarab helps keep the repo true.
Scarab is not a code generator.
It is not a prompt pack.
It is not trying to become the AI agent.
It is a CLI-installed diagnostic and supervision suite for AI-assisted repositories.

The design philosophy is that AI agents need a stable operating environment around them. The agent can drift, lose context, overstate progress, or build on assumptions that are no longer true. The repo needs a separate layer that checks, records, warns, blocks, and guides the next step.

That means the important question changes.
Instead of only asking:
Can the agent do the task?
We also ask:
Can the repo still prove what happened?

That shift has completely changed how I think about AI coding.
I don’t think the future is just bigger context windows or more autonomous agents.
Those things will happen.
But the more autonomy we give agents, the more important it becomes to have something stable around them.
Something that does not simply trust the agent’s confidence.
Something that can say:
This is safe.
This is incomplete.
This is stale.
This needs review.
This should not be cleaned automatically.
This needs a stronger baseline before deeper diagnostics can be trusted.

That is the kind of product space I think is about to matter a lot.
The AI coding agent may be the worker.
But the repo still needs a way to maintain truth.
That is the layer I’m building around.

Forem: scarab systems