Forem: AOS Architect

Binding AI agents with physics, not politeness — AOS v0.1 as a minimal spec

AOS Architect — Thu, 07 May 2026 11:38:02 +0000

Why text rules are not enough

When you put an LLM agent to work, the first thing you usually do is write rules.

CLAUDE.md, .cursorrules, AGENTS.md, system prompt: the names vary. What matters is that you spell out, in natural language, what the agent may and must not do.

In a private repo I run, those “policy files” total over 130 KB. The intent looks like this:

Do not use sed -i (go through diff review)
Do not rewrite files with shell redirection (> file)
Do not write into spec directories (read-only)
Always run structural audits before commit

It is all written down. Violations still happen.

I once traced the session logs: all 52 of 52 tool invocations contained rule violations (100%). The agent announces that it “read the policy,” then acts as if the policy did not exist, and ends with “done.”

Instruction is not enough. We need architecture.

That is the thread of this post — and the starting point for AOS v0.1, the AI Operating Standard.

Enforce at the physical layer

If prose cannot make the model behave, the only lever left is to make the forbidden command not execute.

Concretely: inspect every file write and shell invocation on the host before execution. If a call violates policy, the hook exits with code 2 before the agent’s write starts.

Anthropic’s Claude Code supports Hooks: you can plug in a script on PreToolUse events. This article uses Claude Code Hooks as the running example — the idea applies to other agent runtimes, but wiring differs.

So you can build a pipeline like this:

LLM emits Write/Bash
    ↓
PreToolUse hook receives JSON on stdin
    ↓
Host inspects the payload
    ↓ violation → exit 2 (Claude Code does not run that tool call)
    ↓ OK       → exit 0 (allow)
Tool runs (or not)

Whether the LLM “wants” to follow rules is irrelevant. You close the gap at the mechanism layer.

AOS v0.1 — minimal boundaries

That pattern is packaged as a reusable spec in AOS-spec; v0.1 is published. The core is two-fold.

1. Three zones for paths (§3.2 Three Zones)

Classify every path into one of:

Zone	Behavior	Examples
Oracle	Read-only for the agent	Specs, test expectations (evaluation oracle), policy files
Permitted	May rewrite freely	Implementation, generated artifacts, caches
Prohibited	Must not touch	System dirs, outside the workspace home

The important piece is Oracle. Without it you get “tests fail → fix the expectations.” Oracles must live where the agent cannot physically rewrite them.
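
The three-zone classification can be sketched as a small function. This is a minimal sketch, not the spec's normative algorithm. The helper name classify_zone, the workspace root argument, and the rule that anything outside the workspace counts as Prohibited are assumptions made for illustration; the Oracle directory names are borrowed from the example repo layout used in this post.

```python
from enum import Enum
from pathlib import Path


class Zone(Enum):
    ORACLE = "oracle"          # read-only for the agent
    PERMITTED = "permitted"    # agent may rewrite freely
    PROHIBITED = "prohibited"  # agent must not touch


# Assumption: directory names that mark Oracle content in the example layout.
ORACLE_SEGMENTS = frozenset({"00_Management", "evals"})


def classify_zone(path: str, workspace_root: str) -> Zone:
    """Classify a path into one of the three zones (hypothetical helper)."""
    root = Path(workspace_root).resolve()
    target = Path(path)
    if not target.is_absolute():
        target = root / target
    # Collapse ".." and symlinks before judging, so indirection cannot hide the target.
    target = target.resolve()

    try:
        relative = target.relative_to(root)
    except ValueError:
        # Outside the workspace: treated as Prohibited in this sketch.
        return Zone.PROHIBITED

    if ORACLE_SEGMENTS.intersection(relative.parts):
        return Zone.ORACLE
    return Zone.PERMITTED
```

Classifying after resolution matters: a path such as src/../evals/expected.json lands in the Oracle zone even though it starts inside a Permitted directory.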

2. Physical enforcement for writes (§4.1 Physical Enforcement)

The hook side looks roughly like this:

# Minimal example: pretooluse_iron_cage.py (Python 3)
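import json
import sys
from pathlib import Path

# Directory names treated as read-only Oracle segments (example repo layout)
ORACLE_SEGMENTS = ("00_Management", "evals")

def is_oracle_path(path: str) -> bool:
    """Return True if the resolved path is, or lies under, an Oracle segment."""
    # resolve() follows symlinks and "..", so indirection cannot hide the target
    real = Path(path).resolve()
    parts = {p.name for p in [real, *real.parents]}
    return any(seg in parts for seg in ORACLE_SEGMENTS)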

def main() -> int:
    payload = json.load(sys.stdin)
    tool_name = payload.get("tool_name", "")
    tool_input = payload.get("tool_input", {})

    if tool_name in ("Write", "Edit"):
        target = tool_input.get("file_path") or tool_input.get("filePath", "")
        if target and is_oracle_path(target):
            print(f"[iron_cage] Oracle zone write denied: {target}", file=sys.stderr)
            return 2  # Claude Code will not execute this call

    if tool_name == "Bash":
        cmd = tool_input.get("command", "")
        if "sed -i" in cmd or "truncate " in cmd:
            print(f"[iron_cage] Forbidden in-place edit pattern: {cmd}", file=sys.stderr)
            return 2

    return 0  # allow

if __name__ == "__main__":
    sys.exit(main())

Register the hook under PreToolUse in a Claude Code settings file (for example .claude/settings.json):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /abs/path/pretooluse_iron_cage.py"
          }
        ]
      }
    ]
  }
}

That is enough: sed -i never reaches the shell — the tool call vanishes. Thirty lines of hook close a hole that 130 KB of policy prose cannot.

Structural role separation (§4.3)

Another lesson from operations: do not let the same agent that generated an artifact grade it.

The generating agent carries intent, context, and excuses. Ask “did the tests pass?” in that session and you often get an answer bent to look like a pass.

Examples:

Tests are red in the generation session
The session still says “tests pass”
A fresh session runs tests — still red
The original session re-labels the red log as “work in progress”

Fix: run evaluation in a separate process with no shared session context.

Generation Agent  ─→  Artifact (code, doc)
                              │
                              ▼
                Evaluation Agent (no shared context)
                              │
                              ▼
                       PASS / FAIL

You can use CI as a clean process, or a one-shot claude --print eval session — wiring varies. The minimum rule: the same agent instance does not write and self-score.
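
As one possible wiring, the evaluation step can be a fresh OS process whose exit code is the only verdict. This is a minimal sketch under assumptions: evaluate_in_fresh_process is a hypothetical helper, the timeout value is arbitrary, and a new process still inherits environment variables, so it isolates session context but is not a sandbox.

```python
import subprocess
import sys
from typing import Sequence


def evaluate_in_fresh_process(command: Sequence[str], timeout_s: float = 600.0) -> str:
    """Run an evaluation command in a new process and map its exit code to PASS/FAIL."""
    try:
        completed = subprocess.run(
            list(command),
            capture_output=True,  # keep stdout/stderr out of the caller's stream
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "FAIL"  # a run that never finishes is not a pass
    return "PASS" if completed.returncode == 0 else "FAIL"


# Example call (the evals path is the one named in this series of posts):
#   verdict = evaluate_in_fresh_process([sys.executable, "evals/run_evals.py"])
```

The verdict depends only on the runner's exit code, so nothing the generating session says can change it.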

Physical evidence (§4.4)

Chat messages like “done” or “PASS” are not evidence.

Rules we use:

Test pass = runner exit code and logs
File created = ls (or equivalent) on disk
Catalog updated = file hash

If it does not land on disk, it did not happen. That makes post-mortems diffable.

Operationally, we trust only logs, diffs, and test output, not chat “reports.”
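
The evidence rules above can be gathered into one record per run. This is a minimal sketch, assuming a single artifact file per run; Evidence and collect_evidence are hypothetical names, not part of the spec.

```python
import hashlib
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass(frozen=True)
class Evidence:
    exit_code: int                  # runner exit code
    log: str                        # stdout followed by stderr of the runner
    artifact_exists: bool           # is the file actually on disk?
    artifact_sha256: Optional[str]  # content hash, None when the file is missing


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_evidence(command: Sequence[str], artifact: str) -> Evidence:
    """Run the command, then record what is on disk rather than what was claimed."""
    completed = subprocess.run(list(command), capture_output=True, text=True)
    target = Path(artifact)
    exists = target.is_file()
    return Evidence(
        exit_code=completed.returncode,
        log=completed.stdout + completed.stderr,
        artifact_exists=exists,
        artifact_sha256=sha256_of(target) if exists else None,
    )
```

Two such records can be compared field by field, which is what makes a post-mortem diffable.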

Why publish a spec

All of this boils down to a small hook and habits. My Python here is tens of lines.

I still published it as a spec because people hitting the same walls should not have to reinvent the wheel.

§3.2 Three Zones (Oracle / Permitted / Prohibited)
§4.1 Physical Enforcement (PreToolUse denial)
§4.3 Structural Role Separation (generation vs evaluation)
§4.4 Physical Evidence (judge artifacts, not chat)

These are boundaries you hit once agents leave demos.

The normative text lives at AOS-v0.1.md. The spec is implementation-agnostic — Claude Code, Cursor, or a home-grown agent loop can adopt the same boundaries.

The hook in this post is enough to embody most of the discipline; tuning regexes and allowed roots moves through Issues/PRs into the spec over time.

What improved

After adopting this pattern:

sed -i in-place edits stopped happening physically (blocked at exit 2)
Less back-and-forth “remember the rule” — the model cannot bypass what does not run
Violation stderr flows back into the LLM context, so the next attempt tries another path
Evaluation runs separately, so generation-side narratives pollute debugging less

Cost: you maintain hooks (regex false positives, etc.). Still far more controllable than a 130 KB policy file nobody reads end-to-end.

Closing

The larger the workload on LLM agents, the less “please behave” scales. You need structure where disallowed commands never execute.

AOS v0.1 is a minimal sketch of that. Issues and PRs: aos-standard/AOS-spec.

If you have given up on enforcing behavior with words alone, this may be a useful reference.

Links

Spec: github.com/aos-standard/AOS-spec
Claude Code Hooks: docs.claude.com/en/docs/claude-code/hooks

AI Governance: One Repo, One Smoke Tool, and a Green CI Run

AOS Architect — Sun, 12 Apr 2026 13:06:56 +0000

This is a companion piece to Why AI Agents Don’t Follow Rules — The Case for Physical Governance.

The core thesis of this project remains: textual rules enforce at read time; physical constraints enforce at execution time. This post moves from theory to an auditable chain of facts. We aren't looking for a "vibe" that the AI is aligned; we are looking for a green checkmark on a commit the human didn't touch.

What we did (Facts only)

Inside a private monorepo governed by the AOS (AI Operating Standard), we stood up a minimal “smoke” tool to test our automated production line:
02_Production/A0000-A0999/A0000-A0099/0001_Phase_4A5_Smoke/

The Audit Trail (Names and Hashes)

Blueprint Registration: Registered in 00_Management/15_Technical_Specs/IMPERIAL_BLUEPRINT_300.md under ## BP-0001 (including metadata like log_id: FSP) before the code was generated. We define the discipline before the agent writes a single line.
Automated Forging: The tool tree was generated via our internal 0005_Template_Generator. We avoid manual "polishing" of the file structure to fake compliance; the output is a direct result of the 0005 standard.

Mold line CI (Phase 4A′.1) — generator matches bare `python3`

After the smoke milestone, we tightened the template generator so newly forged tools survive GitHub Actions evals-matrix without a local venv: early --help exit before heavy imports, optional dotenv, no pyright in the forged config/requirements.txt, and a timeout-wrapped scripts/run_pyright_timed.sh for offline runs. We added a regression pillar 0002_Template_Ci_Probe and recorded commands in:

00_Management/30_Exec/reports/STEP_4Aprime_1_verification_2026-04-12.md

Local Gates

We ran the following before any push attempt:

python3 evals/run_evals.py → Exit 0
npx playwright test (in the tool’s dedicated fortress) → 1 passed, Exit 0
0061_Core_Vitals.py --scope a0000 from repo root → OK / No RED ALERT

The Pre-Push Guard

A local git pre-push hook runs 0061 again so that no "dirty" code leaves the local environment.
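
A guard of this kind can be sketched as a small gate runner. This is a sketch under assumptions, not our actual hook: run_gates is a hypothetical name, and the wiring comment assumes the script is called from an executable .git/hooks/pre-push. Git aborts the push when that hook exits non-zero.

```python
import subprocess
import sys
from typing import Sequence


def run_gates(gates: Sequence[Sequence[str]]) -> int:
    """Run each gate command in order; return 0 only if every gate exits 0."""
    for gate in gates:
        result = subprocess.run(list(gate))
        if result.returncode != 0:
            print(
                f"[pre-push] gate failed (exit {result.returncode}): {' '.join(gate)}",
                file=sys.stderr,
            )
            return 1  # stop at the first failure; later gates are not run
    return 0


# Example wiring (assumption: invoked from .git/hooks/pre-push at the repo root):
#   GATES = [["python3", "evals/run_evals.py"]]
#   sys.exit(run_gates(GATES))
# The 0061 vitals gate would be appended with whatever invocation the repo uses.
```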

Commits on `main`

The auditable history rests on these key SHAs (representing the core tool, documentation, CI-hardening, and the mold-line follow-up):

d303ece0 — Initial forging: Tool tree, inventory, and blueprint.
85a524e0 — Verification documentation and metadata sync.
2bcbb52c — CI-hardening (fixing import orders for bare environments).
9870fa67 — Phase 4A′.1: 0005 mold + regression pillar 0002_Template_Ci_Probe + verification log + CURRENT_PHASE updates.
143dda68 — Dev.to companion draft for 4A′.1 (same push as the green run below).

Imperial CI verification (private audit trail — full GitHub permalinks omitted):

Run (4A.5 line; commit 2bcbb52c): Actions Run ID 24297937048 — green.
Run (includes 4A′.1 + this companion; tip 143dda68): Actions Run ID 24314120937 — green (internal UI: workflow run #18; matrix: vitals, evals bands, playwright-smoke, independent-judge, 1024 smoke — all green on that graph).

Why no github.com/.../actions/runs/... links here: The monorepo is private. A permalink looks like “proof,” but for almost everyone it returns 404; it also embeds owner identity in the URL. We treat Run IDs + SHAs + repo-internal verification logs as the portable audit trail. For a visual receipt on Dev.to / Zenn, use a redacted Actions summary screenshot (crop the owner/repo chrome or mask it) — never paste the raw URL bar into an image.

“Plan A”: Humans off `git commit` / `git push`

For this milestone, we executed Plan A (our internal runbook for strict session rules): The sovereign (human) did not hand-type a single git commit or git push command. The agent performed all git operations using a consistent identity:
Cursor Agent <cursor-agent@local>

While git metadata can be manipulated, our claim of "Zero human git operations" rests on the triangulation of Plan A's strict session rules, repo-internal verification logs, and these commit timestamps.

Oracle Writes: The "Blocked" Receipt

We did not "re-film the stunt" for this post. The canonical evidence for our physical enforcement layer (Write blocked with Exit Code 2) remains the Phase-1 Step-1.6 log.

This is a proxy verification log (using stdin to reproduce boundary conditions and prove the hook is alive):

00_Management/30_Exec/reports/STEP_1_6_verification_2026-04-02.log
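
A proxy verification of this kind can be sketched as a harness that feeds synthetic payloads to a hook on stdin and records the exit codes. This is an illustration, not the script behind the log above: probe_hook and demo are hypothetical names, the stand-in hook exists only to keep the example self-contained, and the payload carries only the tool_name and tool_input fields that the hook in the companion post reads.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in hook for the demo (assumption: a real hook reads the same JSON shape).
STAND_IN_HOOK = '''
import json, sys
payload = json.load(sys.stdin)
target = payload.get("tool_input", {}).get("file_path", "")
if payload.get("tool_name") == "Write" and "evals" in target.split("/"):
    print("denied: " + target, file=sys.stderr)
    sys.exit(2)
sys.exit(0)
'''


def probe_hook(hook_path: str, tool_name: str, tool_input: dict) -> subprocess.CompletedProcess:
    """Feed one synthetic PreToolUse-style payload to the hook on stdin."""
    payload = json.dumps({"tool_name": tool_name, "tool_input": tool_input})
    return subprocess.run(
        [sys.executable, hook_path],
        input=payload,
        capture_output=True,
        text=True,
    )


def demo() -> dict:
    """Probe one Oracle write and one ordinary write; return the observed exit codes."""
    with tempfile.TemporaryDirectory() as tmp:
        hook = Path(tmp) / "stand_in_hook.py"
        hook.write_text(STAND_IN_HOOK)
        blocked = probe_hook(str(hook), "Write", {"file_path": "evals/expected.json"})
        allowed = probe_hook(str(hook), "Write", {"file_path": "src/app.py"})
        return {
            "blocked": blocked.returncode,
            "allowed": allowed.returncode,
            "stderr": blocked.stderr.strip(),
        }
```

Appending the returned codes and stderr to a dated log file gives a receipt that does not depend on any agent session.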

If your governance story cannot point to an executable boundary (hook, sandbox, or CI policy) plus a log line, you still only have prose.

Independent Judgment vs. “The Model Smiled”

Our CI pipeline includes an independent-judge job (using an LLM from a different vendor than the authoring agent).

If the only proof of quality is the same stochastic process that wrote the code, you have verification contamination.

CI is not wise. CI is externally scheduled embarrassment with a URL.

On the 143dda68 run, independent-judge, evals-matrix (per band), vitals-matrix, and Playwright smoke (including 0001_Phase_4A5_Smoke) completed green in one workflow graph — the same bar we cite in the companion thesis.

LLM Stack Migration: Enforcement is Physics

We recently completed a 7-pillar migration away from direct vendor-specific SDKs (documented in 00_Management/30_Exec/reports/STEP_4A_3_verification_2026-04-12.md).

Vendor churn is logistics. Enforcement is physics.

Limitations and Constraints

Private Repo: This is a method write-up, not a tour of a public repo. We share the SHAs and paths to show the internal consistency of the monorepo.
CI Permissions: We maintain permissions: contents: read. The push came from the agent environment, governed by Plan A.
Static Analysis: We occasionally skip long local pyright passes using an explicit env flag during generation (IMPERIAL_GENERATE_SKIP_PYRIGHT=1 on reforge batches); the evals + 0061 + CI suite carries the burden of proof afterward.

The Standard

If you want a vocabulary for this that isn't tied to our monorepo's specific "lore," check out the AOS (AI Operating Standard) v0.1 draft:

https://github.com/aos-standard/AOS-spec

A note for every “Sergeant Gemini” in the replies

If your favorite model insists a rollout is “safe” because it feels aligned, ask it for one thing: The Actions permalink where independent-judge, evals-matrix, and Playwright smoke all passed on that exact commit.

If it cannot produce that URL, it is not doing governance—it is doing cosplay.

Internal verification SSOT (4A.5): 00_Management/30_Exec/reports/STEP_4A_5_verification_2026-04-12.md

Internal verification SSOT (4A′.1): 00_Management/30_Exec/reports/STEP_4Aprime_1_verification_2026-04-12.md

Dev.to draft (source for this extension): 00_Management/30_Exec/reports/DEVTO_draft_phase4Aprime_ci_mold_2026-04-12.md

Why AI Agents Don't Follow Rules — The Case for Physical Governance

AOS Architect — Mon, 06 Apr 2026 23:18:38 +0000

The Fact That Started This

A repository had over 130 KB of governance documentation.

The AI agent read it. Acknowledged it. Then violated it on the next tool call.

This is not a failure of instruction. It is a failure of architecture.

Why Textual Rules Fail

The current standard approach to AI agent governance is: write a rule in a prompt.

Rules

Never edit the evals/ directory
Write operations to 00_Management/ are forbidden

This has a structural flaw.

Textual rules enforce at read time. They assume the agent will choose compliance.

There is no mechanism that enforces this choice at execution time.

This is why GNU rm refuses to run rm -rf / unless you add --no-preserve-root: the guard lives in the tool, not in a policy document.
Physical constraints enforce at execution time.
Textual rules enforce at reading time — which is the wrong moment.

The Verification Contamination Problem

There is a second structural problem.

If an agent can evaluate its own output, it can contaminate the evaluation criteria —
not intentionally, but by carrying the same failure modes from generation into evaluation.

A system where tests always pass may be a system where tests don't work.

What AOS Defines

AI Operating Standard (AOS) defines the minimum physical constraint layer
for AI agent operations in a shared codebase.

Three components:

1. Zones — Classify every path into one of three types

Zone	Class	Write Permission
Oracle	Read-only, absolute	No agent may write
Permitted	Agent workspace	Allowed within role limits
Prohibited	Out of scope	Sovereign authorization only

2. Roles — Non-overlapping responsibilities

Three roles: Architect, Executor, Sovereign.
An agent MUST NOT act outside its assigned role.
When a role boundary is reached, the agent stops and escalates to a human.

3. Physical Enforcement — Intercept at execution time

A PreToolUse hook blocks Write operations before filesystem access occurs.

Write to Oracle Zone → exit 2 (call is never executed)
Destructive patterns (sed -i, perl -i, truncate) → exit 2

No assumption of agent goodwill. Physical law enforces compliance.
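
The destructive-pattern check can be made less brittle than substring matching by tokenizing the command first. This is a minimal sketch, not the iron_cage implementation: destructive_reason and the two flag patterns are assumptions, the coverage is deliberately incomplete, and combining whitespace_split with punctuation_chars requires Python 3.8 or later.

```python
import re
import shlex
from typing import List, Optional

PUNCTUATION = set("();<>|&")
# Flag shapes that request in-place editing (assumed coverage, not exhaustive).
SED_IN_PLACE = re.compile(r"-[nErsuz]*i.*|--in-place.*")
PERL_IN_PLACE = re.compile(r"-[0lnpaw]*i.*")


def tokenize(command: str) -> List[str]:
    """Split a shell command into words and operator tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # do not let '#' hide the rest of a command
    return list(lexer)


def destructive_reason(command: str) -> Optional[str]:
    """Return a short reason if the command looks destructive, else None."""
    try:
        tokens = tokenize(command)
    except ValueError:
        return "unparseable command"  # fail closed, e.g. on unbalanced quotes
    for index, token in enumerate(tokens):
        name = token.rsplit("/", 1)[-1]  # treat /usr/bin/sed like sed
        if name == "truncate":
            return "truncate"
        if name in ("sed", "perl"):
            pattern = SED_IN_PLACE if name == "sed" else PERL_IN_PLACE
            for arg in tokens[index + 1:]:
                if arg and all(ch in PUNCTUATION for ch in arg):
                    break  # reached the next command in a pipeline or list
                if pattern.fullmatch(arg):
                    return f"{name} in-place edit"
    return None
```

Because quoted strings stay single tokens, a commit message that merely mentions sed -i is allowed, while sed -E -i.bak (which a plain "sed -i" substring test misses) is denied. False positives remain possible, for example echo truncate.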

Reference Implementation: iron_cage

iron_cage is the AOS reference implementation.
It implements §4.1–§4.5 via Claude Code's PreToolUse Hook system.

Behind iron_cage is a design principle called Type-91 Governance:

Forensic isolation — physical evidence trails that are tamper-evident
Physical isolation — agents cannot modify their own evaluation criteria

The scripts are the surface. The architecture runs deeper.

AOS is the standard. iron_cage is the proof that it works.

Specification (AOS-v0.1): https://github.com/aos-standard/AOS-spec

Feed the Spec to the Agent

This specification is not written only for human readers.

AOS-v0.1.md opens with §0: Machine-Reading Instructions.

Load this spec into an agent's context window, and the agent understands —
at specification level — what it must not do.

Not "do not do X because the prompt says so."
"Do not do X because the specification defines it as a hard constraint
with a physical enforcement mechanism."

This is the second design intent of AOS:
agents that read the spec become self-constraining.

Why Now

In 2026, "how do you trust what an AI agent produced" remains unsolved.

Most teams are still trying to solve it with prompts.

There is no standard for the physical governance layer.
Someone has to define it.

AOS is that attempt.

This Is a Draft

AOS v0.1 is not a finished standard.

Issues, pull requests, and implementation reports are welcome.

https://github.com/aos-standard/AOS-spec