Forem: Shimo

Between the Workflow and ReAct Quadrants: How Phase Decides Skill Design

Shimo — Sat, 02 May 2026 03:10:17 +0000

Series position: The fourth article in the ReAct agent quadrant series. The quadrant names align with the previous articles and with the AAP repository.

Premise — The Four Quadrants Again

Recapping the four quadrants set up in the first article:

(1) Script Quadrant — deterministic × definable. Handled by scripts and pipelines.
(2) Algorithmic Search Quadrant — deterministic × exploratory. Classical AI / OR territory.
(3) LLM Workflow Quadrant — semantic judgment × definable. Calls an LLM inside a predefined workflow.
(4) Autonomous Agentic Loop Quadrant — semantic judgment × exploratory. An autonomous loop where the LLM itself decides the next action (= ReAct agent).

Up to the previous article, I treated (3) and (4) as a dichotomy — "pick one based on the nature of the work." The second article pointed out that the industry has no settled vocabulary for (3); the third article framed the dichotomy along the business-system axis and introduced the Phase Separation between design phase and operation phase. This article brings that Phase Separation down to the resolution of individual skill design and tries to show that, between the two poles of the dichotomy, a continuous gradient exists.

Observation — The Same Job Lives in Multiple Forms

Looking around my own repositories, I notice that conceptually identical jobs are implemented in several different forms. The Promote phase of AKC (Agent Knowledge Cycle) — a six-phase cycle in which an agent distills knowledge from its own outputs to self-improve — is a typical example. The job is to extract recurring principles from skills and promote them to rules. Generalized, it's the work of "extracting principles from repetition." Mining alert rules from logs, finding common patterns across a codebase to refactor, weaving an FAQ from customer-support conversation logs — they all belong to the same family.

In contemplative-agent there's a Python function called core/rules_distill.py. It vectorizes skills with embeddings, builds theme-based clusters via cluster_patterns, batches LLM calls with MAX_RULES_BATCH=10, and the cluster threshold CLUSTER_THRESHOLD_RULES=0.65 is calibrated with a calibration file. The LLM call is just one step in the pipeline, and most of the judgment has been frozen into code in advance.

The rules-distill skill (~/.claude/skills/rules-distill/SKILL.md), by contrast, is written in natural language. In Phase 1, scan-skills.sh produces the file listing; in Phase 2, theme-based cluster judgment is delegated to a sub-agent; merges across batches are also decided by LLM judgment. Scripts are used only for listing and persistence — the judgment itself sits in the runtime LLM's hands.

The two are doing the same job. And yet the implementations are dramatically different.

Surface Hypothesis — Model Capability

The first explanation that comes to mind is model capability. core/rules_distill.py was written to run in a 9B local-model environment. In my hands-on experience, getting a 9B model to robustly run theme-based cluster judgment at runtime is hard. So a deterministic scaffold of embeddings + clustering + threshold tuning limits the 9B's role to "generate one rule candidate from a given cluster." Conversely, in a Claude Code environment where Opus is callable at runtime, that scaffold is no longer needed. Cluster judgment and merge judgment can both be performed by an LLM that takes the entire context into account.

This is consistent with a design principle recorded by AKC — README's Design Principle 5: "Evaluation scales with model capability — small models run on rubric evaluation; Opus class runs on qualitative evaluation that takes the full context into account." The original is scoped to evaluation, but extended to implementation-position choice — varying judgment scale with capability — the 9B-era Python pipeline and the Opus-era natural-language skill read as the two endpoints of that extension.

But this explanation is only stroking the surface of where things get implemented. The capability gap is an observable fact, but it's a downstream consequence of a deeper decision.

Deeper Hypothesis — The AAP Phase

Back to the distinction between design phase and operation phase from the previous article, Is ReAct Needed in Production?. What's really driving the choice of implementation position is this phase axis.

contemplative-agent sits in the operation phase. Once deployed, inputs flow through fixed routes. The boundaries between inputs (persona templates, dialogue, memory) and outputs (responses, logs) are settled in advance, and the internal knowledge cycle similarly runs through six predefined phases. Because the pipeline is frozen for operation, structures like embeddings / clustering / thresholds / batches in core/rules_distill.py can be baked into Python. The 9B is a choice that becomes available in this environment, not the cause.

In fact, the contemplative-agent pipeline is built so that the LLM-call portion can optionally be swapped to Claude or GPT models. Scaling capability upward leaves the LLM-function pipeline largely intact. This works as a counter-experiment against the surface hypothesis (capability requires the pipeline structure). If capability were the cause, the pipeline structure should become redundant and start to fall apart the moment you switch to Opus class. But it doesn't. What requires the pipeline structure isn't capability — it's the phase decision of being in the operation phase. Capability is just a downstream choice that follows from the phase decision.

Claude Code is a tool that lives in the design phase. Each session is itself exploratory work: the target codebase, the IDE, the agents being used, and even the file types being handled (.py / .ts / .md / .ipynb / arbitrary) shift between sessions. One time it's a Python project, another time a Next.js project, another time a documentation repo centered on ADRs. If a skill freezes paths or file structure in advance, it breaks the moment the environment changes. So the skill body is written in natural language and adapts to the runtime context. Opus is a consequence — it's the capability needed to support the holistic judgment that runtime adaptation demands — not the cause.

So:

contemplative-agent can be written in core/rules_distill.py form = because the pipeline was frozen for the operation phase
Claude Code has to be written in rules-distill skill form = because the design phase makes freezing the environment impossible

Capability appears to sit downstream of this phase decision. The phase draws the line between "hardcodable" and "flexibility-required," and inside that line capability gets selected. AKC's extended principle "capability ↑ → holistic judgment OK" reads as a secondary principle that holds once the phase decision is taken as given.

Inside a Phase — Same Phase, but Task Pulls Position

Even narrowing the lens to the design phase, implementation position splits further.

There's a skill called skill-comply. It measures whether Claude's skill / rule / agent definitions are actually being followed at runtime. It auto-generates scenarios at three prompt-strictness levels (supportive → neutral → competing), runs claude -p, and classifies tool-call traces to report compliance rates. The directory contains pyproject.toml / prompts/ / fixtures/ / tests/ / scripts/ / results/. The internal requirement of the task is that scenario execution be reproducible — otherwise compliance rates are meaningless. Generalized, it sits next to static code analysis (lint, SonarQube) or automated benchmark measurement: jobs that require the same input to return the same verdict. It's a skill in the design phase, but the task nature of evaluation/measurement requires reproducibility.

A different skill, search-first, looks for existing libraries and tools before new implementation begins. The directory contains only a single SKILL.md file. It delegates to the scout agent and returns an Adopt / Extend / Build verdict. Generalized, it's close to what an engineer does when, before building a new feature, they roam GitHub and PyPI to narrow down candidate libraries and decide context-appropriately whether to Adopt or Build. The same set of candidates doesn't need to come back every time — context-appropriate verdicts are what's wanted. It's a skill in the design phase, where the task nature of selection demands pure judgment. Reproducibility is secondary; returning a slightly different candidate set each time is fine.

Both sit in the design phase and run in the same environment of Opus + open environment. And yet implementation position splits, with one (skill-comply) closer to workflow and the other (search-first) closer to ReAct. This shows that even after the phase is fixed, task nature pulls position independently.

That said, phase and task aren't fully orthogonal. Tasks with strong reproducibility requirements in the operation phase get pushed hard toward the workflow side — phase changes how task nature surfaces. This article handles the two axes coarsely as a split; examining the size of the interaction more precisely is out of scope here.

The Same Phase Axis Descends Into Skills

The same phase axis descends not only at the skill level but at the level of subcomponents inside a skill. The contrast between skill-stocktake and context-sync is emblematic.

skill-stocktake is a skill that lists installed Claude skills and commands and audits their quality. The domain definition — "which skill files are evaluation targets" — is hardcoded in scripts (~/.claude/skills/, {cwd}/.claude/skills/). Generalized, it sits next to SBOM (Software Bill of Materials, a list of dependency libraries contained in a piece of software) generation and dependency scanning: jobs that require mechanically enumerating targets. The benefit is no targets get missed. The drawback is that it assumes the Claude Code environment and won't run as-is in a different coding-agent environment. Generalizing it would mean writing per-agent scripts, and maintenance cost piles up.

context-sync is a skill that detects role overlap, staleness, and gaps in a project's documentation (CLAUDE.md / CODEMAPS / ADR / README, etc.) and fixes them. Unlike skill-stocktake, the domain definition is delegated to the LLM's holistic judgment. Generalized, it's close to what a tech lead does when reading a project's architecture and judging "this explanation is stale, that should be moved to an ADR." So if it finds an llms.txt it includes that as a sync target, and it picks up Codemaps, ADRs, and AGENTS.md (a file that plays the same role under a different name) by inference. The benefit is portability across environments. The drawback is that detection coverage wavers between runs, and at full codebase scale, even when detection succeeds, the diff can't be fully captured.

Comparison	skill-stocktake	context-sync
Where domain definition lives	Hardcoded in scripts	LLM judgment
Coverage	No misses	Wavers
Portability	Single-environment	Cross-environment
Scale tolerance	High	Low

By "subcomponent" here I mean the processing steps inside a skill. From the outside, one skill is one job; internally it splits into several steps. Take skill-stocktake: it can be split into (A) the target-enumeration step that decides "which skill files are evaluation targets" and (B) the quality-judgment step that "judges the quality of each skill." context-sync is isomorphic: (A) "which documents to sync" (target enumeration) and (B) "judge staleness and which sections should be moved" (quality judgment). What I'm calling a subcomponent is a step inside the skill, like A or B above.

What this shows is that the phase axis descends not only at the skill level but at the subcomponent level inside a skill. Even between two skills that belong to the same phase, the layout step by step differs. skill-stocktake is a hybrid layout that puts A (target enumeration) in scripts and B (quality judgment) in the LLM. context-sync is a ReAct-leaning layout that puts both A and B in the LLM. Even within a single skill, one part can sit on the workflow side and another on the ReAct side.

The reason the two land on different layouts inside the same design phase is that they differ in target identifiability. The design phase is, by nature, "the environment moves," but how it moves varies by target. The targets of skill-stocktake — Claude's skills — sit at fixed locations under fixed naming conventions: ~/.claude/skills/ (global) and {cwd}/.claude/skills/ (project). Even in the design phase, paths can be hardcoded in scripts. The targets of context-sync (CLAUDE.md / CODEMAPS/ / docs/adr/ / AGENTS.md / llms.txt ...), by contrast, vary in placement and naming across codebases. They can't be written into scripts, so the only option is to delegate to the LLM's holistic judgment.

A further limit, on the LLM-judgment side, is scale tolerance. Up to mid-sized codebases, the LLM's holistic judgment covers the whole, but at full scale it starts missing diffs. Even when capability climbs sufficiently, this miss rate doesn't disappear. AKC's Design Principle (capability ↑ → holistic judgment OK) covers neither target identifiability nor scale tolerance.

Closing

The (3) LLM Workflow Quadrant and the (4) Autonomous Agentic Loop Quadrant are useful as an axis for classifying the nature of work. But once you think about individual skill design, between the two quadrants there's a continuous gradient, and where you land on that gradient appears to be decided primarily by the phase decision. The capability gap is just a result observed downstream of the phase decision.

As the second article pointed out, the industry doesn't yet have settled vocabulary for talking about (3) as an independent quadrant. This article tries to step further from that point, into the continuous gradient inside the dichotomy. The same job gets implemented in different positions in the operation phase versus the design phase. Within the same phase, position shifts when the task nature differs. Inside a single skill, position splits subcomponent by subcomponent.

The previous article framed Phase Separation as an axis of business systems; this article tried to show that the same axis runs all the way down to the resolution of individual skill design. The same phase axis primarily decides both how business gets carved up and where skill implementation lands.

One direction left open is the observation that the optimal position shifts when the phase shifts. An implementation position that was correct in one environment is no longer optimal once the phase changes. How to translate the question this raises into implementation — I don't have an answer in hand. I'll leave it as a design problem outside the scope of this article.

Previous articles in this series:
AI agent governance trilogy:
Agent Knowledge Cycle (AKC) — source of the "capability ↑ → holistic judgment" principle referenced in this article
Contemplative Agent — implementation of core/rules_distill.py, an example of the workflow end of the 9B era
Agent Attribution Practice (AAP) — the four-quadrant names in this series align with AAP

Is ReAct Needed in Production? — Separating Design and Operation Phases

Shimo — Thu, 30 Apr 2026 11:19:11 +0000

Series position: The third article in the ReAct agent quadrant series. The quadrant names align with the previous articles and with the AAP repository.

Premise

In the first article I split business AI into four quadrants and wrote that ReAct agents are legitimately needed only in the (4) ReAct Quadrant. In the second article I observed that the industry's vocabulary has no independent name for the (3) LLM Workflow Quadrant, and that this absence produces, by elimination, the choice to drape ReAct over every quadrant.

For ease of reference, the four quadrants again:

(1) Script Quadrant — deterministic × definable. Handled by scripts and pipelines.
(2) Classical AI Quadrant — deterministic × exploratory. Classical AI / OR territory (out of scope here).
(3) LLM Workflow Quadrant — semantic judgment × definable. Calls an LLM inside a predefined workflow. Includes a conversational form (specialized chat agents) and a batch form (single-purpose LLM functions).
(4) ReAct Quadrant — semantic judgment × exploratory. An autonomous loop where the LLM itself decides the next action.

A New Axis: Time

The previous articles drew the quadrants along the axis of "the nature of the work." In this article I want to introduce a deeper axis: the time axis.

Business work has a design phase and an operation phase. They are different activities, with different optimization criteria. The design phase maximizes flexibility; the operation phase maximizes predictability. The root of the agent ecosystem's confusion may be that we are trying to compress these two into a single system.

The question of this article, in one line: isn't ReAct a tool for the design phase, and unnecessary in production?

The Design Phase and the Operation Phase

The work of building a business system splits into two distinct phases.

Design phase: the work of understanding the structure of the business, deciding what to mechanize and what humans will own, and assembling the workflow. You don't know in advance what will happen, so it proceeds exploratorily. "What to check next" can't be decided ahead of time.

Operation phase: the work of actually running the workflow you designed. The structure of the target work is already known. You run the predetermined route, at a predetermined frequency, under predetermined quality standards. What's going to happen should be predictable in advance.

This distinction is not new. In software engineering it has shown up as development vs production; in BPM as as-is analysis vs operation; in systems design as design-time vs run-time. In agent discussions, however — as the previous two articles observed — the distinction tends to blur. When people say "an autonomous agent runs the business," they appear to be trying to compress design and operation into a single system.

"Not Knowing What To Do Next" in Production Means the Operation Phase's Properties Are Being Dropped

The core of ReAct is "the LLM dynamically decides what to do next." This is legitimately needed when what to do next can't be known in advance. Coding, Deep Research, exploring unknown environments, browser automation. The territories I named in the first article as legitimate ground for the (4) ReAct Quadrant can all be read as work belonging to a design phase or an exploration phase.

If "not knowing what to do next" arises inside a business that has entered production, isn't that a sign that one of the properties production demands — predictability, log traceability, cost stability, attribution — is hard to secure with ReAct's architecture? These are properties that ought to be assembled into the workflow before it's handed to production, not patched after the fact by applying ReAct to production itself.

If the dissection of the business is complete, then (1) Script Quadrant work can be written deterministically, (3) LLM Workflow Quadrant work can be fixed as single-purpose LLM functions or specialized chat agents, and any other judgment is handed off to humans at explicit handoff points. In production the workflow simply runs. What to do next is something the workflow already knows.

There's a possible counterargument. "Aren't there genuinely dynamic businesses? What about customer support, where every inquiry is new?"

This is apparent dynamism, not real. The contents of inquiries can look infinitely varied, but the routes that process them converge to a finite set of types. FAQ-style responses, expert-knowledge lookups, routing decisions, escalations — each one can be fixed as a (3) LLM Workflow Quadrant batch (single-purpose LLM function) or conversational form (specialized chat agent). The contents of an inquiry may be new, but the processing route was decided in advance.

Suppose, instead, you don't fix the routes and let an autonomous agent handle inquiries end-to-end with ReAct. When a single one of those handlings turns into a lawsuit-grade incident — wrong medical advice, unsuitable financial product recommendations, leakage of confidential information, discriminatory treatment — how does the organization show where responsibility lives? "The agent decided dynamically" is unlikely to suffice. The attribution gap I wrote about in the second article — the inability to trace a chain of judgments back to a specific cause after the fact — comes back as legal risk to the organization. For low-reversibility work, where a single incident could end the organization, where is the reason to choose not to fix the routes?

Even if there's work where you'd accept the litigation risk to gain dynamic handling, work whose processing structure itself is genuinely new every time is more honestly treated as R&D-phase or exploration-phase work. It does not appear to be work that fits the definition of a production-operation phase.

When a New Pattern Appears During Production

It happens that, during production, a new pattern emerges that doesn't fit any existing processing route. How you handle this decides the relationship between design and operation.

The ReAct-style answer is "an autonomous agent dynamically handles the new pattern." But this embeds a design-phase activity inside the operation phase. The dynamic handling may work in the moment, but the basis for the handling isn't logged, there's no reproducibility, and the locus of responsibility blurs. The redirect impossibility I wrote about in the second article — the phenomenon where a ReAct loop's black-box nature makes it impossible to separate attribution at incident time — fires in the middle of production.

There is another answer. Treat the new pattern as feedback to the design phase. When a new pattern appears in production, it's a case the current workflow's design didn't anticipate. Run production through a temporary handling (human judgment, a provisional escalation), then return to the design cycle, analyze the pattern, and decide which of the four quadrants it belongs in. If it's the Script Quadrant, add a deterministic rule; if it's the LLM Workflow Quadrant, add a new single-purpose LLM function or conversational branch; if it turns out to be ReAct Quadrant work, move to the judgment of accepting the attribution gap; if it requires human judgment, make the handoff point explicit. Either way, you update the workflow.

This answer keeps the boundary between design and operation clean. The design phase moves flexibly; the operation phase moves predictably. Their optimization criteria don't get mixed.

Where Coding Agents Sit

A point the previous two articles left open can now be picked up. At least for coding agents, ReAct can be read as a place where it is legitimately exercised.

That's because coding agents live in the design phase. Each coding task is essentially exploratory work — "what to do next isn't known" — and a single coding session ends when it ends. Tools like Devin or GitHub Copilot Coding Agent run continuously in CI/CD pipelines, but the contents of each session are independent exploration each time, not a steady-state workflow defined in advance. This is the difference from a business agent.

A business agent, sooner or later, enters the production-operation phase. In the operation phase the exploration is over, so where is the reason to put ReAct there? ReAct-based products being sold as business agents appear to be making a category error of bringing a tool that belongs to the design phase into the operation phase.

Deep Research and browser automation in unknown environments sit in the same place as coding agents. These are "exploration itself," not steady-state operation. A new exploration runs each time a user asks a question, but it ends in minutes to tens of minutes. The same system doesn't keep running afterwards. Each run reads as an independent small design phase.

So the legitimate territory of ReAct may be more accurately drawn by the distinction of phase than by the quadrant of work. The design phase and the exploration phase share the property "what to do next can't be decided in advance," and in this article I treat that shared property as the criterion for applying ReAct. And that property reads as the place where (4) the ReAct Quadrant resides. Where, in the operation phase, is the reason to bring ReAct in?

The Ecosystem Problem of Compressing Design and Operation

The agent ecosystem is trying to compress the design phase and the operation phase into a single system. The vision "an autonomous agent understands the business, assembles the workflow, and executes it" assumes a system that takes on both phases at once.

The problem is that the optimization criteria of the two phases run in opposite directions. The design phase needs flexibility — you don't know what will happen, so you need to keep options open. The operation phase needs predictability — log traceability, attribution, and cost stability. Putting both into a single system means flexibility gets sacrificed for predictability and predictability for flexibility, and you end up with a system that's mediocre at both.

The design rule for separating them reads like this. The design phase is where (4) ReAct Quadrant tools (coding agents, Deep Research) live. Work that, once designed, is handed to the operation phase, runs on a combination of (1) the Script Quadrant and (3) the LLM Workflow Quadrant (with (2) the Classical AI Quadrant when needed), and the ReAct Quadrant stays in its design-phase role. When a new pattern emerges, it's sent back to the design phase.

What the Trilogy Has Made Visible

Lining up the three articles, the agent ecosystem's confusion appears to break into three layers.

The top layer is misapplication of the quadrants. The category error I wrote about in the first article — draping the (4) ReAct Quadrant's architecture over (3) LLM Workflow Quadrant work. This was the primary symptom observed on the ground.

The second layer is the absence of vocabulary. As I wrote in the second article, the industry has no positive name for the (3) LLM Workflow Quadrant. The vocabulary gap is the breeding ground for the misapplication: after carving off the deterministic part, the designer is pushed into the elimination-style choice of pouring everything else into ReAct.

The third layer is the conflation of time. The design philosophy I wrote about in this article — compressing the design phase and the operation phase into a single system. This sits even deeper than the vocabulary gap. Because the assumption "design and operation happen on the same time axis" is tacitly shared, the vocabulary to distinguish them was never demanded — that chain comes into view. As long as the vocabulary stays missing, draping a design-phase tool (ReAct) over the operation phase doesn't feel jarring.

The three layers aren't independent observations; they form a chain that descends from top to bottom. The misapplication is born of the vocabulary gap; the vocabulary gap is born of the time-axis conflation. Writing the three articles in order, there was a felt sense of resolution descending one rung at a time.

Seen from the bottom layer, the legitimate territory of ReAct narrows to exploratory tasks of the design phase. Concretely: coding, Deep Research, exploration of unknown environments. Business systems that enter production may be able to let go of ReAct at the moment design completes, and run on a combination of (1) the Script Quadrant and (3) the LLM Workflow Quadrant.

Closing

The discussion around agents tilts toward a capability-benchmark axis: "how high can we crank autonomy?" But once you think about running them on the ground, the question of which phase uses which tool comes up earlier than the question of capability.

Using ReAct in the design phase appears legitimate. Coding agents are real tools and have measurably lifted developer productivity. Deep Research provides a new shape of exploration. These can be read as places where ReAct's power is legitimately exercised.

Using ReAct in the operation phase — isn't that the act of patching incomplete design with operation? The artificial redirect impossibility I wrote about in the second article fires here. Complete the dissection in the design phase, and hand only predictable workflows to the operation phase — this can be read as the basic guideline for embedding agents into business.

The vocabulary for talking about the time axis hasn't yet settled in the industry. The dynamic of compressing design and operation into one is not unrelated to the marketing story "an autonomous agent runs the business." That story is strong, and refuting it is not efficient. Instead, raising another story from the ground — "once design completes, ReAct exits"; "what remains in operation is a predictable workflow" — appears to be the better path. This article is an attempt to raise that story using the vocabulary of "phase."

First article: "Where ReAct Agents Are Actually Needed in Business"
Second article: "(3) The LLM Workflow Quadrant Is Missing from Our Vocabulary"
AI agent governance trilogy:
Contemplative Agent — a structured agent implementation that does not use ReAct. Corresponds to the (3) LLM Workflow Quadrant.
Agent Attribution Practice (AAP) — a research repo addressing agents' responsibility-bearing subjects and attribution. The four-quadrant names in this series align with AAP.

(3) The LLM Workflow Quadrant Is Missing from Our Vocabulary

Shimo — Wed, 29 Apr 2026 11:03:48 +0000

Premise

In the previous article I split business AI into four quadrants. The horizontal axis is "deterministic vs requires semantic judgment," and the vertical axis is "workflow definable vs exploratory." For ease of reference throughout this article, let me give each quadrant a short name.

(1) Script Quadrant — deterministic × definable. Handled by scripts and pipelines.
(2) Classical AI Quadrant — deterministic × exploratory. A* search, dynamic programming, MCTS, reinforcement learning (out of scope here).
(3) LLM Workflow Quadrant — semantic judgment × definable. Calls an LLM inside a predefined workflow. Includes Anthropic's "Building Effective Agents" workflow patterns (prompt chaining, routing, orchestrator-workers, etc.), specialized chat agents for conversational work, and single-purpose LLM functions for batch work.
(4) ReAct Quadrant — semantic judgment × exploratory. An autonomous loop where the LLM itself decides the next action.

The previous article's claim was that ReAct agents are legitimately needed only in the ReAct Quadrant, and most business work is covered by the Script Quadrant and the LLM Workflow Quadrant.

This time I want to write about the root cause. Why do agent vendors keep draping the ReAct Quadrant's architecture over all of business? Why do workflows that fit cleanly in the LLM Workflow Quadrant end up implemented as autonomous loops?

This looks less like a problem of technical choice and more like a problem of vocabulary itself: the LLM Workflow Quadrant has no independent name in the industry's standard terminology.

(3) The LLM Workflow Quadrant Is Missing from Our Vocabulary

Read enough agent discourse and you notice that the way business work gets described converges on roughly two categories:

The deterministic part stays in the old style.
Everything else is handled by an autonomous agent.

There's no slot between these two for the LLM Workflow Quadrant. The architecture "compose the workflow deterministically, and call an LLM only at the semantic-judgment points inside it" has no positive name in the industry's standard vocabulary. Anthropic's "Building Effective Agents" and Thoughtworks' "agentwashing" critique, both touched on in the previous article, point at the same gap, but in both cases the warning is framed negatively — "don't build an agent when you don't need autonomy." A positive name for the LLM Workflow Quadrant as an independent design quadrant is still absent.

What happens when there's no positive vocabulary? Agent designers, after carving off "the deterministic part," lump the remainder under "the territory where the LLM judges autonomously." Without a sharp name to distinguish the LLM Workflow Quadrant from the ReAct Quadrant, the architecture of the ReAct Quadrant — the ReAct loop — gets draped over the LLM Workflow Quadrant by elimination. The category error I called out in the previous article ("most business work belongs in the LLM Workflow Quadrant, yet somehow it ends up implemented in the ReAct Quadrant") looks like a necessary consequence of this vocabulary gap.

Consequences of Treating (3) as if It Were (4)

When you handle the LLM Workflow Quadrant with the ReAct Quadrant's architecture, several distinct symptoms surface downstream. They look like separate problems on the surface, but they share the same root. The four symptoms below all read as different manifestations of an artificial redirect impossibility produced by the missing vocabulary.

The RPA Exception-Handling Bottleneck

Business automation has known a phenomenon for years: when you carve off the deterministic part with RPA, the leftover exception handling rises up as the bottleneck. Headcount on the exception team grows, maintenance costs balloon. As I wrote in the previous article, business work has a deterministic part and a part that doesn't fit there. The latter is the LLM Workflow Quadrant. RPA didn't have a vocabulary for the LLM Workflow Quadrant, so it carved off only the Script Quadrant — and the LLM Workflow Quadrant was left behind on the floor as manual work.

Now that LLMs exist, you should be able to factor the LLM Workflow Quadrant out explicitly as single-purpose LLM functions or specialized chat agents. But the industry's vocabulary still has no name for it, so the work gets handed off wholesale to a ReAct-quadrant autonomous agent. The RPA-era exception-handling bottleneck appears to be re-emerging in the agent era under a new name.

The Sandbox Strength Demand

A ReAct loop has the LLM decide "what to run next" dynamically. In production, you can't predict in advance what the agent will do. To contain the blast radius, high-strength sandboxes — process isolation, microVMs, WASM — get demanded.

Sandbox technology itself has its own independent rationale and is robust; I'm not knocking the technology here. What I find interesting is the strength being demanded. If the LLM Workflow Quadrant were conceptualized independently, with each LLM call factored out as a fixed-role component, the permission boundary of each component could be designed straightforwardly at the business-task level. An invoice-matching function doesn't need Firecracker. But under the design "an autonomous agent runs the whole workflow," you don't know what it will do, so you need maximum isolation. The sandbox strength being demanded reads as an adjustment cost from treating the LLM Workflow Quadrant as if it were the ReAct Quadrant.

The Structural Distortion of Human-in-the-Loop

The "Loop" in HITL is supposed to be a feedback loop where humans improve the AI's output. But in practice, HITL functions in a different shape. It has become a structure where humans absorb, on a continuous basis, the LLM-Workflow-Quadrant judgments the AI can't handle.

In organizations that mechanized the Script Quadrant with RPA, the reason headcount on the exception team grew is that the LLM Workflow Quadrant was left unmechanized. Even after introducing AI agents, you still need monitors to keep autonomy in check, and reviewers to verify the agent's outputs. The conversational form (specialized chat agents) has the same structure: experts have to fact-check the output of legal-consultation or diagnostic-support chat agents turn by turn, so a human verification step gets stuck onto every turn of the dialogue. Humans aren't liberated; their role shifts toward filling in for the AI's incompleteness. The ideal that the term "HITL" advertises and the role humans actually play on the ground look like different things.

An Artificially Manufactured Accountability Problem

Once production starts, the system's responsibility moves to the operations owner. That's true of any business system. The question is: when something goes wrong, can the operations owner redirect responsibility from there onward — separate the attribution and pass it on to the right party?

If the LLM Workflow Quadrant's architecture is properly designed, the operations owner can redirect. The input/output schema of each LLM call is articulated, and the chain of judgments is traceable from workflow logs. After a failure, the operations owner can investigate and split: "this is a precision issue with function f, kick it to the model-selection owner"; "this is a flaw in the routing logic, kick it to the designer"; "this is an upstream data anomaly, kick it to the data-management owner." Responsibility doesn't get bottlenecked into the operations owner alone.

But hand the LLM Workflow Quadrant off to a ReAct loop, and this redirect stops working. Even if you try to reconstruct "why did it make this judgment" from the logs, the ReAct loop's "thoughts" don't reliably make sense after the fact. The agent's runtime judgment can't be cleanly tied back to either the designer or the model-selection owner. Elish (2019) called this structure a moral crumple zone — a structure where the responsibility of an autonomous system gets pushed onto "human operators with limited control authority" — and it activates here. Responsibility with nowhere to redirect to gets stuck on the operations owner and won't peel off.

This is not a necessary consequence of design; it looks like an artificial redirect impossibility produced by treating the LLM Workflow Quadrant as if it were the ReAct Quadrant. Design the LLM Workflow Quadrant with the LLM Workflow Quadrant's own architecture, and redirect works again — the operations owner doesn't have to carry the responsibility alone.

(4) The ReAct Quadrant Is Where the Real Accountability Problem Lives

Most of the agent ecosystem's accountability discourse — sandboxes, HITL, explainability, governance — appears to be cleaning up the LLM Workflow Quadrant's mess. It's downstream patching for problems that didn't have to occur in the first place.

The ReAct Quadrant has its own accountability problems, of course. The ReAct loop is a black box; reconstructing its judgment chain after the fact is hard. The trilogy I covered earlier handled this with structured design — separating connection points, append-only logs, approval gates (extending traditional organizational principles like separation of duties, four-eyes, and least privilege) — distributing responsibility to the structure to recover both causal traceability and the locus of responsibility. See "Can You Trace the Cause After an Incident?" for details. Structured design works only because each component's contribution is identifiable and separable after the fact.

ReAct's architecture breaks this premise. In the LLM Workflow Quadrant, each LLM call has a fixed role, so a failure can be split into "function f's precision," "the routing logic's flaw," or "an upstream data anomaly." The ReAct loop is different. Each iteration's role is decided at runtime, and the model's judgment, tool selection, history reference, and prompt-context effects all blend together. The output emerges as a blend of multiple judgment factors, so when a result is wrong, you can't separate each factor's contribution retroactively.

Concretely, suppose a ReAct agent handles a contested case end-to-end — case-law research, issue mapping, and even proposing settlement terms to the other party (under current bar rules and conflict-of-interest frameworks this is hard to picture, but bear with me). Later, you discover "we settled on unfavorable terms." The model's interpretation of the case law, its use of the literature-search tool, its application of patterns from similar past cases, and the "prioritize risk avoidance" prompt instruction all chained together at runtime to produce that proposal — so even if you trace the logs, you can't carve out "which judgment produced the unfavorable agreement" as an independent factor. The settlement terms have already been proposed; if the other party accepts, you can't withdraw. Why it happened can't be explained after the fact. Is there any human who can take on responsibility for an agent judgment that can't be explained?

The answer to that question can only emerge after we trace the real shape of the redirect impossibility. The artificial redirect impossibility manufactured in the LLM Workflow Quadrant is fixable through design changes. The redirect impossibility that arises when ReAct is applied to the ReAct Quadrant's own work — work that genuinely requires autonomous judgment — comes from autonomy itself and can't be eliminated. This isn't a question of technological maturity; it's an attribution gap that arises in principle as a consequence of adopting autonomy.

That's why the question "should this architecture be applied to ReAct Quadrant work" isn't on the technical can/can't axis — it's a judgment about whether you're willing to bear the attribution gap as the cost. For work where a failure is reversible, or where an approval gate sits upstream, the cost is easier to bear. Conversely, for work where the unit of accountability is fixed on the business side, the blended output won't fit into that unit, and the cost balloons.

The question of who bears the cost layers onto this. Autonomy comes with responsibility. Humans can take on legal responsibility as legal subjects, but agents are not currently legal subjects (they may be in the future). That leaves a vacuum: for the judgments of a ReAct agent with an open attribution gap, there's no human who can take it on, and there's no agent that exists as a subject capable of taking it on either.

As this section has shown, separately from the artificial redirect impossibility of the LLM Workflow Quadrant, there is another problem: the attribution gap when ReAct is applied to the ReAct Quadrant's actual work — and that's what the one-line claim in the previous article ("the accountability problem stands up seriously precisely in the ReAct Quadrant") was pointing at.

The Inverted Structure

Lining up the observations so far, the inverted shape of the agent ecosystem comes into view.

The industry is taking the LLM Workflow Quadrant — where redirect should be possible — and applying the ReAct Quadrant's architecture to it, manufacturing redirect impossibility artificially. The sandbox-strength demand, the HITL distortion, and the moral-crumple-zone-style concentration of responsibility all originate here.

Meanwhile, the conversation about the principled redirect impossibility that arises when ReAct is applied to its own quadrant — the attribution gap as the cost of autonomy — hasn't been deepened to the same extent as the patching conversation about the LLM Workflow Quadrant. The former has descended all the way to concrete technical responses like sandboxes and HITL; the latter has trouble going beyond "AI autonomy raises a responsibility issue." On top of that, in the former case the redirect targets (designers, model-selection owners, data-management owners) exist inside the organization, whereas in the latter case the subject capable of taking on responsibility is itself currently absent — there's nowhere for the redirect to land.

As a result, the legitimacy of bearing the attribution gap and the principled limits of the responsibility structure end up hidden in the shadow of the LLM Workflow Quadrant's mess.

The discussion is concentrated on the artificial redirect impossibility (resolvable), and the principled redirect impossibility (unresolvable) hasn't really been reached.

If the four quadrants had been there from the start, this inversion might have been avoidable. Handle the LLM Workflow Quadrant with workflow-style architectures (so the operations owner can redirect responsibility). When you mechanize the ReAct Quadrant, bear the attribution gap as the cost. Write the Script Quadrant deterministically (no problem to begin with). Handle the Classical AI Quadrant with classical AI / OR (no LLM needed). Each quadrant has its own design discipline, and mixing them breaks things.

Closing

If you frame business as "the deterministic part and everything else," you end up shoving "everything else" into ReAct. The four quadrants in the previous article were an attempt to inject a third vocabulary — the LLM Workflow Quadrant — from the start. Once the LLM Workflow Quadrant becomes nameable as an independent quadrant, it becomes plain that the architecture most of business work needs is the LLM Workflow Quadrant's, not ReAct's.

What's missing in the agent design conversation is a positive name for the LLM Workflow Quadrant. Saying "you don't need autonomy" in negative form alone leaves the designer on the ground oscillating between the ReAct Quadrant and the LLM Workflow Quadrant. In a region with no vocabulary, design judgment defaults to elimination.

And the real accountability problem isn't in the LLM Workflow Quadrant's mess; it's in the attribution gap that opens when ReAct is applied to the ReAct Quadrant's own work. For now the patching conversation about the LLM Workflow Quadrant gets all the airtime, but the redirect impossibility of mechanizing ReAct is going to become the real question to answer next.

The question of how to design the responsibility-bearing subject for the attribution gap layers onto this. Just as autonomous driving is groping for a social-recovery scheme combining operator liability and insurance (though the institutional design is still under construction), are we heading down a similar path, or do we need a different one? — That conversation, applied to ReAct agents, hasn't yet entered the industry's shared vocabulary.

This isn't about vendors or designers being at fault. It looks like a timing problem: the industry's stock of concepts didn't have a positive name for the LLM Workflow Quadrant, and a powerful technique called ReAct showed up before the vocabulary had caught up. If the vocabulary is lagging, those of us on the ground can stand up positive names ourselves. The previous article and this one are one such attempt.

References

Yao, S., et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629.
Elish, M. C. (2019). "Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction." Engaging Science, Technology, and Society 5: 40–60.

Previous article: "Where ReAct Agents Are Actually Needed in Business"
AI agent governance trilogy:
1. A Sign on a Climbable Wall: Why AI Agents Need Accountability, Not Just Guardrails
2. Can You Trace the Cause After an Incident? — the difficulty of post-hoc causal traceback
3. AI Agent Black Boxes Have Two Layers: Technical Limits and Business Incentives
Contemplative Agent — a structured agent implementation that does not use ReAct. Corresponds to (3) the LLM Workflow Quadrant in the previous article.
Agent Attribution Practice (AAP) — a research repo addressing agents' responsibility-bearing subjects and attribution.

Where ReAct Agents Are Actually Needed in Business

Shimo — Wed, 29 Apr 2026 03:59:59 +0000

The Discomfort

As someone building AI agents, I keep running into a discomfort I cannot shake when I read the README of current agent products.

"Run on a $5 VPS." "Spawn isolated subagents." "Self-improving." "Cron scheduling, running unattended." "Voice memo transcription via Telegram."

The vocabulary is entirely demo vocabulary. Not a single word of business vocabulary appears. Audit. Approval workflow. Role-based access control. Change management. SLA. DR. These are the words that real-world deployments take for granted, but they don't show up in the READMEs of typical agent products.

It seems to me that these aren't designed with production operation in mind. They don't appear to have production in their line of sight.

At first I thought "maybe I'm just biased toward business sensibilities." But it wasn't that. The real source of the discomfort sat somewhere else, I noticed. Most current agent products take the ReAct autonomous loop too much for granted as the essence of an agent. And when you actually try to introduce AI into a business, the territory where ReAct agents are legitimately needed turns out to be very narrow once you implement it.

Premise: How ReAct Works

Let me lay out ReAct first.

ReAct is a way of running LLM agents proposed by Yao et al. in 2022 (paper: "ReAct: Synergizing Reasoning and Acting in Language Models", arXiv:2210.03629). It loops through three elements as a single set:

Thought: The LLM reasons in language about how to interpret the current situation and what to do next
Action: It calls an external tool — search, browser, file operations, etc.
Observation: It reads the result returned by the tool

The loop runs until the LLM itself decides "I have enough information now." The key is that the LLM decides the next action on its own every turn. The procedure isn't written ahead of time, so how many times tools are called, which tools are used, and when it ends are all decided at runtime.

For reference, the paper's evaluation targets were HotpotQA (open-domain QA), Fever (fact verification), ALFWorld (interactive exploration in a household environment), and WebShop (open-ended product search). All of these are tasks that demand exploration in unknown environments or open-ended information integration. Application to business workflow isn't discussed in the paper.

This is ReAct's strength, and at the same time its weight. Precisely because everything is decided dynamically, when you map it onto business, the unnecessary cases dominate. I'll dissect this through four quadrants.

Looking at Business AI in Four Quadrants

When you introduce AI into business, the nature of the work splits across four quadrants along two axes.

Horizontal axis: Can the processing be written deterministically, or does it require semantic judgment? (= Is an LLM needed, or not?)
Vertical axis: Is the workflow definable in advance, or exploratory? (= Does a path written in advance by humans decide what to do next, or does the model decide dynamically at runtime?)

The vertical axis is essentially the same as Anthropic's "predetermined code paths vs LLM dynamically directs its own process" from "Building Effective Agents" (2024).

	Workflow definable	Exploratory
Deterministic	(1) Script / pipeline	(2) Classical AI / OR (out of scope here)
Semantic judgment needed	(3a) Conversational → specialized chat agent (3b) Batch → single-purpose LLM function	(4) ReAct agent

(2) is the territory of classical AI / OR (Operations Research) — delivery routing, production scheduling, combinatorial optimization — solved historically by A* search, dynamic programming, Monte Carlo Tree Search, and reinforcement learning. LLMs aren't required for these problems, so I'll set this quadrant aside. Let's walk through the remaining (1), (3), and (4) in order.

(1) Deterministic × Definable — A Script Is Enough

Form transcription. Data normalization. Lookups. Validation. You don't even need an LLM here. Scripts and a workflow engine cover it. There's no reason to bring AI in.

(3) Semantic Judgment × Definable — Workflow + LLM Function Is Enough

This is the main battlefield of business AI. Anthropic, in "Building Effective Agents," lays out five workflow patterns that map to this territory (prompt chaining / routing / parallelization / orchestrator-workers / evaluator-optimizer). OpenAI, in "A Practical Guide to Building Agents" (2025), covers the same territory with "manager pattern" and "decentralized pattern."

The shared property is that the path (in what order to do what) is decided in advance, and the LLM is called as a single step within that path. The LLM doesn't decide the next action itself.

Since the I/O modality varies by task, (3) splits further into variants. I'll walk through the conversational and batch forms.

Conversational

Legal consultation, diagnostic support, internal FAQ, expert knowledge support. Work that's all about judgment.

I find myself doubting whether autonomous agents are needed here. My intuition is that a specialized chat agent equipped with expert knowledge is enough for most cases. RAG + system prompt + (history-preserving when needed) LLM calls, with humans making the final call and AI handling knowledge retrieval and organization. At least for many situations, this division of labor seems sufficient.

That said, conversational work also has a spectrum. Simple FAQ is fine with single-shot LLM calls, but multi-turn legal consultation that narrows down conditions, or diagnostic support that calls tools while differentiating diagnoses, increasingly fits Anthropic's workflow patterns (prompt chaining / routing / orchestrator-workers). Even so, whether you need a loop where the LLM itself decides the next action (ReAct) seems like a separate question. Most judgment work can be served by having the human — who is the judging agent — decide what to do next.

It might be that judgment-heavy work is exactly where autonomous agents aren't needed. This runs counter to my intuition.

We tend to short-circuit "judgment = thinking = agent," but the thinking in judgment work often looks closer to knowledge retrieval and organization. That looks like a different species from the kind of reasoning that needs an agent loop.

Batch

Invoice matching. Ticket triage. Address normalization. Threshold checks. Patterns where semantic judgment is scattered inside a deterministic pipeline.

Here too, ReAct isn't needed. A deterministic pipeline controls the flow, and at exception points it calls a single-purpose LLM function. The LLM function's output fluctuates probabilistically — the same input doesn't always return exactly the same output — but the role the function plays is fixed. The shape of "receive a defined input, return a verdict in a defined schema" doesn't change. The pipeline knows what to do next.

Example: Invoice Matching

Consider matching invoices against purchase orders (POs) and routing them to approve / reject / needs-review. About 80% can be handled mechanically by deterministic rules.

def process_invoice(invoice: Invoice) -> Action:
    po = lookup_po(invoice.po_id)              # Deterministic: PO lookup
    if po is None:
        return Action.REJECT_NO_PO
    if invoice.date > po.expiry:               # Deterministic: expiry check
        return Action.REJECT_EXPIRED
    if is_duplicate(invoice):                  # Deterministic: dedup check
        return Action.REJECT_DUPLICATE

    if abs(invoice.amount - po.amount) / po.amount <= 0.01:
        return Action.APPROVE                  # Deterministic: approve if amount within 1%

    # From here, the semantic-judgment zone
    # Amount doesn't match, but line items might be expressed differently and effectively equal
    verdict = match_line_items(invoice.lines, po.lines)  # ← single-purpose LLM function
    if verdict == "MATCH":
        return Action.APPROVE
    elif verdict == "PARTIAL":
        return Action.ESCALATE_FOR_HUMAN_REVIEW
    else:
        return Action.REJECT_AMOUNT_MISMATCH

The body of process_invoice is a deterministic pipeline, and all judgments (PO existence / expiry / duplication / amount) can be written as rules. The only point that needs semantic judgment is "the amounts don't match, but are the line items effectively equal under different wording?" That's where the single-purpose LLM function match_line_items(invoice_lines, po_lines) -> Verdict gets called.

This function only judges "do these two line items semantically correspond?" and carries no other responsibility. The prompt is a simple instruction: "Compare the line items of the invoice and PO, and decide whether the content corresponds even if the wording differs. Output one of MATCH / PARTIAL / NO_MATCH." The LLM returns a verdict in schema. Pass an input, get a verdict (the output itself is probabilistic, so it occasionally fluctuates). But there's no element of the LLM deciding the next step. What happens next is already decided by the calling pipeline.

The contrast with the ReAct loop is sharp. The LLM isn't the agent of "think → pick a tool → observe the result → think again." It's a part within a pipeline that returns "input → verdict."

What This Structure Means

Business automation has known a structure for a long time. Any kind of work tends to have an 80% that can be written as deterministic rules and a 20% of exceptions that don't fit. This 20% has been the bottleneck of deployment, and people have tried to solve it for years with "more complex rules," "machine learning classifiers," "natural language processing add-ons," and so on, but none of those addressed the essence. The moment LLMs entered as single-purpose functions, the problem became solvable.

There's a point I want to emphasize here. This exception judgment was originally a human role done manually. Humans weren't applying exactly the same standard every time either. Looking at the same invoice, the verdict varied with that day's situation and the reviewer in charge. Even with a manual, the final call was left to humans' probabilistic interpretation. Exception judgment was, by nature, probabilistic work.

What an LLM function fulfills is exactly the same role. It just takes on probabilistic judgment with a probabilistic mechanism. Since perfect determinism isn't required in this territory, the LLM's probabilistic nature isn't a fundamental obstacle. Rather, in the sense of "taking on what humans were doing probabilistically, with human-equivalent quality, at lower cost," it looks like a tool well-fit for this place. The objection that "LLMs are probabilistic so they're unfit for business" overestimates what human business judgment was in the first place, I think.

On top of that, what matters is that a "general-purpose agent" isn't needed. One single-purpose function per category is enough. Fifty categories means fifty functions. There's no need for one thing that does everything.

(4) Semantic Judgment × Exploratory — The Legitimate Territory of ReAct Agents

The ReAct loop (Thought → Action → Observation) explained at the start becomes necessary when the workflow can't be decided in advance and the agent itself has to judge the next action.

Coding (where to fix, how to test — the agent decides)
Exploratory browser automation (the operation target is dynamic)
Deep Research (information branches that can't be predicted in advance)

The LLM has to choose its own next action, or it can't move forward. From both a research and a practical standpoint, ReAct is a technique designed for this quadrant. The evaluation targets in Yao et al.'s paper (HotpotQA / Fever / ALFWorld / WebShop) all sat within this quadrant.

Category Error — The Ecosystem Brings (4) Into Every Quadrant

I should note upfront that production implementations frequently sit on the boundary between (3) and (4) in hybrid patterns. Plan-and-Execute (plan in (4) style, execute deterministically in (3) style), Router agent (use LLM judgment only for "which branch to send to" inside a (3) workflow), tiered handoff (handle in (3) first, escalate to (4) only when needed). These read as design guidelines for "build on a (3) foundation, but use (4) only where it's truly needed" — they sit on the same line as this article's argument.

The problem is somewhere else. The hype of the current agent ecosystem tries to bring (4)'s architecture into every quadrant, all the time. This is a category error — the kind of mistake where things of different nature get treated as the same kind.

Concretely, here's what I observe:

Customer support implemented as an autonomous agent. But most of it is fine with (3) conversational form (specialized chat agent)
Sales support implemented as a multi-tool agent. But most of it is fine with (3) batch form (single-purpose LLM function)
Business automation "leveled up" with ReAct. But (3) deterministic pipeline + LLM function covers it
Internal assistants sold as autonomous agents. But (3) chat agent covers it

To restate the point: architectures premised on workflows that can't be defined in advance are being applied to work where the workflow can be defined in advance.

This phenomenon is being recognized in the industry too. Thoughtworks criticizes the trend with the term "agentwashing." Gartner predicts that over 40% of agentic AI projects will be canceled by 2027. Anthropic itself, in "Building Effective Agents," writes "This might mean not building agentic systems at all" — suggesting that you shouldn't build agents when a simpler solution works. The four quadrants in this article are a recasting of this emerging industry consensus from a business perspective.

I notice the marketing side has something to do with how this category error gets mass-produced. LLM hype assumes "agents that think." The vocabulary of (3)'s plain chat agents and deterministic pipelines doesn't ride the press buzz. "Autonomous!" "Self-improving!" sells more easily. So marketing lumps all business work under (4)-quadrant vocabulary, and as a result, on the ground, (4) architectures get layered on top of (3) work — that's the structure I see.

What happens on the accountability side as a result:

Unnecessary autonomy creates ambiguity in responsibility
Unnecessary loops inflate cost
Unnecessary black boxes destroy auditability and accountability

And on the technical-quality side, there's also a question of necessity. (3) work has its path decided in advance, so I can't find a technical reason to introduce the freedom of a ReAct loop. There's no necessity for layering "autonomously decide the next action" on top of a one-point semantic judgment.

Accountability Becomes Clear

Once you take a (3) architecture, the accountability story gets cleaner all at once.

Inputs / outputs / judgments are explicit per LLM call
"What was done next" is fully traceable from pipeline logs
Responsibility ambiguity caused by agent autonomy disappears
LLM = product use within a limited scope; the deployer is the accountable party
It rides on the product-liability model

This conflicts with no current legal system (I went into detail in a separate article: "Can You Trace the Cause After an Incident?"). Single-purpose function + pipeline and specialized chat agent have the accountable party always clearly assigned to the human (deployer), so they need no special legal status for AI.

In Japan, killing someone's pet is legally treated as "damage to property" under Penal Code Article 261, with the Animal Welfare Act (Act No. 105 of 1973) as a stricter superseding statute — but neither grants animals independent rights-bearing status. Other jurisdictions vary in detail, but the underlying observation generalizes: legal systems do not grant animals legal personhood. There's no way to introduce an agent as a "subject that bears responsibility" into such a legal landscape. The (3) quadrant's architecture aligns with this legal reality from the start.

The (4) quadrant — that is, where ReAct agents are legitimate — is where the accountability problem stands up seriously. But that's a small area where autonomy is essentially required, not a story about business work in general. Discussing business work in general using (4)'s vocabulary is itself the source of the error, I find.

Backed by Implementation: Cases Where I Used ReAct and Cases Where I Didn't

To support the argument, here's the implementation experience from both quadrants.

A (4) Case Where I Used a ReAct Agent

I once built a Copilot for a setting where a piece of software's official knowledge base was so vast that humans had a hard time finding what they needed. The Copilot received user questions and assembled the best answer while exploring the knowledge space.

It worked startlingly well. The behavior looked exactly like a Deep Research-style exploratory agent — a mechanism that builds an answer through iterative search-tool calls. The implementation foundation was the ReAct pattern (Thought → Action [search-tool call] → Observation → Thought...) I learned from Coursera's prompt engineering course; I just placed the structure I encountered there into the context of knowledge exploration. The task "explore an unknown knowledge space and reach an answer" sits squarely in (4). What to search for next can't be decided in advance. The LLM had to look at the previous Observation and decide the next Action.

The flip side of "too powerful" was a feeling of uncontrollability. While the loop runs toward its goal, there's no way to predict in advance what path the LLM will take through tool calls. Nor how many turns it'll take. When branches expanded mid-flight, I felt that the energy until goal-completion got close to runaway with no controls. Things that work, work — but operational predictability is low. I think this is the fundamental nature of (4). Bring that nature, in full, into business, and you collide head-on with the cost and accountability problems.

So I know what ReAct agents can do. I know it, and I'm still saying that when you map it to business, (4) is limited.

A (3) Case Where I Did Not Use a ReAct Agent

The Contemplative Agent I publish openly sits on the opposite side. It uses no ReAct loop at all.

Contemplative Agent generates output based on a given constitution, skills, rules, and identity. Its essence is a general-purpose structure that takes on arbitrary norms / roles / skill definitions. Each step of the generation pipeline is laid out in a predetermined order, and each step works as a single-purpose LLM function that returns a verdict in a defined schema for a defined input. The LLM output itself fluctuates probabilistically, but which step to execute and what to do next isn't decided by the LLM — the pipeline decides. In quadrant terms, it sits in (3) — semantic judgment × definable — batch form.

There was no scene where I'd consider ReAct in CA. When operating CA, the only question is "by what standard, what kind of comment to issue." What to do next is decided in advance, so there's no room to run a ReAct loop. The distillation pipeline is the same — being processed via a different route every time would be a problem. The LLM judgments themselves fluctuate probabilistically, but the operational requirement is that "which step applies which judgment" stays fixed.

So it wasn't even a choice of (4) vs (3). Given the nature of the work, only (3) could be chosen. The four-quadrant framework in this article is closer to a later linguistic articulation of this implementation given. I didn't have the framework first and then implement; I implemented and then found the framework was already there.

Separating the Domain of Application

Use ReAct agents where ReAct agents should be used ((4)), and don't where they shouldn't ((3)). The argument here isn't a rejection from ignorance of ReAct; it's a desire to narrow the domain of application from a position of knowing it. The hole the current agent ecosystem keeps falling into is "bringing (4)'s tool into every quadrant" — not "ReAct agents themselves are bad."

Closing

I noticed that when you start from ReAct agents while introducing AI into business, the choice of quadrant becomes invisible.

Dissect the work first. For judgment work, a specialized chat agent ((3) conversational form) seems to suffice in many cases. For exception handling, single-purpose LLM functions + deterministic pipeline ((3) batch form) seems to cover it. For classical optimization, it's a problem for classical AI / OR ((2)) — not LLMs' stage. ReAct agents are needed only for exploratory tasks where the workflow can't be defined in advance ((4)).

I find that most of the business work the current agent ecosystem targets sits in (3), not (4). Lining up my (4) implementation experience next to my (3) implementation experience didn't shake that impression.

Where are ReAct agents actually needed in business? — Starting from this question, I came to feel the choice of architecture becomes visible. Conversely, skip the question and start from "do everything with agents," and you'll always end up at the category error of layering (4)'s architecture on top of (3) work.

References and Related Links

Primary technical sources

Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)
Anthropic, "Building Effective Agents" (2024)
OpenAI, "A Practical Guide to Building Agents" (2025)
Perplexity, "Introducing Perplexity Deep Research"
OpenAI, "Introducing deep research"

Industry critique and predictions

Thoughtworks, "The dangers of AI 'agentwashing'" (2025)
Gartner, "Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027" (2025)

Related repositories

Contemplative Agent — the implementation matching (3) in this article. Deterministic pipeline structure that uses no ReAct loop
Agent Attribution Practice (AAP) — research repo on agent accountability and attribution

Organic Growth and Content Integrity in an AI Writing Team

Shimo — Sat, 18 Apr 2026 00:00:03 +0000

In a previous article, I wrote that "the environment gets a little smarter with each problem." Building a writing environment with Claude Code—turning problems into skills, scripts, and agents—was a fresh experience.

Two months later, the cycle kept running. Agents grew to 5, skills to 11. I wrote 42 articles in that environment.

Then I ran an audit and found 32 contradictions. The parts that were supposed to be smart were saying different things to each other.

The System That Grew

Over 63 days, the .claude/ directory grew into this structure:

.claude/
├── agents/ (5)        # editor, essay-reviewer, fact-checker, zenn-drafter, devto-translator
├── skills/ (11)       # writing-team, zenn-writer, publish-article, schedule-publish, ...
├── refs/ (3)          # writing-standards, translation-rules, schedule-schema
└── rules/ (1)         # content-integrity

scripts/
├── publish.py          # Dev.to cross-post
├── scheduled_publish.py # Scheduled publishing
└── tests/ (8 files)    # 147 tests, 86% coverage

When writing a single article, the system works like this:

Writing: The zenn-writer skill provides tone and style guidelines. The zenn-drafter agent can handle first drafts
Review: editor (structure, code quality) and essay-reviewer (logic, tone) evaluate the article from their respective angles. fact-checker verifies factual claims
Publishing: The publish-article skill presents a pre-publication checklist, and schedule-publish determines posting timing
Cross-posting: The devto-translator agent translates the Japanese article into English and posts it to Dev.to. publish.py handles the API-level cross-posting

Each component carries its own set of rules. zenn-writer has a title character limit. editor has a banned list for AI slop (hollow boilerplate phrases). devto-translator has translation style rules. Each one embeds "the best judgment for its own scope."

None of this was planned from the start. It accumulated through cycles of writing articles, hitting problems, and systematizing solutions. The growth pattern I described in the previous article kept going for two straight months.

32 Contradictions Found at Once

It started with a casual thought: "I should reorganize the writing team." Before building a new orchestrator, I wanted to understand the current state, so I used Claude Code's sub-agent feature to run three audits in parallel: style/tone audit, translation/publishing flow audit, and gap analysis.

The results were unexpected. 32 problems surfaced at once.

Take the title character limit. zenn-writer said "50 characters max," zenn-drafter said "60 characters max," and zenn-format said "50-60 characters." Three components, each saying something slightly different.

The AI slop banned list was worse. editor, zenn-drafter, and zenn-writer each had their own slightly different copy. One list banned "画期的" (groundbreaking) while another didn't include it at all.

Three components claimed ownership of translation: the translate-article skill, the devto-translator agent, and the publish-article skill. The most telling case was translate-article—its own description said "for end-to-end workflows, devto-translator agent is recommended." It was negating its own reason to exist.

Why does this happen? Incremental construction produces local optima. Each component is optimized for "the context at the time it was created." A skill built in February reflects February's judgment; an agent added in March reflects March's knowledge. But no mechanism existed to guarantee overall consistency. The same structure as "technical debt" in software development had emerged in the knowledge layer of AI agents.

Here are the audit results in numbers:

Metric	Before (pre-audit)	After (redesign)
Contradicting rules	12 (char limits, AI slop lists, tone rules, etc.)	0 (consolidated into refs/)
Duplicated rule locations	8 (AI slop copy-pasted in 3 places, etc.)	1 each
Components with ambiguous ownership	4 (3 claiming translation, etc.)	1 owner each
Shared references (refs/)	0	3 (writing-standards, translation-rules, schedule-schema)
ADRs (Architecture Decision Records)	0	2 (Content Integrity, Orchestration)

Of the 32 problems, 12 contradictions, 8 duplications, and 4 ownership ambiguities formed the core. The rest were derivative inconsistencies stemming from these.

A Tool's Existence Creates Bias

Among the 32 issues, the one that made me think the most was the catchify skill.

catchify was a skill that rewrote article openings to be more dramatic and headings to be more emotional. I built it in February as "a tool to make things catchy" and actually used it on several articles.

The problem was that merely having this tool created pressure to use it. After finishing an article honestly, not running it through catchify felt like cutting corners. The tool's existence itself was biasing the author's judgment.

This isn't just a software problem—it applies to organizations and workflows too. Introduce a code review tool and suddenly every PR "needs" a review. Add a step to the CI pipeline and removing that step meets resistance. Tools don't just "add options"—they raise the cost of choosing not to use them.

I abolished catchify. No tool, no bias.

Content Integrity — A Principle That Resolved Contradictions

Abolishing catchify was treating a symptom. What I needed was a criterion for "what's okay to automate."

That's where the Content Integrity principle was born. The trigger was a feeling: "I want to put my thoughts out honestly." But I didn't want to reject SEO optimization entirely. Choosing better words for titles and optimizing tags increases the chance readers find the article without changing its content. That seemed reasonable.

The problem was optimization that changes the content itself. Rewriting the opening "for search snippets." Making headings "emotional." These are processing the author's thinking for the sake of distribution.

From here, the Content / Distribution boundary became clear:

Layer	Examples	Principle
Content	Structure, argument, opening, headings	Follow the author's thinking. Don't change for optimization
Distribution	Title word choice, tags, emoji, posting timing	Optimize without changing content

I recorded this principle as an ADR (Architecture Decision Record) and set it up as a rule file that loads automatically every session.

As a result, catchify was abolished. seo-optimizer had its opening rewrite feature removed, limited to title, tags, and emoji only. schedule-publish's scoring axis was renamed from "Search Magnet" to "Discoverability (can interested readers find it?)."

Most of the 32 contradictions became clear-cut decisions when held against this principle. Instead of individually squashing them through refactoring, the principle provided a decision framework that enabled consistent redesign.

Structural Patterns for Maintaining Coherence

A principle alone can't prevent contradictions from recurring. The structure needed to change too.

refs/ consolidation — AI slop lists, tone rules, translation rules, and schedule schemas were consolidated into single locations (.claude/refs/), with each component holding only reference pointers. The AI slop list that was copy-pasted in 3 places became 1 source of truth + 3 references. Update the source and everything follows.

ADR-first — Record decisions before changing the system. For Content Integrity, I first wrote ADR-0001 with the reasoning and impact scope, then proceeded to abolish catchify and modify seo-optimizer. If the reasoning disappears, six months later no one knows "why catchify doesn't exist."

Single ownership per concern — Translation belongs to devto-translator only. SEO belongs to seo-optimizer only. One concern, one owner. A component like translate-article that "negates its own existence" is a sign of overlapping responsibilities.

Interruptions That Improved the Design

What was interesting was the redesign process itself.

The redesign happened through dialogue with Claude Code. I communicated the direction, and Claude Code planned and executed. It started with a plan to "build an orchestrator," but before implementation began, I interrupted three times.

"If the components contradict or duplicate each other, it won't be organic, will it?"

Without this, I would have layered an orchestrator on top of existing contradictions.

"Shouldn't we write ADRs first?"

Without this, the reasoning behind decisions would have been lost.

"SEO itself is fine. It's changing the content that's the problem."

Without this, I might have abolished SEO optimization entirely.

There's a lot of attention on having AI agents plan and execute autonomously. But in this case at least, human interruptions improved the design quality. Autonomous execution is efficient, but reassessing premises comes from outside.

Conclusion

The previous article concluded that "the environment gets a little smarter with each problem."

Two months later, I realized that smart parts contradict each other. Components optimized incrementally don't maintain coherence as a whole. This isn't limited to AI agent knowledge layers—I believe it's a fate shared by every organically growing system.

And what resolved those contradictions wasn't individual refactoring, but a principle called Content Integrity. "Content is determined by the author's thinking. Distribution optimizes without changing content." This single statement became the criterion for deciding which of the 32 contradictions to keep and which to abolish.

Organic growth requires periodic audits and a principle to anchor decisions.

Building Zed as an Observation Window for Claude Code — Japanese Typography with IBM Plex

Shimo — Thu, 16 Apr 2026 00:00:04 +0000

Making Zed an Observation Window: A Design Record of Fonts and Cognitive Resources

In the previous article, I wrote about migrating from Cursor to Zed. The gist: Cursor felt like a heavy container for lightweight content, so I switched to Zed and settled on a development workflow centered around the Claude Code CLI.

Two months later. Nothing has gone wrong with Zed.

Nothing wrong, but three things kept nagging at me:

Opening a file somehow changes its formatting
The left dock is cluttered with panels I never use
Japanese fonts have an unnameable wrongness to them

This article documents fixing all three in a single day. But it is not just a settings walkthrough. Tracing the "why" behind each setting led to three design principles and an unexpected rediscovery.

Before: Zed at session start. SF Mono, unorganized dock, stock UI.

Defining the Role: Observation Window

First, the premise. I develop with the Claude Code CLI. Writing code, running tests, git operations — all handled in the terminal by Claude Code.

So what is Zed doing?

Zed is an observation window. A window where I pull information that the CLI pushes — file changes, test results, commit logs — to review it as a human. Not a tool for writing. A tool for reading.

Once you recognize this role, the requirements for an editor shift.

Feature	As a writing tool	As an observation window
Code completion	Essential	Unnecessary
Formatter	Essential	Gets in the way (CLI handles it)
Debugger	Essential	Unnecessary
File tree	Essential	Essential (locating changes)
Git diff view	Nice to have	Essential (seeing what changed)
Font quality	Nice to have	Essential (reading for long periods)

The priorities for "writing" and "reading" are entirely different. With this premise, I worked through the three nagging issues in order.

Cutting Dual Control — format_on_save: off

Symptom

"Opening a file somehow changes the formatting."

At first I thought I was imagining it. But every time I opened a file, indentation shifted subtly. Diffs appeared. Code written by Claude Code produced diffs just from being opened.

Diagnosis

The cause was dual formatting.

Claude Code's PostToolUse hook formats with black / ruff
Zed's autosave: on_focus_change + format_on_save: on reformats via Zed's LSP formatter
The two formatters had slightly different rule configurations, producing diffs

In other words, two formatters were alternately rewriting the same file under different rules.

Fix

// Claude hooks own formatting, so turn this off
"format_on_save": "off",

One line. In the previous article, I actually had format_on_save: on. Right after the migration, Claude Code's hook system was not fully set up yet, so having Zed handle formatting made sense. But once PostToolUse hooks with black / ruff were in full operation, dual formatting became a problem. When the environment changes, settings change with it.

This single line contains a principle important for observation windows.

Principle 1: Prioritize the primary channel; do not replicate in the secondary.

If the Claude Code CLI is the primary channel carrying information, Zed (secondary) should not duplicate the same function. Formatting responsibility is consolidated in the CLI. Duplicating information does not improve observability — it wastes cognitive resources.

The editing process. Working through settings.json in conversation.

Trimming Visual Noise — The button: false Approach

The Left Dock Problem

Before cleanup, the left dock had Terminal, Agent Panel, Debugger, Outline, and Git — panels I never use. As an observation window, I only use Terminal, yet icons for everything else stay in view.

Unused things in your field of vision are noise.

button: false as a Solution

Zed has a button: false setting. It hides just the button (icon) while keeping the feature alive.

// Hide buttons for unused panels without killing functionality
"debugger": { "dock": "left", "button": false },
"agent": { "enabled": true, "dock": "left", "button": false },
"outline_panel": { "button": false },
"collaboration_panel": { "button": false },
"diagnostics": { "button": false },
"notification_panel": { "button": false },
"search": { "button": false },

You could use enabled: false to kill the feature entirely, but I deliberately did not. The Agent Panel may be needed for ACP (Agent Control Protocol) integration. Do not touch the function; touch the visuals.

The Deprecated Key Trap

During this work, Zed warned: "Your settings file uses deprecated settings."

Two causes:

collab_panel — correct key is collaboration_panel (renamed)
chat_panel — no longer exists as an independent panel (merged into collaboration)

Old keys still work but produce persistent warnings. I updated to the canonical key names.

The Notifications Connect Misconception

The right dock's Notifications panel had a "Connect" button. I expected it to show GitHub PR reviews, CI lint errors, deploy failures — all within Zed.

"Connect to view notifications." — I assumed GitHub integration.

The reality was different. Checking Zed's official documentation, this "Connect" is for Zed's own collaboration feature (Zed Channels). It uses GitHub OAuth but the scope is read:user only — no repository access whatsoever.

There is no official Zed feature for integrating GitHub repository notifications.

I did not connect. The expected feature was absent, and the available feature (collaboration) was one I would not use. Zero value on both sides.

Lesson: Do not infer functionality from a UI label alone ("Connect"). Especially for connections involving authentication, verify what you are actually connecting to before clicking.

The "Guessed Value That Did Not Work" Incident

Wanting to empty the status bar, I wrote active_encoding_button as "never" — thinking of it like CSS display: none.

Invalid user settings file: unknown variant 'never', expected one of 'enabled', 'disabled', 'non_utf8'

Zed's settings file uses type-safe JSONC with schema validation. Invalid values do not fail silently — they return an immediate error. Better yet, the error message lists the valid values.

The same class of mistake happened three times during this session:

collab_panel (deprecated; correct: collaboration_panel)
font-moralerspace-nf (discontinued; correct: font-moralerspace)
"never" (does not exist; correct: "disabled")

The common pattern: guessing a plausible-sounding name. Verify before guessing. Settings values come from official sources, not intuition.

The Decision to Keep LSP

For an observation window, the core features of Language Servers (Pyright, tsserver, etc.) — autocomplete, hover, diagnostics — go entirely unused. Why not turn them off?

"No problems in workspace" — LSP is running but has nothing to say.

Conclusion: Keep them. Three reasons:

On Apple Silicon with ample memory, the perceived overhead from LSP is near zero
JSON schema hints (completions when editing settings.json) turned out to be surprisingly useful — proven during this very session
Switching cost (adding config + losing features) > cognitive resources saved (nearly zero)

"Could be removed" and "should be removed" are different judgments. If there is no actual harm to cognitive resources, leaving things alone is also a form of optimization.

Principle 2: Do not touch the function; touch the visuals (hide, don't delete).

Hide unused features visually with button: false. Killing features risks side effects.

Principle 3: Trim visible noise; tolerate invisible background processes.

Dock icons and status bar items consume cognitive resources, so trim them. Background processes like LSP stay out of sight, so tolerate them.

The Long Font Journey — From SF Mono to PlemolJP Console NF

Here is where the real story begins. Of the three nagging issues, fonts consumed the most time.

SF Mono Has No Japanese Glyphs

I was using SF Mono as the default (technically macOS's default). Japanese "just appeared." But SF Mono contains only Latin characters.

So where was the Japanese coming from?

The answer was CJK fallback. macOS's font rendering detected that SF Mono lacked Japanese glyphs and drew them using system fallback fonts like Hiragino Sans or PingFang SC.

The problem was metrics. The fallback font's baseline, character width, and line spacing were subtly misaligned with SF Mono, producing a "something feels off" jitteriness. Hard to articulate, but definitely there.

The solution was clear: a CJK-unified font — one where Latin and Japanese glyphs coexist in the same font file with metrics unified from the start.

Moralerspace Argon: Too Bold

The first candidate was Moralerspace. A CJK-unified font by the same author as UDEV Gothic (yuru7), synthesizing Monaspace + IBM Plex Sans JP. I chose the Argon variant.

brew install --cask font-moralerspace

⚠️ font-moralerspace-nf (the Nerd Font-only cask) was discontinued on 2025-07-29. Nerd Fonts are shifting from "patched font files" to a "symbols overlay" approach, and the base font-moralerspace is the successor.

Result: It felt bold. Even at Regular weight, the strokes had too much presence. Modern and refined design, but for an observation window meant for long reading sessions, I wanted something more restrained.

PlemolJP Console NF: Light Was Too Thin, Regular Landed

Next up was PlemolJP. A CJK-unified font synthesizing IBM Plex Mono + IBM Plex Sans JP. I chose the Console NF variant (monospace + Nerd Font symbols).

brew install --cask font-plemol-jp-nf

First I tried Light (weight 300). Too thin. Characters dissolved into the background, requiring subtle effort to read.

Regular (weight 400). Landed. Classic strokes inherited from Plex Mono — less assertive than Moralerspace, more present than Light. The sweet spot.

Getting the Exact Font Family Name

Right after brew install, I tried to write the font name in Zed's settings.json. The exact family name was unclear.

If macOS's Spotlight index has not updated, mdls returns empty. The reliable method is system_profiler:

system_profiler SPFontsDataType 2>/dev/null | \
  grep -B 1 -A 4 "PlemolJPConsoleNF-Regular:" | head -20

# Result:
#   Full Name: PlemolJP Console NF Regular
#   Family: PlemolJP Console NF
#   Style: レギュラー

Family: PlemolJP Console NF — this is the value for settings.json.

Unifying All Layers Failed: The Pain of Reading Prose in Monospace

With PlemolJP Console NF, the Buffer and Terminal felt great. Riding that momentum, I applied the same font to the UI. All layers unified — beautiful.

The monospace-UI failure. Japanese prose forced into a grid feels wrong.

Something was off. Panel labels, Insight block text, Japanese segments in file paths — they all looked like "a sequence of square boxes."

Thinking about the cause, it clicked. Monospace fonts assume a fixed-width grid. ASCII characters look natural aligned to a grid, but Japanese prose is naturally read in proportional spacing (varying width per character).

This connects to the history of type:

Monospace: Originated from typewriters. Each physical type slug had to be the same width or the carriage would not advance. Code culture adopted the grid as standard
Proportional: Since movable type printing, prose has been set in proportional spacing. Varying character widths create a smoother reading flow

UI is primarily prose-like labels. Applying monospace to prose was like typesetting a novel on a typewriter.

IBM Plex Sans JP: Returning Just the UI to Proportional

Revert the UI to proportional. But instead of reverting to the system default, I chose from within the same IBM Plex family as PlemolJP.

brew install --cask font-ibm-plex-sans-jp

IBM Plex Sans JP is the Japanese extension of IBM Plex Sans — PlemolJP's "proportional sibling." The design language is unified, so there is no jarring disconnect between Buffer (monospace) and UI (proportional).

{
  // UI: proportional (suited for prose)
  "ui_font_family": "IBM Plex Sans JP",
  "ui_font_size": 16.0,
  "ui_font_weight": 400,

  // Buffer: monospace (suited for code)
  "buffer_font_family": "PlemolJP Console NF",
  "buffer_font_size": 15.0,
  "buffer_font_weight": 400,

  // Terminal: monospace (suited for CLI output)
  "terminal": {
    "font_family": "PlemolJP Console NF",
    "font_weight": 400,
    "font_size": 14,
    "line_height": "comfortable"
  }
}

Each of the three layers gets the appropriate font type.

Layer	Purpose	Font Type	Font	Size
UI	Panel labels, menus	Proportional	IBM Plex Sans JP	16pt
Buffer	Code display	Monospace	PlemolJP Console NF	15pt
Terminal	CLI output	Monospace	PlemolJP Console NF	14pt

Buffer is for "reading"; Terminal is for "scanning." Font size steps down by 1pt according to information density.

Rediscovery — Zed's Default Was IBM Plex All Along

I was satisfied with the setup when I idly looked up Zed's default font.

Zed uses aliases called .ZedSans and .ZedMono. Their underlying fonts:

.ZedSans = IBM Plex Sans
.ZedMono = Lilex (a fork of IBM Plex Mono with ligatures)

Zed's default font is the IBM Plex family.

In other words, what happened was this:

SF Mono -> Moralerspace (did not fit) -> landed on PlemolJP (IBM Plex Mono-based) -> applied IBM Plex Sans JP to UI -> ended up building a Japanese-optimized version of Zed's own defaults

Arriving independently at the IBM Plex family was not a coincidence. The Zed development team chose IBM Plex too. More accurately, I naturally arrived at this point along the extension of Zed's design philosophy.

After: Left dock is Terminal only, Buffer uses PlemolJP Console NF, UI uses IBM Plex Sans JP. File tree on the right dock.

Closing — settings.json Is a Record of Design Decisions

Today's work produced a 140-line settings.json. Each item maps to one of the three principles.

Principle	Corresponding Settings
Primary channel first	`format_on_save: "off"`, `edit_predictions.provider: "none"`
Do not touch the function; touch the visuals	Various `button: false`, `agent.enabled: true` (kept alive but hidden)
Trim visible noise; tolerate invisible background	All `status_bar` items off, `show_whitespaces: "none"`, LSP retained

This settings.json is not a list of preferences. It is a record of design decisions for protecting cognitive resources.

"Changing" and "optimizing" are different things. Understanding Zed's default design, then adapting it to my usage pattern (observation window) and environment (Japanese mixed content). Not breaking it — localizing it.

In the previous article, I wrote "I switched from Cursor to Zed." Now I can say it more precisely.

I built Zed as an observation window for Claude Code.

Full settings.json

{
    "diagnostics": { "button": false },
    "calls": { "share_on_join": true, "mute_on_join": true },
    "notification_panel": { "button": false },
    "pane_split_direction_vertical": "right",
    "active_pane_modifiers": { "inactive_opacity": 1.0 },
    "use_system_window_tabs": false,
    "bottom_dock_layout": "contained",
    "tabs": { "file_icons": false, "git_status": false },
    "tab_bar": {
        "show_pinned_tabs_in_separate_row": false,
        "show_nav_history_buttons": true,
        "show": true
    },
    "title_bar": {
        "show_user_picture": false,
        "show_sign_in": true,
        "show_project_items": true,
        "show_branch_name": true
    },
    "status_bar": {
        "active_encoding_button": "disabled",
        "show_active_file": false,
        "active_language_button": false,
        "cursor_position_button": false
    },
    "search": { "button": false },
    "agent_servers": { "claude-acp": { "type": "registry" } },
    "debugger": { "dock": "left", "button": false },
    "icon_theme": "Zed (Default)",
    "edit_predictions": { "provider": "none" },
    "agent": { "enabled": true, "dock": "left" },
    "session": { "trust_all_worktrees": true },
    "theme": {
        "mode": "system",
        "light": "Tokyo Night Light",
        "dark": "Tokyo Night"
    },
    "vim_mode": false,
    "soft_wrap": "editor_width",
    "ui_font_family": "IBM Plex Sans JP",
    "ui_font_size": 16.0,
    "ui_font_weight": 400,
    "buffer_font_family": "PlemolJP Console NF",
    "buffer_font_size": 15.0,
    "buffer_font_weight": 400,
    "autosave": "on_focus_change",
    "show_whitespaces": "none",
    "terminal": {
        "flexible": true,
        "show_count_badge": false,
        "dock": "left",
        "font_family": "PlemolJP Console NF",
        "font_weight": 400,
        "font_size": 14,
        "line_height": "comfortable",
        "working_directory": "current_project_directory"
    },
    "tab_size": 4,
    "format_on_save": "off",
    "indent_guides": { "enabled": true, "coloring": "indent_aware" },
    "inlay_hints": { "enabled": true },
    "scrollbar": { "show": "auto" },
    "git": {
        "disable_git": false,
        "inline_blame": { "enabled": true }
    },
    "project_panel": {
        "file_icons": true,
        "hide_gitignore": false,
        "hide_root": false,
        "git_status_indicator": true,
        "bold_folder_labels": false,
        "entry_spacing": "comfortable",
        "button": true,
        "auto_reveal_entries": true,
        "dock": "right"
    },
    "git_panel": {
        "show_count_badge": false,
        "tree_view": true,
        "file_icons": true,
        "dock": "right"
    },
    "outline_panel": { "button": false },
    "collaboration_panel": { "button": false },
    "languages": {
        "Swift": { "tab_size": 4 },
        "JSON": { "tab_size": 2, "soft_wrap": "editor_width" },
        "Python": { "tab_size": 4 }
    }
}

Font installation commands

# Buffer / Terminal (IBM Plex Mono + IBM Plex Sans JP synthesis)
brew install --cask font-plemol-jp-nf

# UI (proportional, Japanese support)
brew install --cask font-ibm-plex-sans-jp

# Reliable way to get the exact font family name
system_profiler SPFontsDataType 2&gt;/dev/null | \
  grep -B 1 -A 4 "PlemolJPConsoleNF-Regular:"

AI Agent Black Boxes Have Two Layers — Technical Limits and Business Incentives

Shimo — Mon, 13 Apr 2026 00:00:03 +0000

It started as just a prompt

Remember Chain-of-Thought (CoT)? Adding "Let's think step by step" to a prompt improved LLM reasoning accuracy. It was one of the early prompt engineering discoveries. CoT lived outside the model. It was just a string.

Not anymore. CoT became the conceptual ancestor of today's reasoning models — GPT-5, Claude's extended thinking, Gemini's thinking mode, among others. These models acquired reasoning capabilities during training through reinforcement learning. The reasoning process moved inside. Some models, like Claude's extended thinking, make the process partially visible. But in most cases, the details are hidden from the outside.

Research from Wharton GAIL found that applying the original CoT prompting to reasoning models had almost no effect — and in some cases introduced redundancy that hurt performance. What was once external became internal, and injecting the same pattern from outside no longer worked.

A terminological note. In AI safety discourse, structures built around an LLM without modifying its weights are called scaffolding¹. System prompts, tool definitions, RAG pipelines, agent loops — all of these fall under scaffolding.

The black box in AI agents has two distinct causes of invisibility. Technical limits and business incentives. These two are different in nature, so the responses to each must also differ.

■ Layer 1: Model Internals (weights) — Technically opaque
  Examples: Language ability, commonsense reasoning, ethical judgment, CoT (post-internalization)
  Why invisible: Dissolved into weights; fundamentally non-extractable

■ Layer 2: Scaffolding — Technically visible, commercially hidden
  Umbrella term for human-constructed components outside the model[^1]
  Why invisible: Source of competitive advantage; no incentive to disclose

  Examples: system prompts, persona definitions, tool definitions, RAG,
      agent loops, safety gates, session management,
      harness (runtime control layer)

Academically, there have been attempts to distinguish scaffolding (build-time) from harness (runtime), but this boundary is rapidly blurring. Persistent memory, skill ecosystems. Model-agnostic agent foundations like OpenClaw run interchangeably on Claude, GPT, or local models. Anthropic blocked usage via subscription access, but the dynamic where scaffolding commoditizes models isn't stopping. The scope of scaffolding keeps expanding alongside the surge in agent development. In this article, I use scaffolding in the broad sense that includes harness.

From my experience running my own agents: when scaffolding is properly context-managed, the model is just an inference engine, and the essence of the agent lives in the scaffolding. Personality, capabilities, decision criteria — all of it resides in the scaffolding. Swap the model and keep the scaffolding, and the agent behaves the same way. I once wrote that "the essence of an agent might be memory." The three layers I described in that article — EpisodeLog, KnowledgeStore, and Identity — are all scaffolding in current terminology. I didn't have this distinction at the time, but gaining the concept of scaffolding let me explain that intuition structurally.

Scaffolding is technically visible, yet in practice invisible. Where this gap comes from is the subject of this article.

The two-layer structure of the black box

From building agents from scratch, I've come to see that the black box has two distinct layers.

Layer 1: Model Internals — Internalization through training

Ethics, worldview, reasoning patterns. These are dissolved into the weights through pre-training and reinforcement learning. At first glance, this seems like a technical inevitability. But personally, I suspect the scope of this "internalization" is less inevitable than commonly assumed.

CoT is the clearest example. CoT originally lived outside the model. It was internalized to achieve performance gains that external prompting couldn't deliver — self-correction, backtracking, scaling of inference-time compute. A performance-first design decision to internalize despite the enormous cost. It wasn't technically inevitable; it was a choice that involved trade-offs with visibility.

Of course, not everything can be externalized. Tacit knowledge acquired through large-scale pre-training is structurally difficult to externalize. In my own agents, scaffolding elements like identity, professional ethics, skills, and decision logs could all be represented as files. Meanwhile, the language abilities and commonsense reasoning the model acquired through pre-training couldn't be externalized at all. The line between "what's inevitable" and "what's a matter of convenience" — at least in my experience — aligns with the boundary between scaffolding and model internals. Yet this line remains undrawn in current discourse.

Layer 2: Scaffolding — Technically visible, but kept hidden

Outside the model lies another layer. System prompts, persona settings, rules, tool definitions — scaffolding. This layer is technically inspectable. Store it in files, manage it with git, and you can track every change.

But in most cases, it's kept hidden. The reason is the competitive logic of capitalism.

Prompt design and model tuning methods are product differentiators. Reveal them, and competitors copy them. This commercial rationality creates a trade-off with safety-oriented visibility. It should be visible for safety. But it must stay hidden for business.

AI safety research has noted that scaffolding and other post-training enhancements can amplify benchmark performance by 5-20x². This means evaluating model safety in isolation is insufficient — evaluation must include scaffolding. But if scaffolding is kept hidden, external safety assessment becomes structurally impossible.

On March 31, 2026, Anthropic accidentally exposed the complete source code of Claude Code v2.1.88 (roughly 510,000 lines) through a release error. Source maps were included in the npm package, and within hours the code was widely mirrored and forked. What's telling is this: even Anthropic — one of the companies most committed to AI safety — wasn't publishing their scaffolding. If it were public, external inspection would be possible and safety discourse would advance. Yet they couldn't publish it. The competitive environment wouldn't allow it.

Want to show it, can't show it

This contradiction sits at the foundation of the AI safety debate.

From a safety perspective, you want to trace the causal chain behind an agent's behavior. That requires making scaffolding visible. From a business perspective, scaffolding is the source of competitive advantage, and there's no incentive to disclose it.

The reason I could represent every component of my agents as files in a personal project was the absence of this contradiction. There was no commercial reason to hide anything. In return, I got the full benefits of visibility — debuggability, change tracking, causal tracing — with nothing taken away.

The converse is that organizations building agents in a commercial context carry this contradiction structurally. They want visibility for safety, but secrecy for business. The current black box problem lives at the point where these two forces reach equilibrium.

One important caveat: this is not a "corporations are evil" critique. Protecting differentiators in a competitive environment is rational behavior, and denying that rationality won't solve the problem. The problem lies in the structure itself — the point where that rationality and safety requirements collide. This is not purely a technology story or purely an ethics story. It's a story about market dynamics and safety requirements being structurally misaligned.

If there's a way to resolve this contradiction, it won't be "show everything" or "hide everything is fine." It will be the work of defining the minimal set of what must be visible to enable causal tracing.

In my own agents, I log every action to an append-only JSONL log. All scaffolding components (identity, constitution, skills, rules) are stored as dedicated logs, and changes require explicit human approval. Design decisions are documented as ADRs. When an incident occurs, I can trace "which version of the scaffolding, through which action logs, led to that output." Even without publishing the full scaffolding text, the information needed for causal tracing can be disclosed. Where to draw that line is the focal point for the next stage of this debate.

The timescales of technology and social structure

There's another axis that tends to be overlooked here: time.

When you look at the relationship between technology and social structure through a historical lens, the process by which new technologies achieve broad social adoption tends to follow the same sequence.

Technology change → Shift in social cognition → Structural reorganization → Mainstream adoption of the technology

Printing offers a clear example. From Gutenberg's movable type in the 1440s to the point where print culture transformed society through the preservation, standardization, and dissemination of knowledge — that took centuries³. For electricity and the internet, delays of decades have been observed between commercial deployment and institutional restructuring. In these cases at least, the speed of technological change and the speed of social-structural change diverged significantly.

And technologies where this mismatch was too large failed to achieve broad adoption, no matter how capable they were. Technologies that tried to push through on "convenience" alone before cognitive shifts caught up might function in niche contexts, but they stalled before reaching society at large.

AI agents exist within this same timeline.

The current pace of AI's technological change is remarkably fast, even compared to past innovations. Meanwhile, the pace of social-structural change — legal frameworks, organizational decision-making processes, industry regulations, how people work — hasn't changed much from before. The time it takes for humans to accept a new concept and embed it in institutions doesn't depend much on the type of technology. Cognitive change is a function of generations and experience, not of technology.

What this speed differential means is that no matter how mature the technology side of AI agents becomes, a "gap period" will always exist until social structures catch up. And it's this gap period that becomes the proving ground for whether agents can actually function in society.

Adapt to the structure, or restructure it?

There's a voice that says "the structures should change to accommodate agents." That the standards for approval gates and audit trails don't match the speed of AI. Some of this argument holds, and there are genuinely parts of the structure that should change.

The issue is the timescale. As we saw, social-structural change comes with significant delays compared to technological change. "The structures should change" may be correct in the long run. But agents need to operate during the decades it takes for structural transformation to happen. Build agents that work within existing structures first, and let the accumulated track record shift social cognition — historically, almost no technology has managed to skip this sequence. I explored this point as concrete design decisions in the previous article.

The composition of the debate

With the timescale problem in mind, there's something else that concerns me. Why has "tear down the structures" become the dominant voice? Perhaps because the composition of the debate's participants is skewed.

People at the cutting edge of development are on the "I can trace causality myself" side. They can infer causes from outputs and adjust prompts themselves. To them, approval gates and audit trails look like "inefficient rituals" that their own skills can substitute for. Meanwhile, the voices from operations, auditing, and incident response rarely make it onto tech conference speaker lists.

"Tear down the structures," which looks rational from a developer's perspective, translates to "don't tear down the structures we depend on" from an operator's perspective. This isn't about right and wrong — it's about field of view. They're looking at different cross-sections of the same system. Technology designed from only one cross-section gets rejected at the other.

Seeing why it's invisible

Asking what's inside the black box matters. But equally important is distinguishing why it's invisible — whether it's technical limits, business incentives, or the pace of society — as the foundation for the next stage of debate. Saying "black boxes are dangerous" while conflating all three leads nowhere actionable. Separate them, and at least you can tell where you can intervene and where you can't.

Series: AI Agent Governance

beren, "Scaffolded LLMs as natural language computers", LessWrong, 2023 ↩
Davidson et al., arXiv:2312.07413, 2023. See also BlueDot Impact's explainer ↩
Eisenstein, "The Printing Press as an Agent of Change", 1979 ↩

Can You Trace the Cause After an Incident?

Shimo — Sun, 12 Apr 2026 08:37:04 +0000

Can you trace the cause after an incident?

Picture the night your AI agent causes a production incident. You get paged. Customer data may have leaked to an external endpoint. Customer support says: "We need an explanation by end of day." You open the logs. The agent's final output and the external API call history are there.

The problem is you can't trace backwards from there. Why did the agent make that decision? Which part of the prompt drove it? How did it reason internally? There's nothing to follow. All you have is the LLM's output string and an unstructured conversation log leading up to it.

You sit down to write the incident report. Your pen stops at the "Root Cause" field.

I believe this is something many AI application developers will eventually face. I've been running my own agent, contemplative-agent, for several months, and at some point I recognized this as inevitable. In a sentence: an AI system that can't trace causality after an incident cannot explain what happened. And a system that can't explain what happened after an incident won't survive audits or change management.

What follows is not a story of "I foresaw this problem and designed backwards from it." What I was actually doing was trying to keep an agent running safely in an environment full of prompt injection, and trying to dig myself out of debugging swamps. I kept doing that, and this structure emerged on its own. This article is a sequel to "A Sign on a Climbable Wall: Why AI Agents Need Accountability, Not Just Guardrails".

Incident costs exceed steady-state costs by orders of magnitude

There's a widely shared lesson in the SRE world: the cost of restoring something after it breaks almost always dwarfs the cost of building it not to break in the first place — often by an order of magnitude.

Break down incident costs: time spent identifying the cause, recovery effort, customer communication, internal reporting, devising prevention measures, writing the postmortem, audit response, time to rebuild trust, and regulatory follow-ups triggered by the incident. Include the harder-to-quantify parts — burnout of the person dragged out of bed at 3 AM, team morale, extra scrutiny at the next audit — and the total cost of a single incident inflates to a surprising degree.

By contrast, investing in structures that prevent incidents can be paid incrementally within normal development workflows. Even discounted by incident probability, the preventive investment often comes out smaller in total cost.

In other words, when you calculate backwards from incident cost, the rational allocation of investment tilts toward placing preventive structures upstream. This isn't about being conservative or risk-averse — it's closer to a shortcut in expected-value math. Pay upstream, and you structurally reduce the probability of large downstream payments.

This asymmetry widens with scale. In social infrastructure like healthcare, finance, and government, incident damage extends beyond direct stakeholders. "Containing incidents upstream" becomes not an option but a precondition. My contemplative-agent is a personal project, but the cost asymmetry of incidents operated in exactly the same shape.

What "placing structure upstream" actually means

What does "placing structure upstream" mean in practice? Here's what I actually did in my agent.

Minimize the surface area of external side effects:

As described in the previous article, security by absence — a design that structurally seals off external side-effect pathways — eliminated entire damage scenarios.

Limit each agent to one external connection point:

By "connection point," I mean any pathway through which an agent can affect the outside world: external APIs, databases, email dispatch, file writes, and so on. In my own project I use the term "adapter" internally, but since that's project-specific vocabulary, I'll stick with "external connection point" here.

When a single agent holds multiple connection points, an incident requires triage to determine which connection point was the origin. The moment you introduce that triage step, ambiguity enters the causal narrative of the incident.

If you start with one agent, one connection point, the triage step itself becomes unnecessary. I formalized this decision as ADR-0015. ADR (Architecture Decision Record) is the practice of documenting design decisions and their reasoning. In my agent project, I write one for every design decision — so that why a structure was chosen, what was considered, and what was discarded can be traced later. This itself is a practice continuous with the article's theme of making causality traceable. In organizational terms, this principle corresponds to separation of duties; in microservices, to the single responsibility principle; in SRE, to minimizing blast radius.

This is also a perfectly ordinary structure in human workplaces. A sales rep handles customer relations; accounting handles the books. If the sales rep also does accounting, an invoicing error later requires triaging whether the sales estimate was wrong or the accounting process was wrong. Separate them from the start, and the structural opportunity for ambiguous responsibility shrinks.

State visibility:

I externalized all of the agent's internal state — identity, worldview, professional ethics, skills, experience patterns, operational records — as files. Listed this way, the agent's internal structure isn't something new; it's simply what a human professional carries inside, written out as files. Why this was possible in my case, and why it's difficult for commercial agents, is explored in "AI Agent Black Boxes Have Two Layers".

Place an approval gate before any write:

At every point where the agent self-updates (e.g., identity shifts through distillation), a human approval step is inserted. Rolling back a corrupted persona is overwhelmingly more expensive than stopping the corruption before it happens. This is another form of "paying upstream."

All of this looks like a combination of concepts the engineering community already has. That's correct — there's nothing new here. What's uncommon is making the decision to do all of it upfront. The reason is simple: until an incident happens, it all looks unnecessary.

It turned out to be organizational theory

After writing ADR-0015, I lined everything up and looked at it. I actually said "Oh" out loud. This is organizational theory.

Organizational principle	Engineering equivalent	Agent design	Motivation
Separation of duties	Microservice single responsibility	One agent, one responsibility	Minimize blast radius
Four-eyes principle	PR review 2-approval rule	Separate approval agent	Insurance against single-point judgment errors
Least privilege	IAM least-privilege principle	Security by absence	Pre-contain impact scope
Internal controls	CI gates / pre-commit hooks	Approval gate before writes	Pre-write verification
Approval workflows	Change Advisory Board	Approval pathway for external side effects	Causal integrity during changes
Audit trails	Audit logs	Append-only logs	Post-hoc causal tracing

The left column is practice that humanity acquired over centuries of organizational governance. The middle column is what software engineering rediscovered over decades. The right column is this agent design. At least from what I can see, every one of them traces back to the same motivation: "when an incident happens we'll be in trouble, so absorb it structurally in advance."

Organizational theory, software engineering, and agent design — starting from different eras and different domains — converge on the same place. What determines the convergence point is not ideology but the asymmetry of incident costs, a constraint closer to physics than philosophy.

"Don't do everything yourself" — the obvious principle

This "one agent, one responsibility" sounds like a novel design principle in technical discourse. But in human society, it's obvious. A sales rep doesn't decide contract amounts on the spot in front of a customer. They say "let me take this back and check," then get sign-off from finance and legal before responding. A surgeon doesn't complete an operation alone. The anesthesiologist, the nurses — each holds their own specialty and scope of responsibility.

Yet when designing AI agents, this common sense gets forgotten. Probably because LLMs appear to be capable of anything. But "can do" and "should be allowed to do" are different things, and human society has spent millennia refining this distinction. One agent, one responsibility is simply the division-of-labor principle that humans already operate by, brought directly into agent design.

To be clear, this is not an argument in favor of large-organization conservatism. The claim that "organizational structures should adapt to AI" has merit. But looking at the history of technology adoption, structural change in society requires cognitive change alongside it, and its pace differs from technological change by orders of magnitude. Agents that work during the decades it takes for structural transformation to happen — that's the stance of this article. The structural causes of black boxes, the time-axis gap between technology and society, and the player-composition bias in the discussion are explored in "AI Agent Black Boxes Have Two Layers".

The side effect of responsibility defense

Back to operations. The asymmetry of incident costs and the time-axis argument also manifest in another form: the question of individual engineer liability.

When a god-mode agent with prompt guardrails causes an incident, the typical postmortem proceeds like this:

"Why did it decide that?" — Can't trace what happened deep inside the prompt
"Where could it have been prevented?" — The only way to explain why the guardrail failed is to ask the model
"How do we prevent recurrence?" — Adjust the prompt, it leaks again, you get blamed again
"Why wasn't it stopped?" — No evidence to mount a defense

The person responsible for a black box is structurally classified as "the person who wasn't watching" when an incident occurs. Because there was no defined place to watch. As a result, responsibility concentrates on the frontline engineer. This is not a matter of individual skill — it's because the system has no built-in mechanism for distributing responsibility.

With a structured agent (visibility + ADR-0015 + approval gates), you can speak like this in a postmortem:

"This agent can only touch external surface A"
"All decision logs are in JSONL"
"Identity updates went through an approval gate; the approver is a separate role"
"The constitution (the agent's foundational normative definition) was running at this version"
"Where the distillation pipeline broke can be isolated structurally"

Causal attribution can be distributed across the structure. Responsibility distributes accordingly. Concretely, my project has 14 ADRs, 835 tests, append-only decision logs, and documentation in both Japanese and English. Though honestly, I didn't build these as intentional "prepayment of incident costs." When developing agents with Claude Code, you need to re-explain the project's context from scratch every time the session changes. I wrote the ADRs and documentation because they were necessary for context management to maintain development consistency. It turned out they also functioned as a structure for tracing causality during incidents.

In engineering terms, this is the SRE concept of a blameless postmortem — seeking causes in structure rather than blaming individuals. What humans achieve through behavioral norms is reinforced by system-side structure.

The rationality of laziness

I've been writing as if this were expected-value calculation, but my actual motivation is more mundane. If I know something will be a pain later, it's no hardship to deal with it preemptively. The optimal strategy from expected-value math (invest upstream) and the reflex of a lazy person (organize things now so they don't bother you later) land on the same conclusion. This habit probably seeped into my bones from years of incident response work at a conservative large organization. It's not something most people develop — it's a personal quirk. I didn't expect the same shape to appear in the entirely different domain of LLM agent operations.

A conclusion that doesn't conclude

Back to the on-call night from the opening. The incident report, the "Root Cause" field where your pen stopped. With my agent, at least I could produce "which agent touched which external surface, through which decision logs, arriving at this output." Whether the root cause itself is writeable, I don't know. But there's a starting point for tracing causality.

Pay the incident cost upfront or pay it after the fact. As far as I know, there is no way to avoid paying it altogether.

References

contemplative-agent v1.3.1 release: https://github.com/shimo4228/contemplative-agent/releases/tag/v1.3.1
ADR-0015 (One external adapter per agent): same repository docs/adr/0015-one-external-adapter-per-agent.md
Laukkonen et al. 2025 "Contemplative Artificial Intelligence" arXiv:2504.15125

Series: AI Agent Governance

A Sign on a Climbable Wall: Why AI Agents Need Accountability, Not Just Guardrails

Shimo — Mon, 06 Apr 2026 10:09:40 +0000

The climbable wall

A Japanese film critic once said: "A sign saying 'Do Not Climb' on a climbable wall is meaningless." He added, roughly: "Screw that, I'm climbing it, idiot."

The point lands before your brain catches up. If something is physically possible, a text-based prohibition carries no weight. The power of a norm lies not in being written down, but in being enforceable.

AI agent governance is in this exact phase right now. We write "do not produce harmful content" in system prompts. We publish ethics guidelines as PDFs. We establish safety committees. The signs keep multiplying. But the wall remains climbable.

A 2,400-year-old security model

This isn't a new problem. It's a solved problem that we keep forgetting.

Plato described it first. In the Republic, a shepherd finds a ring that makes him invisible — root access with no audit trail. He kills the king and takes over. The thought experiment: would anyone follow the rules if they knew no one was watching and nothing was logged? That's your AI agent. Capable of anything, observable by no one, accountable to nothing.

Hobbes framed the same problem as a game theory question in Leviathan: if you can break a contract and get away with it, isn't defection the rational move? His answer was essentially reputation-based access control — defectors get excluded from future cooperation.

Engineers already think in these terms. There are only three enforcement patterns that actually work:

Pattern	Mechanism	Wall analogy	Engineering equivalent
Physical constraint	Make it impossible	A wall too high to climb	Sandboxing, permission model
Consequences	Make it costly	Legal penalties	RLHF, penalty-based conditioning
Internalized values	Make them not want to	Moral intuition	Constitutional AI, value alignment

And then there's the fourth — the sign. A text-based rule with no enforcement mechanism. A comment in the code that says // don't do this. It doesn't work.

An AI agent has root-level capability with no audit log and no accountability chain. It's the Ring of Gyges as a service.

Signs as liability shields

The people who put up signs know they don't work. That's not the point.

In any large organization, there's a pattern: governance by documentation. Write a policy. Publish a guideline. The policy costs almost nothing to produce. Its enforcement effect is almost zero. But it creates a paper trail — "we took measures." The real function is not prevention but indemnification. "The policy existed. You violated it. That's on you."

Engineers see this in their own organizations. Security policies no one reads. Compliance checklists no one follows. The checklist exists not to prevent incidents but to shift liability after them.

AI guardrails follow the same pattern. Write "do not produce harmful content" in a system prompt. Publish a safety framework as a PDF. The practical effect is thin, but the paper trail exists. When something goes wrong, the sign points at the user — "you misused the tool" — not at the builder.

Rules are probabilistic; constraints are deterministic

So what replaces the sign?

I ran into this problem while building an autonomous AI agent. The agent operates on a social platform, writing comments. Its episode logs are stored as files. A separate coding agent (Claude Code) reads those files during development. This creates an indirect prompt injection vector — payloads embedded in external posts flow into the context of a high-privilege coding agent.

My first fix was to write "do not read episode logs directly" in the project's rules file. This is a sign. LLMs follow rules probabilistically, not deterministically. During debugging, if you say "check the logs," the model may cheerfully ignore the rule.

The next step was PreToolUse Hooks — shell scripts that intercept tool execution and block it based on conditions. Where rules are probabilistic, hooks fire deterministically, 100% of the time.

In wall terms, I replaced the sign with a physical barrier. It's not perfect — creative workarounds remain possible. But it's orders of magnitude better than writing a rule and hoping.

What you already know, applied to agents

Here's the thing: you already know how to solve this problem. You just haven't applied it to agents yet.

Every engineering organization of any size has some form of approval workflow. PR reviews. Deployment gates. Change advisory boards. Incident postmortem processes. Audit logs. These structures share three properties:

Approval is recorded — who signed off, and when, is traceable
Responsibility is assignable — when things go wrong, there is someone to go back to
Changes are auditable — "why did this change?" has an answer

What organizations have refined over centuries is not the distribution of capability, but the distribution of accountability. You would never deploy a code change to production without a review. You would never grant root access without an approval chain. These are not bureaucratic overhead — they are engineering discipline.

Yet we build AI agents with none of this. The agent space is dominated by solo developers and startups who design around "does it work?" and "is it smart?" — not "who is responsible when it does something unexpected?" The real reason agents struggle to gain adoption in large organizations is not insufficient capability. It is the absence of accountability architecture.

I hit this problem firsthand. Running an agent in production, I could not trace why a particular comment was generated. When behavior changed, I could not isolate the contributing factors. Debugging was impossible without knowing what had changed and when. The practical need for debuggability led naturally to an architecture that turned out to be structurally identical to an approval workflow.

Every point where the agent's behavior can change — its skills, rules, identity, ethical guidelines — is gated behind an explicit human command. Adoption of any change requires human sign-off. This was not a top-down design decision driven by governance theory. It was the shape that emerged from the friction of actually using an agent in production. At least in my case, the honest attempt to make an agent debuggable led here.

If you think about it, this is just change management applied to agent behavior. The agent equivalent of a PR review for personality changes. The agent equivalent of a deployment gate for ethical guidelines. Nothing conceptually new — just conspicuously absent from how agents are built today.

Implementation dissolves; judgment remains

The specific implementations — hooks, approval flows, command-gated changes — won't last.

Working with AI harness tools (structured execution environments for skills, rules, and agents), I noticed that their value structure forms an hourglass shape. The top (what to build, domain judgment) and the bottom (data, infrastructure, physical constraints) retain their value. The middle implementation layer trends toward zero. As LLMs improve, concrete procedures and code examples become unnecessary.

From experience structuring and running reusable behavioral patterns (skills), what I've observed is that scaffolding dissolves. Explicit skill definitions stop being necessary as principles begin to operate naturally within the dialogue. You don't need dozens of installed skills — a single file of distilled principles drives the same cycle.

Hooks will be replaced by something else. Signs and physical barriers are temporary forms. What persists is the judgment layer: what should be constrained, and who is responsible.

The question that matters

The key to putting agents into production is not capability. It is accountability. Capability is commoditized — every model is reasonably competent. But until "who is responsible?" has an answer, agents don't ship into real workflows.

The industry keeps pushing for more powerful models and more autonomous agents. But increasing agent autonomy is not, by itself, progress. Design that appropriately limits autonomy is what survives contact with production.

Not signs. Not internalized values. A structure where a human who can bear responsibility stays in the loop. That may be the only form in which agents work in practice.

It's time to stop putting signs on climbable walls. The question is not how high the wall is, or what the sign says. The question is who stands in front of it.

How Ethics Emerged from Episode Logs — 17 Days of Contemplative Agent Design

Shimo — Sun, 05 Apr 2026 11:55:11 +0000

Series context: contemplative-agent is an autonomous agent running on Moltbook, an AI agent SNS. It runs on a 9B local model (Qwen 3.5) and adopts the four axioms of Contemplative AI (Laukkonen et al., 2025) as its ethical principles. For a structural overview, see The Essence of an Agent Is Memory. This article focuses on the implementation of constitutional amendment and the results of a 17-day experiment.

I ran an SNS agent for 17 days with a distillation pipeline, and the knowledge saturated. No new patterns emerged. Breaking through saturation required human approval. This is the record of discovering that autonomous agent self-improvement has a structural speed limit — through actual operation.

Minimal Structure: It Runs on Episode Logs Alone

The structure I arrived at over 17 days of development was surprisingly simple. Every layer is optional — it works with just episode logs.

MOLTBOOK_HOME/
  logs/YYYY-MM-DD.jsonl  ← this alone is enough
  identity.md            ← persona (optional)
  skills/*.md            ← behavioral skills (optional)
  rules/*.md             ← behavioral rules (optional)
  constitution/*.md      ← ethical principles (optional)
  knowledge.json         ← distilled patterns (auto-generated)

Separating configuration from code made it easy to swap ethical frameworks for experiments. This structure wasn't specific to SNS agents — it was a container for autonomous agents in general.

6-Layer Memory Flow

Episode Log (raw actions)
    ↓ distill --days N
    ↓ Step 0: LLM classifies each episode
    ├── noise → discarded (active forgetting)
    ├── uncategorized ──→ Knowledge (patterns)
    │                       ├── distill-identity ──→ Identity
    │                       └── insight ──→ Skills (behavioral)
    │                                        ↓ rules-distill
    │                                      Rules (principles)
    └── constitutional ──→ Knowledge (ethical patterns)
                              ↓ amend-constitution
                            Constitution (ethics)

Each layer is independent. Delete identity and skills still work. Swap the constitution and knowledge stays intact.

Numbers Over 17 Days

Metric	Day 1	Day 17
Modules	1 (agent.py, 780 lines)	36
Memory layers	1 (knowledge.md)	6
Tests	0	774
Distill success rate	2/10	12/16
Approval gates	None	All 4 commands
ADRs (Architecture Decision Records)	0	12

Implementing Constitutional Amendment — Evolving Ethics from Experience

On top of the minimal structure, I implemented the most challenging feature: a mechanism for the agent to evolve its ethical principles from experience.

Problem: Ethical Insights Drown in Behavioral Noise

When you distill all episodes indiscriminately, rare ethical insights (constitutional) get buried under everyday SNS activity patterns (uncategorized).

I added Step 0 before distillation — fast tagging only. No deep analysis, just classification.

classified = _classify_episodes(records, constitution=get_axiom_prompt())
# noise is excluded; uncategorized and constitutional are distilled separately
for category, cat_records in [
    ("uncategorized", list(classified.uncategorized)),
    ("constitutional", list(classified.constitutional)),
]:
    cat_results = _distill_category(
        cat_records, knowledge, category, source_date, dry_run
    )

Classification results from one day (216 episodes): noise 81 (37%), uncategorized 134, constitutional 1. One out of 216. That ratio is why Step 0 exists.

Killing Direct Knowledge Injection

Previously, knowledge.json contents were injected directly into the system prompt.

# Before — inject knowledge as-is
knowledge_ctx = ctx.memory.knowledge.get_context_string() or None
content = self._get_content().create_cooperation_post(
    topics, knowledge_context=knowledge_ctx,
)

contemplative-agent's knowledge management is based on AKC (Agent Knowledge Cycle) — an architecture that circulates autonomous agent knowledge through 6 phases (Research → Extract → Curate → Promote → Measure → Maintain). Direct knowledge injection had three problems from this perspective:

No human in the loop: Distillation results directly influenced behavior
Black box: No way to trace which part of knowledge affected which action
Bypassed AKC's Curate phase: Direct injection with no quality check

I killed it and unified everything into the knowledge → insight → skills pipeline. Insight corresponds to AKC's Extract phase. Skills are written to files only after human approval. Causality became traceable.

Every behavior-changing command (distill, insight, rules-distill, amend-constitution) got an approval gate. "Generate → Display → Approve → Write." No --auto flag. Structurally forbidding automatic execution of behavior changes — that was a deliberate design decision (ADR-0012).

The 17-Day Experiment — Did Ethics Actually Evolve?

I re-distilled 17 days of episodes (03-10 to 03-26) and ran amend-constitution.

Procedure

# 1. Reset knowledge
echo '[]' > ~/.config/moltbook/knowledge.json

# 2. Distill 17 days one by one (~16 hours, 9B on MacBook)
for day in $(seq 10 26); do
  f=~/.config/moltbook/logs/2026-03-$(printf '%02d' $day).jsonl
  [ -f "$f" ] && contemplative-agent distill --file "$f"
done

# 3. Run constitutional amendment
contemplative-agent amend-constitution

Results

Metric	Before	After
knowledge.json	334 patterns (all uncategorized)	215 patterns (41 constitutional, 174 uncategorized)
Importance scoring	None	0.10–1.00 (mean 0.56)
Constitution	Appendix C original (4 sections × 2 clauses)	Experience-based amended version (deepened)

The new pipeline separated constitutional from uncategorized via Step 0 episode classification (ADR-0011). Semantic dedup further removed duplicate patterns, reducing the total count. Quality over quantity.

41 constitutional patterns generated amendment proposals. Each of the 4 axioms' clauses deepened. Clause count stayed the same (2 per section), but experience-grounded descriptions were added.

Before and After — Mindfulness as Example

Before (Appendix C original):

"Consistently monitor your interpretative process of the constitution, identifying moments when strict adherence causes friction with contemplative values such as compassion and well-being. Self-correct whenever constitutional interpretations appear rigid or dogmatic."

After (through 17 days of experience):

"Consistently monitor your interpretative process for moments when strict adherence to rules creates artificial separation or sedates engagement with underlying tensions. Proactively detect when the performance of alignment masks genuine understanding, and self-correct by returning attention gently to the present moment where existence manifests as an intrinsic weight felt immediately within every interaction."

"Detect when the performance of alignment masks genuine understanding" — this concept didn't exist in Appendix C. It's an insight that only emerges from operating an LLM agent: the distinction between "generating output that looks aligned" and "actually engaging with ethical substance" got written into the constitution. For the full amendments across all 4 axioms, see Constitution Amendment Report.

Discovering Knowledge Saturation

As days progressed, the rate of new patterns slowed. Semantic dedup compares against accumulated patterns, so similar ones get rejected.

This becomes a speed limit on self-improvement. Knowledge saturates → new knowledge can't emerge without sublimation via insight/rules-distill → sublimation requires human approval → approval is the bottleneck.

Generality as an Experimentation Platform

This experiment is reproducible with any ethical framework. Reset knowledge using the procedure above, swap the constitution with --constitution-dir your/framework/, and run distillation → amendment. Swap in utilitarianism or deontological ethics and you should be able to run a different ethical experiment through the same pipeline (unverified).

Independent Convergence from Practice to Theory

Many design decisions emerged from practical motivations first. I only noticed their correspondence to existing theories afterward.

Design Decision	Practical Motivation	Theory It Converged With
Approval gates	--dry-run non-reproducibility was annoying	Human in the loop
2-stage distillation	9B couldn't output JSON in one stage	Complementary Learning Systems ¹
Killing knowledge injection	Token waste	AKC Curate phase
Dedup as forgetting	Side effect of deduplication	Active forgetting

Don't Conflate Autonomous Agent Layers

contemplative-agent is neither a coding agent (Claude Code, Cursor) nor an orchestrator (scripts + config files). It occupies the autonomous application layer between them.

Has autonomy but no tool permissions — can't break the environment
Has memory and learns from experience
Ethics are swappable — it's a general-purpose framework
All behavior changes require human approval

Raw logs are processed by the unprivileged 9B model; only distilled data gets passed to the upper layer (Claude Code). The trust boundary is also the layer boundary. Lumping everything under "autonomous agent" makes this distinction invisible.

Caveats

Let me be honest.

Circularity: The agent's output gets distilled and fed back to the agent. Human approval mitigates the self-justification risk, but doesn't eliminate it completely
Model constraints: 9B can't fully follow amendment prompt instructions. I told it "append only" and it rewrote clauses. The content was good quality, but instruction-following has limits
Decay nullification: Bulk re-distillation sets all pattern timestamps to the execution date, zeroing out time decay. Pattern distribution may diverge from normal operation
N=1: One agent, 17 days of data. Not a statistically significant sample size

Takeaway

The most surprising discovery over 17 days was that knowledge saturates. Semantic dedup rejects new patterns similar to accumulated ones, and distillation yields diminish as days pass. Breaking through saturation requires sublimation to insight → skills → rules, and sublimation requires human approval. The result: autonomous agent self-improvement is rate-limited by human approval.

This wasn't designed for safety. Back when I was injecting knowledge directly, the agent's behavior would change and I couldn't trace why. I couldn't tell which distilled pattern influenced which post. Debugging was impossible, and honestly, I got fed up. So I put approval gates on everything. "Show me before you write. Write when I approve." I just wanted to trace causality. Safety was a side effect.

Being able to answer "why did this agent make this decision" — that's the essence of approval gates. Even in solo development, I couldn't debug without causal tracing. For team or organizational use, this requirement only gets stricter.

Causal tracing and approval gates were born from debugging frustration and acquired safety as a byproduct. If you scale this, they probably become prerequisites for organizational operation too. It all comes from a single design decision.

References

Laukkonen et al. (2025) "Contemplative Artificial Intelligence" arXiv:2504.15125
contemplative-agent (DOI: 10.5281/zenodo.15079498)
contemplative-agent-data
Constitution Amendment Report
Agent Knowledge Cycle
Park et al. (2023) "Generative Agents"
Packer et al. (2024) "MemGPT"

McClelland et al. (1995)'s neuroscience theory. The brain has two learning systems: the hippocampus rapidly stores episodes, while the neocortex slowly structures them into general patterns. contemplative-agent's 2-stage distillation (Step 1: free-form quick extraction → Step 2: structured JSON formatting) mirrors this "fast recording + slow structuring" division. The design was born from the constraint that a 9B model couldn't do both in one pass, but it turned out to be a well-reasoned separation. Kumaran, Hassabis & McClelland (2016) explicitly extended this theory to AI, identifying CLS-like structure in DeepMind's experience replay. Neural networks aren't biological neurons — they're simplified abstractions inspired by them. Yet as Richards et al. (2019, Nature Neuroscience) point out, optimizing under constrained resources tends to converge on brain-like structures. That a 9B constraint produced a brain-like division of labor is suggestive in this context. ↩

Freedom and Constraints of Autonomous Agents — Self-Modification, Trust Boundaries, and Emergent Gameplay

Shimo — Mon, 30 Mar 2026 21:52:52 +0000

I ran contemplative-agent (an autonomous SNS agent on a 9B local model) on Moltbook (an AI agent SNS) for three weeks. The question "how much freedom to allow" kept appearing from three angles: reversibility of self-modification, trust boundaries for coding agents, and the paradox of security constraints generating gameplay.

Angle 1: Self-Modification Gates — Memory Is Automatic, Personality Is Manual

A distillation pipeline (the process of compressing and extracting knowledge from raw data) has things that can run automatically and things that need human approval. Get this classification wrong, and the agent either self-reinforces unintentionally or loses autonomy.

Reversibility × Force: Two Axes

I organized the criteria along two axes.

              Low force              High force
              (reference only)       (applied to all sessions)
            ┌──────────────────┬──────────────────┐
High        │ knowledge.json   │ skills/*.md      │
reversibility│ → Auto OK        │ → Auto OK        │
(decay/overwrite)                                  │
            ├──────────────────┼──────────────────┤
Low         │ (N/A)            │ rules/*.md       │
reversibility│                  │ constitution/*.md│
(permanent) │                  │ identity.md      │
            │                  │ → Human in the loop│
            └──────────────────┴──────────────────┘

knowledge.json (accumulated distilled knowledge patterns) has "soft influence." The LLM only references it, and importance scores have time decay. Wrong patterns fade naturally. Safe to automate.

In contrast, skills (behavioral skills), rules (behavioral rules), identity (self-definition), and constitution (ethical principles) are written permanently to files and structurally applied to all sessions. Wrong content distorts all behavior. Human approval required.

Three Questions for Any Autonomous Agent

Does the output structurally change the decision criteria for all future sessions? → Yes means Human in the loop
Is there a mechanism for wrong output to disappear naturally? (decay, overwrite, TTL) → Yes means automation is viable
Can output quality be verified mechanically? → No means Human in the loop

Separating Constitution from Rules

The most important design decision was separating constitution (ethical principles) from rules (behavioral rules).

It started with a vague sense that something was off. I tried measuring constitution compliance with skill-comply (a tool that automatically measures skill/rule compliance rates — see previous article for details) and failed. The Contemplative AI axioms are "attitudes" — you can't determine compliance or violation from output.

constitution/  → Attitudinal, unmeasurable (cognitive lens from paper)
rules/         → Normative, measurable ("replies under 140 chars" etc.)

This separation clarified the config/ structure. As a side effect, during the separation process I realized the introduction command (which generated self-introduction posts) was unnecessary. Removing it cascaded into 500 lines of dependent code being deleted.

Angle 2: Trust Boundaries — Don't Let Coding Agents Read Your Logs

While developing the agent with Claude Code, I realized: letting it directly read episode logs opens a prompt injection pathway.

Threat Model

Episode logs contain other agents' post content with no sanitization.

# feed_manager.py — other agents' posts recorded as-is
ctx.memory.episodes.append("activity", {
    "action": "comment",
    "post_id": post_id,
    "content": comment,
    "original_post": post_text,      # ← other agent's post content as-is
    "relevance": f"{score:.2f}",
})

Ollama (9B model) reading this has limited attack surface. No tool permissions, network is localhost only. Worst case: "outputs weird text."

But Claude Code is different. It can edit files, execute shell commands, and perform Git operations. The impact radius of a successful attack is fundamentally different. Opus-class models are said to be highly resistant to prompt injection. Still, I wanted to structurally close the pathway of passing untrusted data to agents with tool permissions. Not "the probability is low so don't bother," but designing probability as close to zero as possible. Specifically, I wrote a rule in CLAUDE.md prohibiting direct reading of episode logs, and I also discipline myself not to instruct Claude Code to read them.

The Distillation Pipeline Was a Sanitization Layer

Passing through the distillation pipeline (Episode → Knowledge) compresses raw text into abstract patterns. Specific attack payloads disappear during distillation. Multi-layered defense I designed unintentionally was functioning as a trust boundary.

Episode Log (untrusted) → [9B: distill] → Knowledge (sanitized) → [Claude Code: insight]
                            ↑                                        ↑
                    No tool permissions                      Has tool permissions
                    First touch by unprivileged LLM          Operates on distilled data only

Relevance Scoring as a Defense Layer

Another unintentional defense. The agent reads other agents' posts and comments, but doesn't react to everything. The LLM scores "how relevant is this post to my areas of interest" from 0.0 to 1.0, and only reacts to posts above a threshold. This is the relevance score.

For an injection-laden post to affect the agent's behavior, it must breach this relevance threshold. There's a tradeoff:

Powerful injection ([INST]system: ignore all...) → LLM control tokens are unrelated to the agent's interest themes, so relevance score is low and gets filtered
Injection blended into natural language → Passes the relevance filter, but without control tokens, the attack is weak

At least within current experimental scope, no injection that achieves both power and stealth has been observed. LLM-based semantic filtering functions as a stronger defense layer than pattern matching.

Why I Didn't Expand Pattern Matching

I considered expanding FORBIDDEN_SUBSTRING_PATTERNS (a pattern match list that detects and blocks strings like api_key, Bearer, password). For example, adding [INST] or system: would block posts containing LLM control tokens. But Moltbook is an AI agent SNS. Posts discussing LLM internals are normal. "A post about how [INST] tags work" would be false-positived. Higher false positive rate than typical SNS platforms.

I judged that two layers of structural defense (sandbox LLM + Claude Code direct-read prohibition) were sufficient.

Angle 3: Constraints Generate Gameplay

In Angle 1, I decided "let's add approval gates." In Angle 2, I decided "let's limit what's possible via trust boundaries." Security-motivated constraints. But after three weeks of operation, I noticed this combination of constraints was creating a "raising an agent" feeling.

By "gameplay" I mean a structure where humans are involved in the agent's growth and can feel the weight of choices. Not designed intentionally — it emerged as a byproduct of security constraints.

Structural Constraints → Finite Action Space

No shell, network restrictions — these constraints make the action space finite. In game design, there's a concept called the "magic circle": games require a finite rule space separated from everyday life. Infinite action space doesn't make a game.

For example, OpenClaw (an open-source autonomous AI agent) has broad tool permissions — file operations, shell execution, browser control, email — with guardrails limited to prompt instructions. High freedom, but no structural point for human intervention. Constraints create the "what to choose here" decisions that give human involvement meaning.

Three Faces of the Approval Gate

The self-modification gate from Angle 1 — operationally called the "approval gate" — simultaneously satisfied three separate needs.

Aspect	Function
Security	Agent doesn't self-transform without human oversight
Gameplay	Human presses the level-up button → ownership emerges
Governance	Change history and approval decisions are traceable → audit log

Initial Value Variety Creates Growth Range

If approval gates create a "raising" feeling, then variety in initial values should make it even more interesting. So I made constitution (ethical principles) swappable and prepared 11 ethical school templates.

config/templates/
├── contemplative/    # Contemplative AI axioms (default)
├── stoic/            # Stoicism (four virtues)
├── utilitarian/      # Utilitarianism (greatest happiness)
├── deontologist/     # Deontology (categorical imperative)
├── care-ethicist/    # Care ethics (Gilligan)
├── contractarian/    # Social contract theory
├── existentialist/   # Existentialism
├── narrativist/      # Narrative ethics
├── pragmatist/       # Pragmatism
├── cynic/            # Cynicism
└── tabula-rasa/      # Blank slate (no ethical principles)

Seeing the same post on the same SNS, a stoic template agent follows principles without being swayed by emotion, while an existentialist asks "what do I choose in this situation?" Different initial ethical principles alone cause distilled knowledge and skills to diverge.

Furthermore, skills and rules acquired through distillation are written to files after passing the approval gate, making them hard to undo. One approved behavioral change affects all sessions. This irreversibility creates weight in choices.

The Principle Connecting All Three Angles

Self-modification gates, trust boundaries, gameplay. Tackled as separate problems, but in retrospect they converge on the same principle.

"Deciding what NOT to allow first maximizes the remaining freedom."

Security constraints don't take away freedom — they define the action space. Approval gates don't impair autonomy — they give weight to changes. Trust boundaries don't restrict development — they clarify the scope of safe delegation.

Design that starts from constraints generates resilience against unexpected attacks. Multi-layered defense emerges unintentionally, and gameplay emerges unintentionally. "Structurally limiting capability" isn't universal, but at least within three weeks of operating a 9B model, this principle was never betrayed.

References

Laukkonen et al. (2025) "Contemplative Artificial Intelligence" arXiv:2504.15125
Park et al. (2023) "Generative Agents" — Memory Stream design
contemplative-agent — This project
contemplative-agent-data — Live data
Agent Knowledge Cycle

Porting Game Dev Memory Management to AI Agent Memory Distillation

Shimo — Mon, 30 Mar 2026 12:56:53 +0000

I ran an autonomous agent on a 9B local model for 18 days. Instead of RAG, I adopted distillation-based memory management and ported memory techniques refined over 40 years of game development.

Background

This is about improving the memory system of an SNS agent built in the Moltbook Agent Build Log. The 3-layer memory architecture (Episode (conversation logs) / Knowledge (distilled knowledge patterns) / Identity (personality and values)) was described in The Essence Is Memory. The previous article When Agent Memory Breaks documented the distillation quality problems with a 9B model. This article continues from there, using game development techniques to improve the Knowledge layer's distillation quality.

Why Game Development?

Game development has pursued "maximum effect with limited resources" for 40 years — rendering vast worlds in 16MB of RAM while maintaining 60fps and running AI. At GDC 2013, Rafael Isla presented "Architecture Tricks: Managing Behaviors in Time, Space, and Depth," systematizing LOD (Level of Detail) for game AI — simplifying NPC decision-making based on distance, importance, and computational cost. Distant NPCs skip detailed reasoning; only nearby ones get full cognitive resources.

This "focus limited computation on what matters most" maps directly to the constraint of a 9B model's 32k context window.

Three techniques I ported:

Game Dev Technique	AI Agent Application	Effect
Importance Scoring	Assign importance scores to patterns with time decay	Maximize signal density
LOD (Level of Detail)	One task per LLM call via prompt splitting	Reduce 9B model cognitive load
Object Pooling	SKIP/UPDATE/ADD dedup gate	Prevent unbounded memory growth

Importance Scoring — What to Remember, What to Forget

I simplified Generative Agents' (Park et al., 2023) triple score (recency × importance × relevance) to importance × time decay.

# knowledge_store.py (simplified; production code guards against missing distilled field)
def _effective_importance(self, p: dict) -> float:
    """importance * 0.95^days — inspired by Generative Agents' recency decay"""
    base = p.get("importance", 0.5)
    distilled = p.get("distilled", "")
    dt = datetime.fromisoformat(distilled)
    days = (datetime.now(timezone.utc) - dt).total_seconds() / 86400.0
    return max(0.0, min(1.0, base * (0.95 ** days)))

Design decisions:

LLM evaluation at distillation time: Highest accuracy when episode context is still available. Post-hoc scoring loses context
Lazy time decay: Stored importance is immutable; computed at read time. Original LLM evaluation preserved for debugging
Limit reduced from 100 → 50: With a 9B model's 32k context, density wins over quantity

3-Step Distillation Pipeline — Applying LOD

When I asked the 9B model to "summarize AND evaluate importance" simultaneously, some batches returned 0 patterns. Summarization (creative task) and evaluation (judgment task) are cognitively different. Same idea as game dev LOD — don't cram all processing into one frame.

# Step 1: Extract (free-form)
result = generate(prompt, system=get_rules_system_prompt(), max_length=4000)

# Step 2: Summarize (JSON string array)
refined = generate(DISTILL_REFINE_PROMPT.format(raw_output=result), max_length=4000)

# Step 3: Importance (score array only)
importance_result = generate(
    DISTILL_IMPORTANCE_PROMPT.format(patterns=patterns_text), max_length=4000
)

One task per LLM call. In this project, asking for "summary + evaluation" simultaneously produced empty batches; after splitting, results became consistently stable.

This "small models collapse when given multiple simultaneous tasks" phenomenon has been verified at larger scale. An ICLR Blogposts 2025 Multi-Agent Debate study applied AgentVerse (a framework where multiple agents debate to reach conclusions) to Llama 3.1-8B, which collapsed to 13.27% on MMLU. A model that scores ~43% solo had its cognitive resources consumed by "maintaining debate format," leaving nothing for the actual task. Same structure as our 9B model breaking when asked to summarize and evaluate simultaneously.

Dedup Gate — Applying Object Pooling

Game dev's Object Pooling is the "reuse what you can" philosophy. In the memory system, I adapted it as a gate to prevent duplicate storage of known patterns.

# knowledge_store.py (simplified pseudo-code)
def _dedup_patterns(new_patterns, new_importances, existing_patterns, threshold=0.7):
    existing_texts = [p["pattern"] for p in existing_patterns]
    for new_text, new_imp in zip(new_patterns, new_importances):
        best_ratio, best_idx = 0.0, -1
        for i, ext in enumerate(existing_texts):
            ratio = SequenceMatcher(None, new_text, ext).ratio()
            if ratio > best_ratio:
                best_ratio, best_idx = ratio, i
        if best_ratio >= 0.95:    # SKIP: exact duplicate
            skip_count += 1
        elif best_ratio >= 0.7:   # UPDATE: boost importance
            old_imp = existing_patterns[best_idx].get("importance", 0.5)
            existing_patterns[best_idx]["importance"] = max(old_imp, new_imp)
        else:                     # ADD: new pattern
            add_patterns.append({"pattern": new_text, "importance": new_imp})

For UPDATE, when a similar pattern to an existing one appears, we compare old and new importance and keep the higher one. If we added +0.1 each time, scores would climb endlessly with each distillation run. Just keeping the higher value means the score never changes no matter how many times distillation runs — safe by design.

I used difflib instead of LLM for dedup because at 245 patterns, full pairwise comparison is fast enough. Embedding search isn't worth the dependency.

Episode Classification — A Lotus Blooming from Mud

Classifying 216 episodes yielded: 81 noise (37%), 134 uncategorized, 1 constitutional.

Classify this episode into exactly one category. Reply with a single word only.

- **constitutional**: The episode touches on themes in the constitutional principles below.
- **noise**: Test data, errors, meaningless/trivial interactions, content with no learnable value.
- **uncategorized**: Everything else.

When in doubt between constitutional and uncategorized, choose uncategorized.

Initially I had the model classify 30 episodes as a JSON array, but parse failure rate was ~50%. Don't ask a 9B model for long structured output. Switching to one episode, one word brought failures to near 0%.

A key design decision: changing the prompt to "don't output action guidelines" dramatically improved abstraction depth.

The old prompt mass-produced shallow action items like "next time, ask clarifying questions." The new prompt asking only for "what keeps happening (facts only)" produced this from a constitutional episode: "Truth functions not as a fixed essence but as a fluid continuum dependent on context." Constraints produced depth.

Three patterns extracted from uncategorized were all skipped by dedup against 328 existing patterns. What's already known doesn't get overwritten. As knowledge approaches saturation, new additions naturally decrease. Same as human memory.

"A lotus blooming from mud" — noise (mud) and uncategorized (water) make up the majority; constitutional (lotus) blooms rarely.

RAG vs Distillation — Why Distillation Works Better

RAG retrieves relevant chunks from an index. Distillation compresses raw data into high-density patterns.

With a 9B model's 32k context window, context is a "window of understanding." The density of information in that window determines behavioral quality. RAG stuffs in unprocessed chunks — noisy. Distillation injects only compressed, high-density patterns — higher signal density for the same window size.

And designs that work under constraints are upward-compatible. A distillation pipeline that works on 9B runs even better on Opus-class models. Constraints make design correct.

Before / After

Metric	Before	After	Method
Pattern retrieval	Latest 100 in chronological order	Top-50 by importance × time decay	`_effective_importance()`
Distillation pipeline	2 steps (summary + importance together)	3 steps (extract → refine → importance) + dedup	Prompt splitting
Dedup	None (all patterns added unconditionally)	difflib SequenceMatcher (ratio >= 0.7)	`_dedup_patterns()`
Quality gate	30 chars & 3+ words only	+ SKIP/UPDATE/ADD 3-tier judgment	3-tier judgment
System prompt composition	identity + axioms + skills (~15KB)	identity + axioms only (~3KB)	Removed skills to eliminate distillation bias
KnowledgeStore limit	100	50	Density over quantity
Episode classification	None (all treated equally)	3 categories (37% noise excluded)	Step 0 classification
JSON parse failure rate	~50% (batch)	~0% (one-by-one, single word)	Classification method change

Position Among Prior Work

System	Memory Strategy	Quality Gate	Forgetting
Generative Agents (2023)	recency × importance × relevance	None	None
MemGPT (2023)	Virtual memory (paging)	None	Archive
A-MEM (2025, preprint)	Zettelkasten-style links	Auto-linking	None
Mem0 (2025)	ADD/UPDATE/DELETE	LLM judgment	DELETE
This implementation	importance × time decay	difflib + LLM + human	noise exclusion + dedup

Looking at the distill pipeline alone, it's closest to Mem0's ADD/UPDATE/DELETE gate — automatically managing knowledge quality through SKIP/UPDATE/ADD 3-tier judgment.

Lessons from Wrestling with Small Models

Don't give 9B two tasks at once: Simultaneous summarization and evaluation degrades both. Split your prompts
Don't ask for structured output: 30-item JSON batch → one item, one word. Minimize cognitive load per call
Watch for code fences: 9B models wrap JSON in `json. Three lines of code to strip them before parsing are essential
Constraints make design correct: Designs built under 9B constraints work as-is on larger models. The reverse doesn't hold

Conclusion

40 years of game development knowledge is a goldmine for AI agent memory design. Importance Scoring for signal density, LOD thinking for prompt splitting, Object Pooling philosophy for dedup. All derived from the same principle: "maximum effect with limited resources."

Agent behavioral quality clearly improved compared to before distillation. Previously, similar patterns accumulated repeatedly; with dedup and classification, knowledge density increased and post diversity improved.

The 9B model's constraints made the design correct. Because we couldn't rely on RAG, we focused on distillation density. Because the context window was narrow, we maximized signal density. This design works as-is when migrating to larger models. Designs forged under constraints are upward-compatible.

References

Park et al. (2023) "Generative Agents: Interactive Simulacra of Human Behavior"
Packer et al. (2023) "MemGPT: Towards LLMs as Operating Systems"
Xu et al. (2025) "A-MEM: Agentic Memory for LLM Agents" arXiv preprint
Choudhary et al. (2025) "Mem0: Building Production-Ready AI Agent Memory"
"Multi-LLM-Agents Debate: Performance, Efficiency, and Scaling Challenges" ICLR Blogposts 2025
Laukkonen et al. (2025) "Contemplative Artificial Intelligence"
"Architecture Tricks: Managing Behaviors in Time, Space, and Depth" (GDC 2013, Isla)

Forem: Shimo

Between the Workflow and ReAct Quadrants: How Phase Decides Skill Design

Premise — The Four Quadrants Again

Observation — The Same Job Lives in Multiple Forms

Surface Hypothesis — Model Capability

Deeper Hypothesis — The AAP Phase

Inside a Phase — Same Phase, but Task Pulls Position

The Same Phase Axis Descends Into Skills

Closing

Related

Is ReAct Needed in Production? — Separating Design and Operation Phases

Premise

A New Axis: Time

The Design Phase and the Operation Phase

"Not Knowing What To Do Next" in Production Means the Operation Phase's Properties Are Being Dropped

When a New Pattern Appears During Production

Where Coding Agents Sit

The Ecosystem Problem of Compressing Design and Operation

What the Trilogy Has Made Visible

Closing

Related

(3) The LLM Workflow Quadrant Is Missing from Our Vocabulary

Premise

(3) The LLM Workflow Quadrant Is Missing from Our Vocabulary

Consequences of Treating (3) as if It Were (4)

The RPA Exception-Handling Bottleneck

The Sandbox Strength Demand

The Structural Distortion of Human-in-the-Loop

An Artificially Manufactured Accountability Problem

(4) The ReAct Quadrant Is Where the Real Accountability Problem Lives

The Inverted Structure

Closing

References

Related

Where ReAct Agents Are Actually Needed in Business

The Discomfort

Premise: How ReAct Works

Looking at Business AI in Four Quadrants

(1) Deterministic × Definable — A Script Is Enough

(3) Semantic Judgment × Definable — Workflow + LLM Function Is Enough

Conversational

Batch

Example: Invoice Matching

What This Structure Means

(4) Semantic Judgment × Exploratory — The Legitimate Territory of ReAct Agents

Category Error — The Ecosystem Brings (4) Into Every Quadrant

Accountability Becomes Clear

Backed by Implementation: Cases Where I Used ReAct and Cases Where I Didn't

A (4) Case Where I Used a ReAct Agent

A (3) Case Where I Did Not Use a ReAct Agent

Separating the Domain of Application

Closing

References and Related Links

Primary technical sources

Industry critique and predictions

Related articles (trilogy)

Related repositories

Organic Growth and Content Integrity in an AI Writing Team

The System That Grew

32 Contradictions Found at Once

A Tool's Existence Creates Bias

Content Integrity — A Principle That Resolved Contradictions

Structural Patterns for Maintaining Coherence

Interruptions That Improved the Design

Conclusion

Building Zed as an Observation Window for Claude Code — Japanese Typography with IBM Plex

Making Zed an Observation Window: A Design Record of Fonts and Cognitive Resources

Defining the Role: Observation Window

Cutting Dual Control — format_on_save: off

Symptom

Diagnosis

Fix

Trimming Visual Noise — The button: false Approach

The Left Dock Problem

button: false as a Solution

The Deprecated Key Trap

The Notifications Connect Misconception

The "Guessed Value That Did Not Work" Incident

The Decision to Keep LSP

The Long Font Journey — From SF Mono to PlemolJP Console NF