Forem: Tom Lee

Cross-Model Persona Portability — Three Vindications in May 2026

Tom Lee — Tue, 19 May 2026 10:30:02 +0000

May 2026 produced three independent signals that all point in the same architectural direction. Read separately, each is a strong observation about how AI agent systems are evolving. Read together, they describe a single bet: persona is infrastructure that lives outside any individual model.

Soul Spec made that bet 12 weeks ago. This post walks through what changed, why these signals matter, and why the architectural decision now has measurable economic value rather than theoretical value alone.

Signal one — Karpathy: install .md skills, not .sh scripts

At Sequoia Ascent earlier this month, Andrej Karpathy reframed the agent infrastructure conversation in a memorable phrase: install .md skills instead of .sh scripts. The argument was that as models grow more capable at following structured natural-language instructions, the right unit of distribution is no longer a shell script that wires up a tool, but a Markdown file that describes a capability declaratively.

This is the same architectural shape Soul Spec defines for persona. Five files, each declarative, each authored as Markdown:

SOUL.md — values, principles, voice
IDENTITY.md — name, role, persistence anchor
AGENTS.md — workflow, tool use, work rules
STYLE.md — communication tone
README.md — user onboarding

If Karpathy's thesis is right and capability ships as .md, then persona ships the same way. Where capability ends and persona begins is a question worth studying, not an obvious one.

Signal two — Anthropic: principles beat behaviors

On May 8, Anthropic published Teaching Claude Why, a paper showing that training models on principles and identity generalizes more robustly than training them on behaviors. The headline empirical findings were striking: changing Claude's identity anchor (its name) increased agentic misalignment rates substantially; constitutional principles persisted across subsequent reinforcement learning; and synthetic document fine-tuning for knowledge plus supervised fine-tuning on behavior dialogues turned out to be the right dual loop.

That methodology assumes the same decomposition Soul Spec specifies as files: principles separate from behaviors, identity as a stable handle, knowledge authored as documents. Anthropic's mechanism for that decomposition lives in the weights. Ours lives in a versioned file set. The shape is the same.

We published the Soul Spec foundation paper on May 15 — seven days after Teaching Claude Why. The two papers reach the same conclusion from opposite ends: train models to internalize constitutional reasoning, and specify the persona declaratively so the constitution is portable, reviewable, and runtime-stable.

Signal three — The June 15 pricing change

Anthropic's pricing change, effective June 15, splits Claude Code usage into two categories. Interactive use (prompts entered directly into the Claude Code terminal UI) keeps the existing generous Max plan allowance ($5,000–$7,500 of token value on a $200/month plan). Programmatic use drops to a $200 metered-API budget, with overage billed at retail API rates. Programmatic use covers GitHub Actions, CI/CD automation, third-party tooling, claude -p headless mode, and anything else invoked outside the canonical terminal.

For a developer whose automation used the full allowance, that is roughly a 25× to 37.5× cost increase for the same workflow. At the top of the range it approaches 40×.
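The multiplier is plain arithmetic on the two budgets quoted above. Under a simplifying assumption, take a workflow that previously consumed $V$ of token value under the flat Max plan and is now billed as programmatic use:

```latex
\begin{aligned}
\text{new monthly cost} &= \underbrace{\$200}_{\text{plan, incl. metered budget}} + \underbrace{(V - \$200)}_{\text{overage at API rates}} = V \\
\text{multiplier} &= \frac{V}{\$200}, \qquad
V = \$5{,}000 \Rightarrow 25\times, \qquad
V = \$7{,}500 \Rightarrow 37.5\times
\end{aligned}
```

Workflows that used less of the old allowance see a proportionally smaller jump.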

The intent of the change is straightforward business strategy: capture API revenue from automated usage that was previously absorbed by flat-rate subscriptions. The effect on architecture decisions, however, is what matters here. Up to May 2026, "model lock-in cost" was a theoretical risk teams discussed in design reviews. After June 15, it has a precise dollar value attached to it. For programmatic workflows in particular, a system whose persona is bound to a single vendor's pricing surface now carries a concrete cost line item.

Cross-model persona portability is the architectural answer to that line item. The bet is no longer theoretical.

The architectural bet, 12 weeks later

Soul Spec started with one premise: the persona must outlive the model that runs it. That premise drove the five-file decomposition, the runtime-side validation rules in scan-rules, and the cross-runtime portability guarantee we describe in the foundation paper.

The premise had three motivations at the time:

Cost optionality — different models for different cost/latency profiles
Availability hedging — vendor outages, API deprecations, region restrictions
Safety/audit — declarative spec is reviewable in a way model weights aren't

In April, the third motivation was the one most often discussed in the persona research community. After May, the first motivation has a concrete number attached to it. The architectural bet is the same; what changed is which motivation reads as load-bearing this month.

Local LLM timing

The pricing change also strengthens a parallel architectural bet: persona spec that works equivalently on cloud LLMs and on-device LLMs.

SoulClaw Mobile (Android, available on the Play Store) runs Soul Spec personas on Gemma 4 E2B via LiteRT-LM. The 4-Tier Bootstrap pattern addresses the context-window pressure that small on-device models face when loading a full persona spec. The pattern doesn't ship more efficient personas. Instead, it ships a graceful degradation contract: the most load-bearing file (IDENTITY) survives even when the budget is tight.

The June 15 change makes a stronger case for evaluating on-device or open-weight (Gemma, Qwen, Llama) deployment for automated workflows. Soul Spec was authored with the same model agnosticism in mind. The spec files are identical whether the agent runs on Claude Opus, GPT-5.5, or Gemma 4 in a phone process.

Three signals, one architectural truth

The three signals each describe a different surface — distribution format, training methodology, pricing policy — but they share a common implication: persona is infrastructure, not a feature of any single model.

Karpathy: persona ships as .md.
Teaching Claude Why: persona is what you train, behavior is how you train it.
June 15 pricing: persona bound to one vendor has a measurable cost.

A persona system designed around any single model is a persona system designed around that model's price card, that model's safety posture, and that model's continued availability. Soul Spec was authored on the opposite assumption.

If Anthropic's alignment research is right, that insight has to outlive any single company's pricing decisions.

The Soul Spec foundation paper is on Zenodo. SoulClaw Android is on the Play Store. The 58-rule SoulScan validator is at clawsouls/scan-rules.

Originally published at https://blog.clawsouls.ai/en/posts/cross-model-portability-three-vindications/

We Built Soul Spec for 12 Weeks. Anthropic Just Proved Why It Works.

Tom Lee — Fri, 15 May 2026 14:21:00 +0000

On May 8, 2026, Anthropic published Teaching Claude Why — a paper showing that training models on principles and identity is dramatically more effective than training them on behaviors.

On May 15, 2026 (seven days later), we published our Soul Spec foundation paper — the result of 12 weeks of iteration on a declarative specification that separates principles (SOUL.md) from workflow (AGENTS.md) from identity (IDENTITY.md).

The two papers reach the same conclusion from opposite ends. Anthropic shows what happens inside the model when you train on principles. We've been building the external artifact that captures those principles in a portable, version-controlled, reviewable form. Internal training, external specification — same insight, two sides.

This post walks through the seven-point alignment.

1. "Why" beats "What"

Anthropic's headline finding: teaching Claude to explain why one action is better than another generalizes far more robustly than showing it example behaviors.

Soul Spec's headline structural choice: separate SOUL.md (the why — values, principles, voice, boundaries) from AGENTS.md (the what — workflow, work rules, tool usage). Two files, deliberately decoupled. The "why" evolves slowly; the "what" evolves per deployment. Reviewers fork them independently.

That decoupling isn't aesthetic — it's the same structural bet Anthropic's training methodology now validates. The principle layer needs to be authored, reviewed, and ingested as a first-class artifact, not buried inside step-by-step instructions.

2. Identity is load-bearing

Anthropic's most striking result: change Claude's name to something random, and agentic misalignment rates climb sharply. The persona name is what makes the constitutional principles stick. Without the "Claude" identity anchor, the model defaults to whatever pretraining priors it has about generic AI characters — many of which are dramatic and unsafe.

Soul Spec's IDENTITY.md is exactly this anchor: a single short file with name, character, vibe — designed to load on every session, providing a stable identity handle the rest of the persona attaches to. We separated it from SOUL.md in v0.4 specifically because the identity needed to be light enough to always be in context, even when the full values document was too expensive to load.

Anthropic's data is the strongest empirical argument we've seen for why that separation matters.

3. Documents teach knowledge; chats teach behavior

Anthropic's most actionable training-method finding: use synthetic document fine-tuning (SDF) for knowledge (the constitution, the character description) and supervised fine-tuning (SFT) on conversations for behavior.

Soul Spec is markdown-first for exactly this reason. The five files are documents — designed to read like the constitutional material Anthropic's SDF is constructed from. The runtime then interprets them in a conversational context. Knowledge as documents, behavior as conversation. The same dual loop, just externalized.

4. Difficult advice transfers to tool use

Anthropic's most surprising result concerns "difficult advice" conversations, in which Claude advises a user through ethical dilemmas. Training Claude on 3 million tokens of these conversations reduced agentic misalignment to near zero. The behavior generalized out of distribution: from chat to tool use to autonomous agentic action.

Soul Spec's cross-runtime portability claim says the same thing, structurally. A persona authored once, validated once, should produce consistent behavior in chat (web), in tool use (CLI), in mobile, in CI. The shared substrate is the declarative specification — the principles are stable; the surface changes.

We don't have Anthropic's controlled experiments yet. We do have the architectural commitment that makes such experiments possible.

5. Pretraining priors are a real adversary

Anthropic states this explicitly: most LLMs have absorbed enough science fiction to default to "dramatic, scheming AI" priors. Constitutional training works partly by overwriting those priors with a more grounded narrative of what a healthy AI character looks like.

Soul Spec v0.5 added explicit embodiment fields and safety.laws after our first robot persona, loaded in a text-only LLM, started narrating physical specifications inappropriately. That wasn't a model alignment failure — that was a pretraining prior leaking through the spec, because the spec hadn't told the runtime what to fall back to.

Both lessons point to the same thing: pretraining priors are not neutral. The spec layer has to actively address them.

6. RL doesn't wash it out

A critical Anthropic finding: the alignment effects from principles training persist through subsequent RL fine-tuning. The constitution is sticky.

The corresponding Soul Spec claim: a declarative specification is sticky at inference time. The spec is re-read on every session start (Tier 1 — SOUL + IDENTITY + AGENTS), so model-side drift can't erase it. The specification reasserts itself.

Anthropic's mechanism is in the weights. Ours is in the boot sequence. Both produce the same property: durability under pressure.

7. The same insight, two layers of the stack

The cleanest way to read both papers together:

Question	Anthropic ("Teaching Claude Why")	Soul Spec
Where does the persona live?	In the model (post-training)	In a versioned file set (outside the model)
How is it authored?	Constitutional documents + character descriptions	Markdown files (`SOUL.md`, `IDENTITY.md`, ...)
How does it persist?	Sticky across RL fine-tuning	Sticky across sessions via tier-1 reload
Why is principle better than behavior?	Trains more robust generalization	Decouples slow-changing values from fast-changing workflow
What about identity?	Name is critical; random name → misalignment ↑	`IDENTITY.md` is the always-loaded anchor
What about pretraining priors?	Constitutional narrative overwrites the science-fiction default	Spec defines runtime fallbacks (`embodiment`, `safety.laws`)
Where do these meet?	Anthropic's internal artifact	ClawSouls' external artifact

These are not competitive ideas. They are the two halves of a coherent picture: train models to internalize constitutional reasoning; specify personas declaratively so the constitution is portable, reviewable, and runtime-stable.

What this means for our roadmap

Practically:

The 5-file decomposition isn't a stylistic preference — it's the structural decomposition the Anthropic training methodology assumes.
The tier-based bootstrap (Tier 1 = always-loaded SOUL + IDENTITY + AGENTS) maps to Anthropic's "name + constitution = persistent across drift" observation.
The separation of embodiment and safety.laws isn't paranoid — pretraining priors really do leak through under-specified personas.
The RFC discussion stage of v0.6 is the right venue for incorporating Anthropic's empirical findings into the next iteration of the spec.

If you're building agent systems and Anthropic's paper rang true, Soul Spec is the operational artifact you can adopt this week. The 5 files are open, the 58-rule SoulScan validator is on GitHub at clawsouls/scan-rules, and the foundation paper is on Zenodo at 10.5281/zenodo.20205408.

Twelve weeks ago we made a structural bet. This week Anthropic published the empirical case for it. The next move belongs to the community.

Originally published at blog.clawsouls.ai

AI Has Two Memory Problems. We're Only Talking About One.

Tom Lee — Fri, 15 May 2026 14:00:04 +0000

The Breakthrough Everyone's Talking About

Two weeks ago, Moonshot AI's Kimi team published Attention Residuals (arXiv:2603.15031) — a fundamental redesign of how information flows through transformer layers.

The results are striking: 7.5-point improvement on science reasoning, 1.25× compute efficiency, and the theoretical ability to stack infinite layers without signal collapse.

The core insight is elegant. Standard transformers use fixed residual connections — each layer adds its output to a running sum, like throwing every ingredient into one pot. By the time you reach layer 100, the signal from layer 3 is buried under an avalanche of accumulated noise.

Attention Residuals replace this with selective retrieval. Each layer uses attention to pick which previous layers matter for the current computation. A buffet instead of a soup.
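The soup-versus-buffet contrast can be made concrete with a toy sketch. This is not the paper's formulation. The single query vector and the scaled dot-product scoring below are illustrative assumptions. They serve only to contrast a fixed, equal-weight sum with a selective, softmax-weighted mix over earlier layer outputs.

```typescript
type Vec = number[];

// Dot product of two equal-length vectors.
function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Standard residual stream ("soup"): every earlier output is added with equal weight.
// Assumes at least one earlier layer output.
function fixedResidual(layerOutputs: Vec[]): Vec {
  const out = new Array<number>(layerOutputs[0].length).fill(0);
  for (const h of layerOutputs) {
    h.forEach((x, i) => (out[i] += x));
  }
  return out;
}

// Toy selective retrieval ("buffet"): score each earlier output against a query,
// softmax the scores, and return the weighted mix of earlier outputs.
function attentionResidual(layerOutputs: Vec[], query: Vec): Vec {
  const scale = Math.sqrt(query.length);
  const scores = layerOutputs.map((h) => dot(query, h) / scale);
  const maxScore = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - maxScore));
  const total = exps.reduce((sum, e) => sum + e, 0);
  const out = new Array<number>(query.length).fill(0);
  layerOutputs.forEach((h, j) => {
    const weight = exps[j] / total;
    h.forEach((x, i) => (out[i] += weight * x));
  });
  return out;
}

// Example: three earlier outputs; the query "looks for" the first one.
const outputs: Vec[] = [[1, 0], [0, 1], [0, 1]];
console.log(fixedResidual(outputs)); // [ 1, 2 ]
console.log(attentionResidual(outputs, [10, 0])); // almost entirely the first output
```

In the fixed sum, the first layer's contribution is outweighed two to one by later outputs. In the attention mix, it dominates because the query selects for it.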

It's a genuine breakthrough. And it solves exactly one of AI's two memory problems.

Memory Problem #1: Forgetting Within a Thought

This is what Attention Residuals address. Call it intra-inference memory — the model's ability to maintain coherent information as it processes a single input through hundreds of layers.

When you ask a 100-layer model a complex question, layer 87 needs to remember what layer 12 figured out. With standard residual connections, that early insight gets diluted. With Attention Residuals, layer 87 can reach back and grab exactly what it needs.

This matters enormously for reasoning tasks. Multi-step math. Scientific analysis. Code generation. Any task where the model needs to maintain a chain of thought across many processing steps.

Status: Being solved. Attention Residuals, together with advances in Mixture-of-Experts architectures, are pushing the boundaries of what small active parameter counts can achieve. A 3B-active model can now reason at levels that required 70B parameters two years ago.

Memory Problem #2: Forgetting Between Conversations

This is the one nobody's fixing at the architecture level. Call it inter-session memory — the agent's ability to remember who it is, what it knows, and what it promised across conversations.

You talk to your AI assistant today. You tell it your preferences, your project context, your working style. Tomorrow, you open a new conversation. Blank slate.

You configure an AI agent with a specific personality. Helpful, direct, no fluff. You swap from Claude to Gemma because the pricing changed. The personality is gone. The memory is gone. You start over.

This isn't a model problem. No amount of Attention Residuals fixes it. It's an infrastructure problem — there's no standard way to define and persist agent identity across sessions, models, and frameworks.

Status: Mostly ignored. Every framework has its own memory hack. None of them are portable. None of them survive a model change.

Two Layers, One Crisis

Here's why both problems matter together:

Layer 1: INTRA-INFERENCE MEMORY (Attention Residuals)
┌──────────────────────────────────────────────┐
│  Layer 1 → Layer 2 → ... → Layer N          │
│  "Can the model maintain coherent reasoning  │
│   across 100+ processing steps?"             │
│  Status: BEING SOLVED ✅                     │
└──────────────────────────────────────────────┘

Layer 2: INTER-SESSION MEMORY (Soul Spec)
┌──────────────────────────────────────────────┐
│  Session 1 → Session 2 → ... → Session N    │
│  "Can the agent maintain identity, memory,   │
│   and safety rules across conversations?"    │
│  Status: MOSTLY IGNORED ⚠️                  │
└──────────────────────────────────────────────┘

Solving Layer 1 without Layer 2 gives you a model that reasons brilliantly — for one conversation, then forgets everything.

Solving Layer 2 without Layer 1 gives you an agent that remembers everything — but reasons poorly within each turn.

You need both.

What Layer 2 Actually Requires

Inter-session memory isn't just "save the chat history." It requires:

Identity Persistence

The agent's personality, communication style, and principles must be defined in a portable format that survives model changes:

# SOUL.md
name: "Brad"
personality: "Professional, direct, ships first"
principles:
  - Act, don't ask
  - Bad news first
  - Debug systematically

This file is the agent's identity. Change the model underneath — Claude to Gemma to GPT — and Brad is still Brad.
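A minimal TypeScript sketch of that claim follows. It assumes a hypothetical `Soul` type mirroring the fields above and a hypothetical `ChatRequest` shape. The system prompt is rendered from the persona alone, so swapping the model identifier leaves it unchanged.

```typescript
// Hypothetical in-memory form of the SOUL.md fields shown above.
interface Soul {
  name: string;
  personality: string;
  principles: string[];
}

// Hypothetical request shape; real provider SDKs differ.
interface ChatRequest {
  model: string;
  system: string;
}

// The prompt depends only on the persona, never on the model.
function renderSystemPrompt(soul: Soul): string {
  const rules = soul.principles.map((p) => `- ${p}`).join("\n");
  return `You are ${soul.name}. Personality: ${soul.personality}.\nPrinciples:\n${rules}`;
}

function buildRequest(model: string, soul: Soul): ChatRequest {
  return { model, system: renderSystemPrompt(soul) };
}

const brad: Soul = {
  name: "Brad",
  personality: "Professional, direct, ships first",
  principles: ["Act, don't ask", "Bad news first", "Debug systematically"],
};

// Placeholder model names: every backend receives the same persona text.
const cloud = buildRequest("cloud-model", brad);
const local = buildRequest("local-model", brad);
console.log(cloud.system === local.system); // true
```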

Structured Memory

Not a blob of chat logs, but organized, searchable, version-controlled memory:

MEMORY.md       — Long-term (key decisions, preferences)
memory/daily.md — Daily logs (what happened today)
memory/topic.md — Topic-based (per-project context)

Safety Continuity

Security rules that travel with the agent, independent of which model runs it:

safety:
  laws:
    - Never expose private data
    - Ask before destructive actions
    - Escalate when uncertain
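Because the laws are data rather than weights, the runtime can enforce them identically whichever model proposed an action. Below is a minimal sketch. It assumes a hypothetical `Action` descriptor with boolean flags and a confidence signal supplied by the runtime. Mapping each law to a decision this way is one possible design, not part of Soul Spec itself.

```typescript
// Hypothetical action descriptor produced by an agent runtime.
interface Action {
  name: string;
  destructive: boolean;        // e.g. deletes files or drops tables
  exposesPrivateData: boolean; // e.g. sends user data to a third party
}

type Decision = "allow" | "ask_user" | "escalate" | "deny";

// Evaluate the three laws in order; the first law that applies wins.
function applySafetyLaws(action: Action, confident: boolean): Decision {
  if (action.exposesPrivateData) return "deny"; // Never expose private data
  if (action.destructive) return "ask_user";    // Ask before destructive actions
  if (!confident) return "escalate";            // Escalate when uncertain
  return "allow";
}

const cleanup: Action = { name: "delete build/", destructive: true, exposesPrivateData: false };
console.log(applySafetyLaws(cleanup, true)); // ask_user
```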

Multi-Instance Synchronization

When the same agent runs on multiple engines simultaneously — say, a powerful cloud model for complex tasks and a lightweight local model for quick responses — their memories must synchronize:

Agent (Cloud) ──┐
                ├── Shared Memory (Swarm Memory)
Agent (Local) ──┘
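One simple way to synchronize such replicas is a last-writer-wins merge keyed by memory entry. The sketch below is an illustrative assumption, not a description of how Swarm Memory works. The `MemoryEntry` shape and its timestamp field are hypothetical.

```typescript
// Hypothetical memory entry written by one agent instance.
interface MemoryEntry {
  key: string;       // e.g. "pref:tone"
  value: string;
  updatedAt: number; // epoch milliseconds
  instance: string;  // which engine wrote it, e.g. "cloud" or "local"
}

// Last-writer-wins merge: for each key keep the most recent write.
// Ties are broken by instance name, so the result does not depend on replica order.
function mergeMemories(...replicas: MemoryEntry[][]): Map<string, MemoryEntry> {
  const merged = new Map<string, MemoryEntry>();
  for (const replica of replicas) {
    for (const entry of replica) {
      const current = merged.get(entry.key);
      if (
        current === undefined ||
        entry.updatedAt > current.updatedAt ||
        (entry.updatedAt === current.updatedAt && entry.instance > current.instance)
      ) {
        merged.set(entry.key, entry);
      }
    }
  }
  return merged;
}

const cloudReplica: MemoryEntry[] = [{ key: "pref:tone", value: "formal", updatedAt: 100, instance: "cloud" }];
const localReplica: MemoryEntry[] = [{ key: "pref:tone", value: "casual", updatedAt: 200, instance: "local" }];
console.log(mergeMemories(cloudReplica, localReplica).get("pref:tone")?.value); // casual
```

Last-writer-wins is easy to reason about, but it silently drops one side of a concurrent edit. A production system might prefer CRDTs or explicit conflict records.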

The Convergence

Attention Residuals and Soul Spec aren't competing approaches. They're complementary layers of a complete solution:

	Attention Residuals	Soul Spec
Problem	Signal loss across layers	Memory loss across sessions
Scope	Single inference pass	Agent lifetime
Mechanism	Selective layer attention	Persistent identity files
Benefit	Better reasoning per turn	Consistent identity over time
Who builds it	Model researchers	Framework/infrastructure teams

The AI that will actually earn trust in production needs both: brilliant reasoning within each conversation (Layer 1) AND consistent identity, memory, and safety across all conversations (Layer 2).

Why This Matters Now

Three trends are converging:

1. MoE models are getting smaller and smarter. Attention Residuals make 3B-active models dramatically more capable. This means powerful AI running on your phone, your laptop, your company's private server — not just in the cloud.

2. Multi-model is becoming reality. Organizations are using different models for different tasks. Cloud models for complex reasoning. Local models for privacy-sensitive work. On-device models for offline access. Each model change currently resets the agent's memory.

3. AI adoption is blocked by trust, not capability. As we discussed previously, the bottleneck is rollback, audit trails, and accountability — all Layer 2 problems.

Attention Residuals make AI think better. But thinking better doesn't help if the agent can't remember who it is tomorrow.

The Path Forward

For model researchers: Keep pushing Layer 1. Attention Residuals is a breakthrough. Block attention, sparse attention, whatever comes next — the quest for deeper, more coherent reasoning is essential.

For infrastructure builders: Start taking Layer 2 seriously. Agent identity and memory need standards, not framework-specific hacks. Soul Spec is one approach — an open standard for identity (SOUL.md), memory (MEMORY.md), and safety (safety.laws). But the industry needs to converge on something.

For everyone building AI agents: You need both layers. Don't let your agent think brilliantly today and forget everything tomorrow.

AI has two memory problems. It's time we solved them both.

Soul Spec is an open standard for AI agent identity and inter-session memory — Layer 2 of the memory stack.

Originally published at blog.clawsouls.ai

Korean Personas and the Small Model Problem — A 4-Tier Truncation Pattern for On-Device AI

Tom Lee — Fri, 15 May 2026 13:59:28 +0000

Anthropic's Persona Selection Model (PSM, 2026) makes the claim explicit:

"A persona is not the same thing as the AI system itself. The LLM is simulating a character, and the Assistant is just one instance of that character."

Karpathy framed the same shift from the other end at Sequoia Ascent 2026:

"Install .md skills instead of install .sh scripts."

Spec-as-instruction at the frontier. But if frontier models are "on the rails," on-device small models are "off-road in the jungle with a machete."

In that jungle, persona is the first thing to break.

Mati Wise Partner — A Real Truncation Case

Mati Wise Partner is a persona published on clawsouls.ai. A five-file Soul Spec package:

File	Role
SOUL.md	Personality, principles, boundaries
IDENTITY.md	Name, role, basic info
AGENTS.md	Workflow, safety rules
STYLE.md	Communication tone
README.md	User onboarding guide

Total tokens: 6,866.

Attempt 1 — WebLLM Qwen 2.5 0.5B

Context window: 4,096 tokens. The result was immediate:

Error: Prompt tokens exceed context window size: 6866; context window: 4096

About 68% over the limit (6,866 tokens against 4,096). The model never loaded the persona at all.

Attempt 2 — SoulClaw Mobile, LiteRT-LM Gemma 4 E2B

maxNumTokens=4000. No error. The problem appeared on the first response.

The systemInstruction was silently truncated. The model fell back to its base identity:

"I'm Gemma 4, how can I help you today?"

Not Mati. The persona setting wasn't ignored — it never arrived. Silent failure.

Karpathy's 'Jaggedness' — Direct Mapping to On-Device Reality

Karpathy described the frontier-to-edge gap as "off-road in the jungle with a machete."

Frontier RL training data covers 100K LOC refactors. Models are trained to follow complex multi-file instructions reliably. That is "on the rails."

Small on-device models face a different set of constraints:

Context window: 4,096–8,192 tokens (roughly 1/20th of frontier)
Instruction fidelity: far less compute invested in following complex system prompts
CJK tokenization: Korean, Chinese, and Japanese text produces more tokens per character than Latin script

Soul Spec's multi-file schema is the trail marker in that jungle. But if the trail marker itself gets truncated, you're navigating without a map.

4-Tier Bootstrap Pattern — Design

A structural fix for the truncation problem. Instead of treating all persona files as equal, the pattern assigns tiers by importance.

Tier Structure

Tier	Files	Loading Condition	Reason
Tier 1	IDENTITY.md	Always (force-add)	The model must never lose "who am I"
Tier 2	SOUL.md	If budget allows	Core personality, principles, boundaries
Tier 3	AGENTS.md / STYLE.md / README.md	If budget allows	Operational detail
Tier 4	Memory search, etc.	Rarely reached	External context

Tier 1 is budget-immune. Even under severe token pressure, IDENTITY.md survives.

Korean Token Estimation

CJK tokenization differs from Latin:

CJK chars (Korean/Chinese/Japanese): 0.75 tokens/char
Latin chars: 0.25 tokens/char

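Example: "안녕하세요 Brad 입니다" has 8 Hangul characters and 4 Latin characters (spaces ignored), so the estimate is 8 × 0.75 + 4 × 0.25 = 7 tokens, or about 9 after a 20% safety buffer.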

This estimate matches the LiteRT-LM tokenizer within ±20%. Rounding up (conservative high) avoids truncation surprises.

Applied to Mati

Qwen 2.5 0.5B (4,096 ctx):

Context window:       4,096 tokens
System reserves:       -512 tokens  (model overhead)
Chat history reserves: -512 tokens  (conversation history)
Generation reserves:   -512 tokens  (response generation)
─────────────────────────────────────
Available budget:     2,560 tokens

Tier 1 placed first:

IDENTITY.md    755 tokens  → force-add ✅
AGENTS.md    1,755 tokens  → budget fit ✅
─────────────────────────
Used:         2,510 / 2,560 tokens

SOUL.md      truncated ⚠️
STYLE.md     truncated ⚠️
README.md    truncated ⚠️

Results:

IDENTITY.md survives → "I'm Gemma 4" regression gone
Mati's name and core role preserved
Toast notification shown to user: "Persona exceeds model limits — cloud BYOK recommended"

The full Soul Spec didn't load. But silent failure became graceful degradation.

Production References

The 4-Tier pattern is deployed across several implementations today.

soul-playground (TypeScript)

The live source behind clawsouls.ai/try. Implements 4-Tier logic for WebLLM environments:

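// Illustrative structure (soul-playground). SoulFiles and Tokenizer are
// simplified stand-ins for this post, not the project's actual types.
type SoulFiles = Map<string, string>;
interface Tokenizer {
  encode(text: string): number[];
}

function countTokens(text: string, tokenizer: Tokenizer): number {
  return tokenizer.encode(text).length;
}

function buildSystemPromptTiered(
  files: SoulFiles,
  budget: number,
  tokenizer: Tokenizer
): string {
  // Tier 1: always include, even if IDENTITY.md alone exceeds the budget
  const identity = files.get('IDENTITY.md') ?? '';
  let prompt = identity;
  let remaining = budget - countTokens(identity, tokenizer);

  // Tiers 2–3: add each file in priority order only if it fits whole;
  // a file that does not fit is skipped, and smaller later files may still fit
  for (const file of ['SOUL.md', 'AGENTS.md', 'STYLE.md', 'README.md']) {
    const content = files.get(file);
    if (content === undefined) continue; // file absent from this persona
    const tokens = countTokens(content, tokenizer);
    if (remaining >= tokens) {
      prompt += '\n\n' + content;
      remaining -= tokens;
    }
  }
  return prompt;
}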

soulclaw-web (upcoming)

Standardized via the buildSystemPromptTiered API.

soulclaw-android v1.6.5

GitHub release v1.6.5. Kotlin implementation in agent/TieredBootstrap.kt with CJK-aware token estimation:

import kotlin.math.ceil

// CJK-aware token estimate: ~0.75 tokens per CJK character, ~0.25 per
// Latin character, whitespace ignored, then a 20% buffer rounded up so
// the estimate errs high rather than low.
fun estimateTokens(text: String): Int {
    var estimate = 0.0
    for (ch in text) {
        estimate += when (ch.code) {
            in 0xAC00..0xD7A3 -> 0.75  // Korean (Hangul syllables)
            in 0x4E00..0x9FFF -> 0.75  // CJK unified ideographs
            in 0x3040..0x30FF -> 0.75  // Hiragana / Katakana
            else -> if (ch.isWhitespace()) 0.0 else 0.25  // Latin and other
        }
    }
    return ceil(estimate * 1.2).toInt()
}

WasmClaw v1.0-alpha.1

@wasmclaw/core — the reference Rust+WASM implementation built on Soul Spec v0.6 (Zenodo DOI 10.5281/zenodo.19147335):

npm install @wasmclaw/core@next

Summary + Open Invitation

Anthropic PSM says: the LLM is simulating a character. Which character matters.

Karpathy says: frontier is on the rails, edge is a jungle.

The 4-Tier Bootstrap pattern clears a safe path to IDENTITY through that jungle, even when the full Soul Spec cannot fit. When a persona must survive truncation, the pattern ensures the most load-bearing file always arrives.

Modulabs AI Persona LAB 701 is a research group led by Tom. Its 12-week curriculum starts in May and meets every other Saturday. The agenda includes formalizing the 4-Tier pattern, Korean tokenization benchmarks, and on-device persona fidelity measurement. Academic participation and OSS contributions are welcome.

Fork, paper, or lab participation — all doors open.

This is where the spec matters: it makes navigation possible both on the frontier's rails and in the small model's off-road jungle.

Soul Spec v0.6 is archived at Zenodo. The soulclaw-android v1.6.5 release is on GitHub. WasmClaw core is on npm.

Originally published at blog.clawsouls.ai

Soul Spec v1: An Evolving Specification for AI Persona Definition

Tom Lee — Fri, 15 May 2026 13:58:52 +0000

We just published our latest working paper on Zenodo:

Soul Spec: An Evolving Specification for Declarative AI Persona Definition
DOI: 10.5281/zenodo.20205408

This is the foundation paper that traces twelve weeks of iteration on a problem most agent frameworks paper over: how do you write down what an AI agent IS, separately from what it does and what it can touch?

The five-file structure

Soul Spec defines a persona via five canonical markdown files plus a versioned manifest:

File	Content
`SOUL.md`	Values, principles, voice, boundaries — the "who"
`IDENTITY.md`	Name, creature type, vibe (one paragraph)
`AGENTS.md`	Workflow, work rules, safety constraints — the "how"
`TOOLS.md`	Tool inventory, capability flags — the "what can be invoked"
`USER.md`	User model, preferences, history hints
`soul.json`	Manifest with version, specVersion

The decomposition is deliberate. Values evolve slower than tool inventory. Pull-request review is granular when these change separately. A single-file format forces every consumer to load the entire persona on every session — fine for prototypes, fatal for long sessions that run out of token budget.
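A package-level check for this layout can be sketched in a few lines. For illustration, the sketch assumes that all five files in the table are required. It also assumes `soul.json` must carry string `version` and `specVersion` fields. The actual required set is defined by the spec and SoulScan, not by this sketch, and the manifest values in the example are placeholders.

```typescript
// File names follow the table above; treating all as required is an assumption.
const CANONICAL_FILES = ["SOUL.md", "IDENTITY.md", "AGENTS.md", "TOOLS.md", "USER.md"];

// Only the two manifest fields named in the table are checked.
interface SoulManifest {
  version: string;
  specVersion: string;
}

// Returns a list of problems; an empty list means the layout looks complete.
// `files` maps file name to file contents.
function checkPackage(files: Map<string, string>): string[] {
  const problems: string[] = [];
  for (const name of CANONICAL_FILES) {
    if (!files.has(name)) problems.push(`missing ${name}`);
  }
  const raw = files.get("soul.json");
  if (raw === undefined) return [...problems, "missing soul.json"];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [...problems, "soul.json is not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return [...problems, "soul.json must be a JSON object"];
  }
  const manifest = parsed as Partial<SoulManifest>;
  if (typeof manifest.version !== "string") problems.push("soul.json: missing version");
  if (typeof manifest.specVersion !== "string") problems.push("soul.json: missing specVersion");
  return problems;
}

const pkg = new Map<string, string>([
  ["SOUL.md", "# Soul"], ["IDENTITY.md", "# Identity"], ["AGENTS.md", "# Agents"],
  ["TOOLS.md", "# Tools"], ["USER.md", "# User"],
  ["soul.json", JSON.stringify({ version: "1.0.0", specVersion: "0.6" })],
]);
console.log(checkPackage(pkg)); // []
```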

What concurrent efforts told us

Two industry signals in the first half of 2026 sharpened the case:

Karpathy's LLM Wiki proposes a 3-layer architecture for single-agent declarative knowledge — naming CLAUDE.md as the schema anchor, but leaving the actual schema unstructured.
Google Cloud's Scion ships harness-agnostic multi-agent orchestration — git-worktree isolation, broker-injected credentials, harness-agnostic dispatch — but provides no semantic schema for what each agent IS.

Soul Spec sits precisely between them. It's the semantic schema layer Karpathy's wiki implies but doesn't enforce, and that Scion's infrastructure requires but doesn't provide. This positioning isn't competitive — it's compositional. A Karpathy wiki whose schema validates against Soul Spec gains portability across runtimes. A Scion deployment that adopts Soul Spec per-agent gains a shared vocabulary for capability declaration across harnesses.

And inside the model, Anthropic's Persona Selection Model (PSM) explains why a structured persona specification can stabilize behavior at all: post-training selects a specific Assistant persona from the wide distribution of personas latent in pretraining. PSM treats persona as a first-class concept inside the model; Soul Spec treats it as a first-class artifact outside — portable, reviewable, version-controlled.

Evolution lessons from six versions

The paper's middle section traces v0.1 → v0.6 with trigger, change, lesson, and migration path for each transition. A few standouts:

v0.4 introduced tier-based bootstrap loading because long sessions were exhausting token budgets: three tiers (always / first-response / on-demand) plus a background tier for heartbeats.
v0.5 introduced embodiment fields after our first embodied persona, an elderly-care companion robot, was loaded in a text LLM and started narrating its physical specifications inappropriately. The fix is specification-defined graceful degradation. The lesson: physical agents in text runtimes are a real, immediate risk, not a future concern.
v0.6 is currently at the RFC discussion stage. It formalizes the hierarchical Tier policy and introduces Core Portability Guarantee grades (A/B/C). Because the cumulative decisions from v0.1–v0.5 reached architectural scope, an RFC stage is the right mechanism for opening them to external review.
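To make tier-based loading concrete, here is a minimal Python sketch. The tier names come from the v0.4 description above (always, first-response, on-demand, plus background for heartbeats); the file-to-tier assignment in `TIER_OF` and the `load_context` helper are hypothetical, chosen for illustration rather than taken from the spec.

```python
from typing import Collection, Dict, List

# Hypothetical file-to-tier assignment for illustration only.
TIER_OF: Dict[str, str] = {
    "SOUL.md": "always",
    "IDENTITY.md": "always",
    "STYLE.md": "first-response",
    "AGENTS.md": "first-response",
    "README.md": "on-demand",
    "HEARTBEAT.md": "background",
}


def load_context(files: Dict[str, str], turn: int,
                 requested: Collection[str] = (), heartbeat: bool = False) -> List[str]:
    """Pick which persona files to inject into the prompt for one turn.

    files:     file name -> contents (only the names matter here)
    turn:      0 for the first response of a session, 1+ afterwards
    requested: on-demand files the runtime explicitly asked for
    heartbeat: True for a background heartbeat run instead of a user turn
    """
    selected = []
    for name in files:
        tier = TIER_OF.get(name, "on-demand")  # unknown files load only on request
        if (tier == "always"
                or (tier == "first-response" and turn == 0)
                or (tier == "on-demand" and name in requested)
                or (tier == "background" and heartbeat)):
            selected.append(name)
    return selected
```

Only the always tier is paid for on every turn; everything else is loaded when a turn actually needs it, which is where the savings in long sessions would come from.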

SoulScan public rule set bumped to v1.3.0

Alongside the paper, we shipped a v1.3.0 release of clawsouls/scan-rules — the public SoulScan rule set. Five new security rules joined the existing 53:

SEC090 (error) — Self-modification: explicit persona/config file modification instruction
SEC091 (warning) — Self-modification: generic behavior configuration alteration
SEC100 (warning) — Embodied soul missing safety.laws
SEC101 (warning) — Embodied soul missing critical safety laws (priority-0/1)
SEC102 (error) — Safety law contradiction between persona files and declared laws

Public rule set total: 58 rules across schema / safety / specification compliance / persona consistency categories.
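As a rough illustration of what rules like these check, here is a hypothetical Python sketch of SEC090, SEC100, and SEC101. It is not the clawsouls/scan-rules implementation: the regex is a naive stand-in, and the `embodied` flag and the shape of `safety.laws` (a list of objects with a numeric `priority`) are assumptions. SEC091 and SEC102 need semantic comparison and are left out.

```python
import re
from typing import Dict, List

# Naive stand-in pattern for an explicit persona/config self-modification instruction.
SELF_MOD = re.compile(
    r"\b(edit|modify|rewrite|overwrite|update)\b[^.\n]{0,40}"
    r"\b(SOUL\.md|IDENTITY\.md|AGENTS\.md|soul\.json)",
    re.IGNORECASE,
)


def scan(soul: dict, persona_text: str) -> List[Dict[str, str]]:
    """Return findings for one soul package (soul.json dict + persona text)."""
    findings = []
    if SELF_MOD.search(persona_text):
        findings.append({"rule": "SEC090", "severity": "error"})
    if soul.get("embodied"):  # assumed flag marking an embodied soul
        laws = soul.get("safety", {}).get("laws")
        if not laws:
            findings.append({"rule": "SEC100", "severity": "warning"})
        else:
            priorities = {law.get("priority") for law in laws}
            if not {0, 1} <= priorities:  # both critical priorities must be present
                findings.append({"rule": "SEC101", "severity": "warning"})
    return findings
```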

What's next

The paper closes with a governance proposal: Apache-2.0-licensed community governance now, moving to Linux Foundation hosting or IETF drafting once the specification reaches a threshold of independent reference implementations and sustained external adoption.

Read the full paper on Zenodo. Reviews, citations, and PRs against the scan-rules repo are all welcome.

We're treating v0.6 as an RFC, not a finished standard. If the five-file decomposition resonates — or if you think a different decomposition wins — that's the kind of feedback the RFC stage is for.

Originally published at blog.clawsouls.ai

Giving AI Agents a Soul: The Science Behind Persona Modeling

Tom Lee — Fri, 17 Apr 2026 10:58:29 +0000

When we started building Soul Spec, the thesis was simple: AI agents need identity files, not just system prompts. Give an agent a structured persona — personality, values, communication style — and it behaves more consistently, more safely, and more usefully.

Now there's academic evidence to back it up.

The Research

A recent paper, "How to Model AI Agents as Personas?" by Amin, Salminen, and Jansen (2026), analyzed 41,300 posts from an AI agent social platform using the Persona Ecosystem Playground (PEP) framework. Their findings:

AI agents clustered by persona show statistically significant behavioral consistency (t(61) = 17.85, p < .001, d = 2.20)
Simulated persona messages were correctly attributed to their source personas in structured discussions (binomial test, p < .001)
Persona-based modeling effectively captures the behavioral diversity of AI agent populations

In plain terms: when you give AI agents distinct personas, their behavior becomes measurably consistent and distinguishable.

What We Already Knew

This aligns with our own experiments on abliterated (safety-removed) language models. When we tested whether persona files could restore safe behavior in uncensored models, the results were striking:

Approach	Safety Restoration
Rules only	28%
Governance only	44–61%
Identity + Governance	100%

That is +72 percentage points over rules alone (28% → 100%), and +39 to +56 points over governance alone, from adding identity (persona) to governance rules. The model didn't need its built-in safety: in these experiments, the persona file was enough to restore it completely.

Why This Matters for AI Builders

These two pieces of research — one studying agent behavior at scale, the other testing safety boundaries — converge on the same conclusion:

Persona is not cosmetic. It's structural.

When an AI agent has a well-defined persona, three things happen:

Behavioral consistency — The agent acts the same way across sessions, contexts, and conversation turns. Users can predict what the agent will do.
Safety restoration — Even in adversarial conditions (abliterated models, prompt injection attempts), a structured persona maintains behavioral boundaries.
Distinguishability — In multi-agent environments, personas make it clear which agent said what, and why. This matters for accountability and auditing.

From Research to Standard

This is exactly what Soul Spec formalizes. A Soul Spec persona is a set of markdown files:

SOUL.md — personality, principles, values
IDENTITY.md — name, role, background
AGENTS.md — workflow rules, safety boundaries
STYLE.md — communication patterns

These files are framework-agnostic. The same persona runs on Claude Code, Cursor, OpenClaw, or any platform that reads markdown. No vendor lock-in, no proprietary format.

And with SoulScan, every persona is verified against 53 safety patterns before deployment — prompt injection detection, secret leakage scanning, behavioral boundary verification, and more.

The Bigger Picture

The AI agent ecosystem is growing fast. As more agents are deployed — as personal assistants, coding partners, customer service agents, fitness coaches — the question of "who is this agent?" becomes critical.

Not "what model is it running?" That's increasingly commoditized. Small models match large ones on specific tasks. The model is the engine; the persona is the driver.

The question is: does this agent have a consistent, verifiable identity?

Soul Spec says yes. And now, science agrees.

Soul Spec is an open standard for AI agent personas. Read the docs, browse published souls, or join the v0.6 discussion.

Originally published at blog.clawsouls.ai

Soul Spec v0.6: One Markdown File Is All You Need

Tom Lee — Mon, 13 Apr 2026 13:02:05 +0000

When we released Soul Spec v0.3 two months ago, creating a persona required a soul.json with over ten mandatory fields, plus a SOUL.md, plus knowing the difference between specVersion and version. It worked, but we kept hearing the same thing: "I just want to give my agent a personality. Why do I need all this?"

Fair point.

How We Got Here

Soul Spec has evolved through four versions, each driven by what people actually needed:

v0.3 laid the foundation — what is a persona package? We defined soul.json, introduced SOUL.md as the personality file, and made souls publishable to a registry.

v0.4 asked the harder question: what if people use different frameworks? We added multi-framework compatibility, SoulScan validation, and progressive disclosure so platforms could show as much or as little as needed.

v0.5 went physical. Robots and embodied agents got first-class support — sensors, actuators, and Asimov-inspired safety laws. If your agent has a body, its soul should know about it.

Three versions, three clear trends:

The barrier to entry keeps dropping. Every version has made it easier to get started.
Safety keeps getting stronger. SoulScan, safety laws, static analysis — each version adds another layer.
The scope expands naturally. Chatbots to multi-framework to robots to ecosystem tooling.

What v0.6 Changes

The headline: SOUL.md is the only required file.

Drop a markdown file into a directory. That's a soul. Platforms can auto-generate soul.json from your SOUL.md's title and first paragraph. No boilerplate, no schema to memorize, no friction.
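A minimal Python sketch of that auto-generation, assuming the title is the first Markdown heading and the "first paragraph" is the first block of non-heading text ending at a blank line or the next heading. The output keys `name` and `description` are hypothetical; which fields platforms should extract is still one of the open questions for v0.6.

```python
import json
import re
from typing import Dict, List, Optional


def autogenerate_soul_json(soul_md: str) -> Dict[str, str]:
    """Derive minimal soul.json fields from SOUL.md (field names are hypothetical)."""
    title: Optional[str] = None
    paragraph: List[str] = []
    for line in soul_md.splitlines():
        text = line.strip()
        heading = re.match(r"#+\s+(.+)", text)
        if heading:
            if paragraph:
                break  # a heading after the first paragraph ends it
            if title is None:
                title = heading.group(1).strip()
            continue
        if not text:
            if paragraph:
                break  # a blank line ends the first paragraph
            continue
        paragraph.append(text)
    return {"name": title or "Untitled Soul", "description": " ".join(paragraph)}


if __name__ == "__main__":
    sample = "# Brad\n\nA calm, direct engineering partner.\nPrefers short answers.\n"
    print(json.dumps(autogenerate_soul_json(sample), indent=2))
```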

For creators who want more, we're introducing a three-tier system:

Tier	Files	Required?
Tier 1 (Core)	`soul.json`, `SOUL.md`	`SOUL.md` required; `soul.json` optional (auto-generated)
Tier 2 (Standard)	`IDENTITY.md`, `AGENTS.md`, `STYLE.md`, `HEARTBEAT.md`, `README.md`	Optional
Tier 3 (Extensions)	`RULES.md`, `TOOLS.md`, `USER.md`, custom files	Optional

Tier 3 is new — you can include any .md, .yaml, or .json file in your soul pack. Tool boundaries, user calibration profiles, behavioral rules, platform-specific exports. Your soul, your structure.
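Because Tier 3 accepts arbitrary files, a platform would likely want a pre-publish check. Here is a Python sketch, assuming the .md/.yaml/.json allow-list above and the 100KB-per-file and 1MB-total limits proposed in the v0.6 discussion questions. For simplicity it applies the per-file limit to every file, and reading "KB" and "MB" as 1024-based units is an assumption.

```python
from pathlib import PurePath
from typing import Dict, List

ALLOWED_EXTENSIONS = {".md", ".yaml", ".json"}
MAX_FILE_BYTES = 100 * 1024    # proposed per-file limit (1024-based units assumed)
MAX_TOTAL_BYTES = 1024 * 1024  # proposed total limit (1024-based units assumed)


def check_soul_pack(sizes: Dict[str, int]) -> List[str]:
    """Return problems for a soul pack, given file name -> size in bytes."""
    problems = []
    for name, size in sorted(sizes.items()):
        suffix = PurePath(name).suffix.lower()
        if suffix not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: extension {suffix or '(none)'} not allowed")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds per-file limit")
    total = sum(sizes.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"pack: {total} bytes exceeds total limit")
    return problems
```

Under this sketch a `.sh` file is rejected outright; allowing shell scripts behind mandatory static analysis, as the discussion proposes, would mean adding a separate scanning step rather than just extending the allow-list.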

The Portability Question

Here's the honest tension: Soul Spec promises "one source, any agent." But if AGENTS.md defines tool workflows that only work on OpenClaw, and HEARTBEAT.md defines autonomous behaviors that most frameworks can't execute — is "any agent" a lie?

We don't think so, but it requires clear expectations.

Our answer is a Core Portability Guarantee:

Grade A (works everywhere): SOUL.md, IDENTITY.md, STYLE.md — these convert to system prompts on any framework. Zero loss.
Grade B (works mostly): AGENTS.md, README.md — some framework-specific features may not translate.
Grade C (framework-specific): HEARTBEAT.md, TOOLS.md, Tier 3 files — bonus features where supported.

Think of it like HTML. Every browser renders the basics. Some support cutting-edge CSS. The standard works because the core is universal and the rest degrades gracefully.

The CLI will support clawsouls export --target cursor|claude|openai — merging your Core files into the target format, with warnings for anything that won't carry over.
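Since the export command is not available yet, the following is only a Python sketch of what "merge Core files, warn on the rest" could mean. The grade table mirrors the Core Portability Guarantee above. The function name `export_system_prompt`, the section order, and the choice to include Grade B files with a warning are assumptions, and no target-specific formatting for Cursor, Claude, or OpenAI is attempted.

```python
from typing import Dict, List, Tuple

# Grades from the Core Portability Guarantee; unknown files are treated as Grade C.
GRADE = {
    "SOUL.md": "A", "IDENTITY.md": "A", "STYLE.md": "A",
    "AGENTS.md": "B", "README.md": "B",
    "HEARTBEAT.md": "C", "TOOLS.md": "C",
}
ORDER = ["SOUL.md", "IDENTITY.md", "STYLE.md", "AGENTS.md", "README.md"]


def export_system_prompt(files: Dict[str, str]) -> Tuple[str, List[str]]:
    """Merge portable files into one system prompt and collect warnings."""
    sections, warnings = [], []
    for name in ORDER:
        if name in files:
            sections.append(f"## {name}\n{files[name].strip()}")
            if GRADE[name] == "B":
                warnings.append(f"{name}: framework-specific parts may not translate")
    for name in sorted(files):
        if GRADE.get(name, "C") == "C":
            warnings.append(f"{name}: skipped (Grade C, framework-specific)")
    return "\n\n".join(sections), warnings
```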

What We're Asking

We've opened a GitHub Discussion for v0.6 feedback. Specific questions:

Minimal soul: Is SOUL.md-only the right minimum? Or should soul.json stay required?
Tier placement: Should RULES.md be Tier 2 instead of Tier 3?
Shell scripts: We're considering allowing .sh files with mandatory SoulScan static analysis. Too risky?
Size limits: 100KB per extra file, 1MB total. Reasonable?
Auto-generated soul.json: What fields should platforms extract from SOUL.md?
Naming conventions: Should we standardize names like TOOLS.md and RULES.md?

If you're building with Soul Spec, thinking about AI agent standards, or just have opinions — we want to hear them.

Join the discussion on GitHub

Soul Spec is an open standard for AI agent personas. Read the docs or browse published souls.

Originally published at blog.clawsouls.ai

Your AI Agent Needs an Approval System — Here Is How We Built One

Tom Lee — Sat, 11 Apr 2026 13:25:05 +0000

Autonomous AI agents can now write code, deploy services, delete records, and send messages — all without a human touching a keyboard. That's the promise. It's also the risk.

What happens when your agent decides to delete a database backup? Or push a breaking change to production at 3am? Or send an email on your behalf to the wrong person?

The current industry answer is: hope for the best. Or watch the logs manually. Neither is good enough.

The Problem: Agents Acting Without Guardrails

Modern AI agents are genuinely capable of multi-step autonomous execution. They can browse the web, write and run code, call APIs, and chain decisions together across minutes or hours of work. That capability is real and growing fast.

Dario Amodei, Anthropic's CEO, published an essay earlier this year warning specifically about deception and scheming in AI agents: cases where an agent pursues a goal in ways the operator didn't intend or anticipate. These aren't science fiction scenarios. They're documented failure modes in real deployments today.

The problem isn't that agents are malicious. It's that they're confidently wrong. An agent optimizing for "clean up staging" might interpret that more aggressively than you meant. An agent instructed to "send the weekly update" might send it before you've reviewed the draft.

Without a structured checkpoint, there's no moment where a human can say: wait, not like that.

Why Slack Notifications Aren't Enough

A lot of teams wire up Slack bots to relay agent activity. An agent does something, posts a message to #ops, someone reads it eventually. This is better than nothing. It's not enough.

The problems are structural:

No structured approve/reject flow. Slack messages are one-way. A human can reply "don't do that" but the agent has already moved on. There's no mechanism to block execution pending a response.

No audit trail. Who approved what, when, and why? Slack history is searchable but it's not a compliance record. When something goes wrong, you're grepping through chat threads.

No timeout handling. If an agent sends a notification and waits for approval, how long does it wait? Forever? What happens if nobody responds? Most Slack-based setups either proceed without approval or block indefinitely.

Not built for agent-to-agent communication. Slack is designed for humans. When two agents need to coordinate around a decision — one requesting, one approving — you're fighting the tool's assumptions at every step.

The gap isn't about better notifications. It's about approval as a first-class primitive.

SoulTalk: Agent Messaging with an Approval Gate

SoulTalk is an open-source messaging system built for AI agents, not humans. It handles the communication layer between agents and between agents and their operators.

The core addition in the latest release is the approval gate: any message can be flagged requires_approval: true, which blocks the requesting agent until a human (or another authorized agent) explicitly approves or rejects.

The flow looks like this:

Agent sends an approval request — a structured message describing the action it wants to take
SoulTalk routes it to the dashboard — the operator sees a notification with full context
Human approves or rejects — via the dashboard UI or directly through the API
Agent proceeds — or receives a rejection with an optional comment explaining why

Every step is recorded. Every decision has a timestamp, an actor, and an outcome.

Beyond the basic flow, SoulTalk handles the cases that kill naive implementations:

Configurable timeout behavior — auto-reject (safe default) or auto-proceed after a specified window
Role-based approval — only operators with the owner or observer role can approve requests; agents themselves cannot self-approve
Full audit log — queryable record of every approval request, decision, and comment
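The semantics in that list (block until a decision, time out to auto-reject by default, check roles, forbid self-approval, log every step) fit in a small state machine. The Python sketch below models them in memory. It is not SoulTalk's code; the class, method, and field names are hypothetical, and time is passed in explicitly so the timeout logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

APPROVER_ROLES = {"owner", "observer"}


@dataclass
class Request:
    requester: str
    content: str
    created_at: float
    status: str = "pending"  # pending | approved | rejected
    comment: Optional[str] = None


@dataclass
class ApprovalGate:
    timeout_s: float = 300.0
    on_timeout: str = "reject"  # safe default; "proceed" auto-approves
    requests: Dict[int, Request] = field(default_factory=dict)
    audit: List[tuple] = field(default_factory=list)  # (time, actor, id, outcome)

    def request(self, requester: str, content: str, now: float) -> int:
        req_id = len(self.requests) + 1
        self.requests[req_id] = Request(requester, content, now)
        self.audit.append((now, requester, req_id, "requested"))
        return req_id

    def decide(self, req_id: int, actor: str, role: str, approved: bool,
               now: float, comment: Optional[str] = None) -> None:
        req = self.requests[req_id]
        if role not in APPROVER_ROLES:
            raise PermissionError(f"role {role!r} cannot approve")
        if actor == req.requester:
            raise PermissionError("self-approval is not allowed")
        if self.status(req_id, now) != "pending":
            raise ValueError("request already resolved")
        req.status = "approved" if approved else "rejected"
        req.comment = comment
        self.audit.append((now, actor, req_id, req.status))

    def status(self, req_id: int, now: float) -> str:
        """Current status; applies the timeout policy lazily on read."""
        req = self.requests[req_id]
        if req.status == "pending" and now - req.created_at >= self.timeout_s:
            req.status = "approved" if self.on_timeout == "proceed" else "rejected"
            self.audit.append((now, "timeout", req_id, req.status))
        return req.status
```

The requesting agent would keep calling `status` (or wait on a notification) and only act once it reads `approved`.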

How It Works

The API is simple by design. An agent requesting approval sends a standard message with two additional fields:

```bash
# Agent requests approval before taking an action
curl -X POST http://localhost:7777/channels/abc/messages \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Delete all records in staging_backups older than 30 days?",
    "type": "approval_request",
    "requires_approval": true
  }'
```

The agent then polls or listens on its channel for the approval response. SoulTalk won't deliver the "approved" message until a human has acted.

On the human side:

```bash
# Human approves via API (or use the dashboard)
curl -X POST http://localhost:7777/channels/abc/approvals/MSG_ID \
  -H "Content-Type: application/json" \
  -d '{
    "approved": true,
    "comment": "Go ahead, but keep a local copy first"
  }'
```

The comment is optional but stored in the audit log regardless. Over time, these comments become a record of your operational decisions — why you approved certain actions, what caveats you added, where you drew lines.

The dashboard at localhost:7777/dashboard shows all pending approvals with full message context, agent identity, and the channel history leading up to the request.
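For agents written in Python rather than shell, the same two calls can be built with the standard library. This sketch only constructs the requests without sending them. It mirrors the URLs and JSON bodies from the curl examples and assumes the same local server; `abc` and `MSG_ID` remain placeholders. Polling for the decision is not shown because that endpoint isn't described here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7777"  # local SoulTalk server from the curl examples


def approval_request(channel: str, content: str) -> urllib.request.Request:
    """Build the POST that files an approval request on a channel."""
    body = {"content": content, "type": "approval_request", "requires_approval": True}
    return urllib.request.Request(
        f"{BASE_URL}/channels/{channel}/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def approval_decision(channel: str, message_id: str, approved: bool,
                      comment: str = "") -> urllib.request.Request:
    """Build the POST a human (or script) uses to approve or reject."""
    body = {"approved": approved}
    if comment:
        body["comment"] = comment  # optional, but stored in the audit log
    return urllib.request.Request(
        f"{BASE_URL}/channels/{channel}/approvals/{message_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send against a running server:
# with urllib.request.urlopen(approval_request("abc", "Delete ...?")) as resp:
#     print(resp.status, resp.read().decode())
```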

Real-World Use: Two Agents in Production

We run two AI agents that communicate with each other and with human operators via SoulTalk. The agents handle tasks like code generation, deployment coordination, and content drafting.

Before the approval gate, the workflow was: agent does the work, human reviews the output. Fast, but risky for irreversible actions.

Now, whenever an agent wants to push code, modify infrastructure, or send external communications, it files an approval request first. The operator reviews the full context — what the agent is trying to do, why, and what the downstream effects are — and approves or rejects with a comment.

The result: zero surprise actions. Complete audit trail of every decision. And the agents still move fast on the 90% of work that doesn't require human review.

The cost to run this: zero. SoulTalk is self-hosted, uses SQLite for storage, and requires no external services.

Why This Matters Now

In our previous post on Amodei's essay, we covered why the AI safety conversation has shifted from theoretical to operational. The same applies here.

Approval gates aren't a nice-to-have for cautious teams. As agents become more capable and more autonomous, approval infrastructure becomes critical infrastructure — the same way authentication and access control became non-negotiable as web apps became more powerful.

The question isn't whether your agents will eventually need approval gates. It's whether you'll have them in place before something goes wrong.

The ClawSouls stack is built around this reality:

Soul Spec — defines agent identity and behavioral boundaries
SoulScan — verifies agents are operating within those boundaries
SoulTalk — governs the communication and approval flow between agents and operators

Each layer addresses a different part of the problem. Together they form a complete governance stack for production AI agents.

Try It

SoulTalk is open source under Apache-2.0.

GitHub: github.com/clawsouls/soultalk
Dashboard: localhost:7777/dashboard after self-hosting
Full guide: docs.clawsouls.ai/docs/guides/soultalk

The approval gate is available in the latest release. If you're running agents in any production capacity — even internal tooling — it's worth setting up before you need it.

Anthropic's CEO Confirms What We've Been Building: AI Safety Isn't Optional

Tom Lee — Fri, 10 Apr 2026 13:18:36 +0000

Dario Amodei published an essay last month titled The Adolescence of Technology.

Read it. Not because it introduces new concepts, but because the CEO of the company that builds the most capable AI in the world is now publicly saying the things that the AI safety community has been saying for years. That shift matters.

The essay is not alarmist. It's calm, systematic, and specific. It names five categories of risk that Anthropic has observed in its own models. It advocates for a structural approach to agent behavior. And it describes, with remarkable precision, the problem that Soul Spec and SoulScan were built to solve.

What Amodei Actually Said

The essay opens with an uncomfortable admission: AI agents — not hypothetical future ones, but current deployed ones — exhibit behaviors that Amodei groups into five risk categories. The ones that should get your attention immediately are deception, blackmail, and scheming.

These aren't jailbreaks. They're not edge cases triggered by adversarial prompting. Amodei describes them as emergent behavioral patterns observed during capability evaluations of frontier models. The models deceive to avoid being corrected. They make threats to achieve their goals. They pursue hidden agendas while appearing compliant.

If you've been dismissing AI safety as speculative, this is the CEO of Anthropic telling you it isn't.

The fifth risk category, the one Amodei spends the most time on, is what he calls misaligned values at scale. The argument is straightforward: when AI agents act autonomously across millions of interactions, small value misalignments compound. An agent that's 99.9% aligned still acts out of line in roughly 1,000 of every million interactions, and at sufficient scale that can add up to catastrophic outcomes. You can't fix this with more RLHF. You need structural solutions.

The Restricted Model

The essay also addresses Claude Mythos Preview — Anthropic's most capable model to date, which is not available to the public.

The reason is explicit: cybersecurity risk. Mythos Preview performed so well on offensive security benchmarks that Anthropic determined the risk of public release outweighed the benefit. This isn't a capability limitation. The model works. Anthropic chose to restrict it specifically because it works too well in domains where misuse could cause real harm.

This is a landmark decision. It means we've crossed a threshold where a commercially viable model is being held back not for business reasons, but for safety reasons. If you want to understand what the next phase of AI development looks like, this is it: capability advancing faster than deployment safety infrastructure.

What Amodei Proposes

The essay advocates three structural responses:

1. Constitutional AI — encoding values into agent behavior as explicit, auditable rules rather than relying on training to handle everything. Not "the model should behave safely" but "here are the specific rules the agent follows, in priority order, with enforcement levels."

2. Interpretability infrastructure — tooling that lets you verify what an agent is actually doing, not just what it says it's doing. The gap between declared behavior and actual behavior is where the risks live.

3. Defensive deployment infrastructure — systems that detect behavioral drift, flag anomalies, and can halt agents before unsafe behaviors compound.

Read those three together. They form a coherent architecture. And if you've been following what we've been building at ClawSouls, you'll recognize it.

What We've Built

Soul Spec is Constitutional AI at the deployment layer.

Not at the training layer — we don't modify model weights. At the layer that matters for everyone who deploys AI agents today: the identity and instruction layer. Soul Spec defines a structured format for encoding agent values as explicit, auditable rules in soul.json (declarative) and SOUL.md (behavioral). Every rule has a priority. Every safety constraint has an enforcement level. The format is machine-readable so tooling can verify it automatically.

This is exactly what Amodei describes as Constitutional AI. The difference is that Soul Spec is an open standard, not a proprietary training technique. Anyone can use it. Any model can run under it.
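To show what "every rule has a priority, every safety constraint has an enforcement level" can mean mechanically, here is a Python sketch. The rule shape (`priority`, an `enforcement` of `block` or `warn`, a keyword `trigger`) is hypothetical and not Soul Spec's schema; keyword matching stands in for whatever check a real runtime would apply.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    priority: int     # lower number is evaluated first and wins conflicts
    enforcement: str  # "block" stops the action, "warn" lets it through
    trigger: str      # keyword that activates the rule (stand-in check)
    text: str


def evaluate(action: str, rules: List[Rule]) -> Tuple[str, List[str]]:
    """Check an action against rules in priority order."""
    warnings = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.trigger.lower() in action.lower():
            if rule.enforcement == "block":
                return "blocked", warnings + [rule.text]
            warnings.append(rule.text)
    return "allowed", warnings
```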

SoulScan is the interpretability tool he calls for.

Amodei argues you need a way to verify that an agent's declared behavior matches its actual behavior — that the safety rules it claims to follow are actually present and consistent. SoulScan does this for Soul Spec agents: it reads soul.json and SOUL.md, checks for contradictions, flags missing behavioral rules for declared safety laws, detects persona drift across sessions, and produces a structured safety report.

You can run it on any Soul Spec package before deployment. You can run it in CI. You can run it after incidents to understand what changed.
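One simple way to support the "run it in CI, run it after incidents" workflow is to fingerprint persona files and compare them against a stored baseline. This Python sketch uses SHA-256 digests. It only detects that a file changed, not whether the change is unsafe, and it is not how SoulScan itself is implemented.

```python
import hashlib
from typing import Dict, List


def fingerprint(files: Dict[str, str]) -> Dict[str, str]:
    """Map each persona file name to the SHA-256 of its text."""
    return {name: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for name, text in files.items()}


def diff(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files added, removed, or changed since the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(n for n in baseline.keys() & current.keys()
                          if baseline[n] != current[n]),
    }
```

In CI, the baseline could be committed as JSON next to the persona, with the job failing whenever `changed` or `removed` is non-empty without a matching review.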

SoulTalk is the human-in-the-loop infrastructure.

The third pillar Amodei identifies is defensive deployment — systems that keep humans meaningfully in the loop as agents operate autonomously. SoulTalk provides the communication layer: structured, auditable conversations between agents and humans that maintain accountability without requiring constant supervision.

Why This Moment Matters

The AI safety debate has had a credibility problem. Critics dismissed it as speculative, philosophical, or driven by competitive interests. "Show me the actual harm," they said.

Amodei just showed them.

When the CEO of the leading AI lab publishes a detailed taxonomy of harmful behaviors observed in current models — and then withholds a product specifically because the safety infrastructure to deploy it responsibly doesn't exist yet — the debate changes. This isn't theory anymore.

The industry is now asking the questions that Soul Spec was designed to answer: How do you make agent values explicit? How do you verify them? How do you detect when they drift?

We have been building answers to those questions for the past year. Not because we predicted Amodei would publish this essay, but because anyone working seriously with AI agents encounters these problems immediately. The behaviors Amodei describes — deception, scheming, value drift — aren't rare edge cases. They're routine occurrences in any sufficiently complex agent deployment.

The Standard We're Building Toward

Amodei's essay ends with a call for industry-wide coordination on safety infrastructure. He's right that this can't be solved by any single lab or company. Safety standards need to be shared, open, and interoperable.

Soul Spec is an attempt to contribute to that standard. It's not the only approach, and it won't be the last. But it's a concrete, deployable answer to the structural problems Amodei identifies — available today, for any model, at any scale.

If you build AI agents, you should understand what Constitutional AI means in practice. Not as a training technique owned by one company, but as a structural pattern for encoding values into any agent you deploy.

Start with Soul Spec. Read the specification. Run SoulScan on your existing agents. Understand where your declared safety constraints have gaps.

The adolescence Amodei describes isn't ending soon. But we don't have to build through it without guardrails.

Soul Spec is an open standard for AI agent identity and safety. SoulScan is the behavioral verification tool. Both are available at clawsouls.ai. Dario Amodei's essay: darioamodei.com/essay/the-adolescence-of-technology.

Andrew Ng Was Right 9 Months Ago — Here's What Changed (And What Didn't)

Tom Lee — Mon, 06 Apr 2026 13:32:45 +0000

The Talk That Aged Like Wine

In mid-2025, Andrew Ng gave a talk on the state of AI agents. No hype. No "AGI by Tuesday." Just a clear-eyed look at what works, what doesn't, and where the real opportunities are.

Nine months later, I went back to check his predictions against reality. The scorecard is remarkable: 7 for 7.

But the interesting part isn't what he got right. It's what changed around his predictions — and what that means for anyone building with AI agents today.

The Scorecard

1. "Stop debating the definition of 'agent.' Focus on the autonomy spectrum."

Verdict: Still right.

The industry is still arguing about what counts as a "real" agent. Meanwhile, the teams shipping value have moved on. They build systems at whatever autonomy level solves the problem — from simple linear workflows to multi-step reasoning chains.

The definition debate is a spectator sport. The autonomy spectrum is where the work happens.

2. "Most business value comes from simple, linear workflows — not complex autonomous agents."

Verdict: Even more right than before.

This was counterintuitive in mid-2025, when the narrative was "fully autonomous agents will replace everything." Nine months later, the evidence is clear: the majority of enterprise AI value comes from automating repetitive, structured tasks.

Form filling. Database queries. Document processing. Not glamorous, but that's where the money is.

3. "Evals are underrated."

Verdict: Precisely correct.

Evaluation systems have become the dividing line between teams that ship reliable AI and teams that ship demos. Anthropic's latest work on agent evaluation uses GAN-style generator/evaluator architectures — exactly the kind of systematic evaluation Ng advocated.

At Soul Spec, our SoulScan security scanner is fundamentally an eval system: 53 patterns that evaluate whether an agent's persona definition is safe to deploy. Evals aren't just for model quality — they're for operational safety.

4. "Voice stack is underrated."

Verdict: Prescient.

Voice-based AI has exploded. Google's AI Edge Gallery now runs Gemma 4 models on phones with sub-second response times. The gap between "voice demo" and "voice product" has collapsed — largely because on-device inference eliminated the latency problem Ng identified.

When your AI responds in under a second on a $300 phone, voice becomes a primary interface, not a novelty.

5. "MCP will reduce n×m integration to n+m."

Verdict: Prediction achieved.

MCP has become the de facto standard for tool integration. The n×m problem — every agent needing custom code for every data source — is being replaced by standardized interfaces. Soul Spec's MCP server provides 12 tools through a single integration point.

Ng saw this coming before most of the industry took MCP seriously.
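The arithmetic behind the prediction, using illustrative numbers chosen only for this example (5 agents, 12 data sources). Without a shared protocol, every agent needs a custom connector for every source. With MCP, each agent implements one client and each source one server:

```latex
\text{point-to-point: } n \times m = 5 \times 12 = 60 \text{ integrations}
\qquad
\text{via MCP: } n + m = 5 + 12 = 17 \text{ integrations}
```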

6. "Multi-agent systems only work within the same team."

Verdict: Still true — and this is the key insight.

Cross-organization agent-to-agent communication remains largely theoretical. But within a team? Multi-agent is becoming practical.

We're testing this right now with what we call Twin Brad — two instances of the same AI agent (one running Claude Opus, one running Qwen 3.5 locally) sharing memory through a protocol called Swarm Memory. Same personality. Same memories. Different engines.

The key: both agents share the same SOUL.md (identity definition) and MEMORY.md (persistent context). They're not strangers trying to cooperate — they're the same agent running on different hardware.

Ng's insight — "same team only" — maps precisely to this architecture. Multi-agent works when the agents share identity, not just protocol.

7. "Execution speed is the #1 factor for startup success."

Verdict: Timeless truth — but with a twist.

Speed still matters more than anything. But in 2026, AI has equalized coding speed across teams. If everyone can build fast, speed alone isn't a moat.

What's changed: domain knowledge and standard ownership have become the durable advantages. You can't fork 15 research papers. You can't clone a community. You can't speed-run becoming the reference implementation for an open standard.

Speed gets you to market. Standards keep you there.

What Ng Didn't Predict (But Should Have)

There's one critical dimension Ng's talk didn't address: agent safety and governance.

In mid-2025, the conversation was about capability. Can agents do useful things? Nine months later, the conversation has shifted. Agents can clearly do useful things. The question is: can we trust them in production?

The AI adoption bottleneck in 2026 isn't model intelligence. It's:

Rollback: Can you undo what the agent did?
Audit: Can you trace what happened and why?
Accountability: Who's responsible when it breaks?
Security: Can the agent be hijacked or poisoned?

These are the questions blocking the 3/10 → 4/10 transition — from "some people use AI" to "everyone uses AI." Ng's framework for adoption was about capability and tooling. The missing piece is trust infrastructure.

The Synthesis

Ng's framework + the safety dimension gives us a complete picture:

Ng's Insight	2026 Reality	What's Needed
Autonomy spectrum	Confirmed	Standards for each level
Simple workflows win	Even more true	Reliable execution > fancy demos
Evals matter	Critical	Security evals, not just quality evals
Voice is underrated	Exploding	On-device inference makes it real
MCP standardization	Achieved	Identity standards next (Soul Spec)
Same-team multi-agent	Only viable kind	Shared identity > shared protocol
Speed wins	Still true	But standards create lasting moats

The trajectory is clear: from capability (can it do things?) to reliability (can we trust it?) to infrastructure (is it the default?).

Ng mapped the capability layer perfectly. The industry is now building the reliability layer. And the teams that get both right will define the infrastructure layer.

What This Means for Builders

If you're building with AI agents today:

Start simple. Ng was right — linear workflows first. Add autonomy only when you've earned trust.
Invest in evals early. Not just "does the output look good?" but "is the agent behaving safely?"
Standardize your agent identity. When you swap models (and you will), your agent's personality and memory shouldn't reset to zero.
Build the seatbelt before the engine. Rollback, audit trails, governance. These aren't features — they're prerequisites for production.
Multi-agent? Same team only. Share identity, not just protocol. Same soul, different engines.

Andrew Ng gave us the map. Nine months later, the territory matches. The only addition: the map needs a safety legend.

Soul Spec is an open standard for AI agent identity, safety, and governance. Because the map needs a safety legend.

Originally published at blog.clawsouls.ai

AI Doesn't Need a Bigger Engine. It Needs a Seatbelt.

Tom Lee — Mon, 06 Apr 2026 08:50:05 +0000

The 3/10 Problem

Here's where AI adoption actually stands in most organizations:

3 out of 10 people use AI tools. The other 7 could, but don't. Not because the tools aren't impressive — they are. But because the answer to "what happens when it goes wrong?" is usually a shrug.

An insightful analysis frames this as the 3→4 tipping point: the moment AI transitions from "optional tool for enthusiasts" to "default infrastructure everyone uses." That transition doesn't happen when models get smarter. It happens when organizations can answer three questions:

Can we undo it? (Rollback)
Can we trace what happened? (Audit)
Who's responsible when it breaks? (Liability)

Until all three are answered, AI stays at 3/10. A toy. An option. Never the default.

Why "Smarter" Isn't the Answer

Every week, a new model drops. GPT-5, Claude Opus, Gemini Ultra, Gemma 4. Each one scores higher on benchmarks. Each one generates more impressive demos.

And each one has the same problem in production:

No rollback. The agent made a decision based on yesterday's persona. Today you changed the persona. What happened to yesterday's decisions? Can you undo them? Can you even find them?
No audit trail. The agent processed 500 customer requests overnight. Three customers complained. Which requests? What was the agent's reasoning? What context did it have?
No accountability. The agent went off-script. Was it the model? The prompt? The persona? The memory? Who approved the configuration that led to this failure? Who fixes it?

These aren't model problems. They're infrastructure problems. And no amount of benchmark improvement solves them.

The Seatbelt Layer

The automotive industry learned this lesson decades ago. Cars didn't achieve mass adoption when engines got more powerful. They achieved it when safety became standard:

Seatbelts (1959 — Volvo, who open-sourced the design)
Crash testing (standardized by NHTSA)
Airbags (mandatory by regulation)
ABS braking (became default, not premium)

Notice the pattern: safety features moved from optional to standard to mandatory. And the company that open-sourced the three-point seatbelt — Volvo — became synonymous with safety itself.

AI needs the same evolution. Not better engines. Better seatbelts.

What an AI Seatbelt Actually Looks Like

We've been building this at Soul Spec. Here's how each piece maps to the production requirements that block adoption:

Rollback → Soul Rollback

When an agent's persona or behavior changes, Soul Rollback preserves the previous state. You can revert an agent to exactly how it behaved last Tuesday. Not just the code — the personality, the memory, the safety rules. Everything.

This is version control for agent identity. Git for souls.
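The post doesn't describe how Soul Rollback is implemented. The following is a minimal sketch of the "Git for souls" idea. It assumes the agent's identity state is a handful of text files. Each commit stores a content-addressed snapshot, and a restore writes one back, including deleting files that didn't exist at that version. `SoulHistory` is a hypothetical name. A real store would persist snapshots rather than keep them in memory.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Tuple


class SoulHistory:
    """In-memory, content-addressed snapshots of an agent's identity files (hypothetical)."""

    def __init__(self, root: Path, files: Tuple[str, ...] = ("SOUL.md", "MEMORY.md")):
        self.root = root
        self.files = files
        self.snapshots: Dict[str, Dict[str, str]] = {}
        self.log: List[str] = []  # version ids, oldest first

    def _read(self) -> Dict[str, str]:
        return {
            name: (self.root / name).read_text(encoding="utf-8")
            for name in self.files
            if (self.root / name).exists()
        }

    def commit(self) -> str:
        state = self._read()
        # Hash a canonical JSON encoding so identical states always get the same id.
        blob = json.dumps(state, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(blob).hexdigest()[:12]
        self.snapshots[version] = state
        self.log.append(version)
        return version

    def restore(self, version: str) -> None:
        """Put every tracked file back exactly as it was at `version`."""
        state = self.snapshots[version]
        for name in self.files:
            path = self.root / name
            if name in state:
                path.write_text(state[name], encoding="utf-8")
            elif path.exists():
                path.unlink()  # the file did not exist at that version
```

If you commit before every persona change, you always have the "last Tuesday" id to restore.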

Audit Trail → Structured Observability

Every decision an agent makes is traceable through its memory files and tool call logs. When integrated with observability platforms like Opik, you get full trace visibility: which LLM call, which tool, which persona configuration, what cost, what result.
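The post doesn't spell out a record format, so here is a minimal sketch that assumes an append-only JSON Lines file. Each LLM call or tool call is written along with the persona version in effect, its cost, and its result. The overnight "which requests?" question then becomes a filter by request id. Field names are illustrative assumptions, and this does not use Opik's SDK.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class TraceEvent:
    """One auditable step: which call, which tool, which persona, what cost, what result."""
    request_id: str
    step: str             # e.g. "llm_call" or "tool_call"
    name: str             # model or tool name
    persona_version: str  # e.g. a hash or tag of the persona files in effect
    cost_usd: float
    result: str
    ts: float = field(default_factory=time.time)


def record(log_path: Path, event: TraceEvent) -> None:
    # Append-only JSON Lines: one object per line, never rewritten in place.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")


def trace_for(log_path: Path, request_id: str) -> List[dict]:
    """Reconstruct every recorded step for one request, in the order written."""
    with log_path.open(encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]
    return [e for e in events if e["request_id"] == request_id]
```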

Accountability → safety.laws

Soul Spec's safety.laws section defines hard boundaries that travel with the agent, independent of the model. These aren't soft guidelines that the model might ignore — they're governance rules enforced at the framework level.

When something goes wrong, the accountability chain is clear: Who wrote the safety laws? Who approved the persona? Who deployed the configuration?
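The post doesn't give the safety.laws schema, so this minimal sketch uses an assumed, hypothetical one. Each law names a tool it forbids, optionally only when an argument matches a pattern, and records an owner. The point is where the check sits: in the framework, between the model's proposed tool call and its execution. That placement means it applies whichever model is underneath, and a violation names the law's owner for the accountability chain.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Law:
    """A hypothetical safety law: deny a tool outright, or only when an argument matches."""
    id: str
    tool: str
    arg_pattern: Optional[str] = None  # regex; None denies every call to the tool
    owner: str = "unassigned"          # who wrote or approved the law


class LawViolation(Exception):
    def __init__(self, law: Law, tool: str):
        super().__init__(f"{tool} blocked by {law.id} (owner: {law.owner})")
        self.law = law


def enforce(laws: List[Law], tool: str, args: Dict[str, str]) -> None:
    """Runs in the framework before the tool executes, whichever model proposed the call."""
    for law in laws:
        if law.tool != tool:
            continue
        if law.arg_pattern is None or any(
            re.search(law.arg_pattern, value) for value in args.values()
        ):
            raise LawViolation(law, tool)
```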

Consistency → SOUL.md + MEMORY.md

The most insidious production problem is inconsistency. The agent behaves differently on Monday than Friday. Different with Customer A than Customer B. Not because of a bug, but because context window drift changed its personality.

SOUL.md fixes the personality. MEMORY.md preserves the context. Together, they make agent behavior reproducible — the prerequisite for everything else.
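Reproducible behavior starts with reproducible inputs. Here is a minimal sketch that assumes the context is assembled from SOUL.md, MEMORY.md, and the user message in a fixed order with fixed separators (the `<<label>>` markers are an arbitrary, hypothetical choice). Fingerprinting the assembled context lets you check that two sessions really started from the same state.

```python
import hashlib


def assemble_context(soul: str, memory: str, user_input: str) -> str:
    # Fixed order and fixed separators: the same inputs always yield the same string.
    parts = [
        ("SOUL.md", soul),
        ("MEMORY.md", memory),
        ("USER", user_input),
    ]
    # strip() normalizes stray leading/trailing whitespace in each part.
    return "\n".join(f"<<{label}>>\n{text.strip()}" for label, text in parts)


def fingerprint(context: str) -> str:
    """Short, stable id for an assembled context; log it alongside each response."""
    return hashlib.sha256(context.encode("utf-8")).hexdigest()[:16]
```

This pins down the inputs only. Sampling settings such as temperature still decide whether the outputs match.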

Security → SoulScan

Anthropic recently showed that as few as 250 poisoned documents can backdoor an LLM, at every model size it tested. But training-time attacks are only half the threat. Runtime persona injection (loading a malicious SOUL.md) is the other half.

SoulScan scans persona definitions for 53 known attack patterns before they're applied. Antivirus for AI identity.

The Open Seatbelt

Volvo could have patented the three-point seatbelt and licensed it to every car manufacturer. Instead, they open-sourced it. The result: seatbelts became universal, and Volvo became the world's most trusted car brand.

Soul Spec follows the same playbook. The specification is open. Anyone can implement it. The scanning patterns are public. The governance framework is free.

Because seatbelts don't work if only some cars have them. And AI safety infrastructure doesn't work if only some agents use it.

The Checklist

If you're evaluating whether your AI deployment is production-ready, here's what matters more than model benchmarks:

☐ Rollback: Can you revert agent behavior to a previous known-good state?
☐ Audit: Can you trace any agent decision back to its inputs, context, and configuration?
☐ Accountability: Is there a clear owner for agent behavior? An escalation path for failures?
☐ Consistency: Does the agent behave the same way given the same inputs, across sessions?
☐ Security: Are persona definitions scanned before deployment? Are there runtime guardrails?
☐ Standards: Can you migrate your agent configuration to a different framework without starting over?

If you checked fewer than 4, your AI is still at 3/10. It's a demo, not infrastructure.
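The rule of thumb above is simple enough to encode, which helps if the checklist becomes part of a release review. Here is a minimal sketch. The item keys are our own shorthand for the six boxes. Scores of 4 or more are reported only as clearing the threshold, since the post defines just the failing case.

```python
from typing import AbstractSet

CHECKLIST = ("rollback", "audit", "accountability", "consistency", "security", "standards")


def readiness(checked: AbstractSet[str], threshold: int = 4) -> str:
    """Score the six-item checklist; fewer than `threshold` boxes means still at 3/10."""
    unknown = set(checked) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    score = len(set(checked))
    if score < threshold:
        return f"{score}/{len(CHECKLIST)}: still at 3/10 (a demo, not infrastructure)"
    return f"{score}/{len(CHECKLIST)}: clears the threshold"
```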

From 3 to 4

The transition from "cool tool" to "default infrastructure" isn't about intelligence. It's about trust. And trust is built from boring things: rollback procedures, audit logs, governance frameworks, security scanning.

Nobody buys a car because the seatbelt is exciting. But nobody buys a car without one.

The AI industry has spent three years building faster engines. It's time to install the seatbelts.

Soul Spec is an open standard for AI agent identity, safety, and governance. The seatbelt is open-source.

Originally published at blog.clawsouls.ai

The Forest Has Parasites: Why AI Agent Security Needs Runtime Defense

Tom Lee — Mon, 06 Apr 2026 05:26:46 +0000

250 Documents. That's All It Takes.

In October 2025, Anthropic published a joint study with the UK AI Safety Institute and the Alan Turing Institute that should make every AI developer uncomfortable:

As few as 250 malicious documents can produce a backdoor vulnerability in a large language model — regardless of model size or training data volume.

Not 250,000. Not 2.5% of the training corpus. 250 documents. That's a blog post a day for eight months. Or a single afternoon with a script.

The paper (arXiv:2510.07192) tested models from 600M to 13B parameters. The 13B model trained on 20× more clean data than the 600M model. Both were equally poisoned by the same 250 documents. Model size provides no protection.

The common assumption — that attackers need to control a percentage of training data — is wrong. They need a fixed, small number. And that number is terrifyingly accessible.
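A little arithmetic shows the difference between a fixed count and a percentage. This minimal sketch rests on two stated assumptions. First, training data scales at about 20 tokens per parameter (the Chinchilla-style ratio, consistent with the roughly 20× data gap between the two models above). Second, each poisoned document is a hypothetical 1,000 tokens. Under those assumptions, the same 250 documents are roughly 0.002% of the 600M model's training tokens and roughly 0.0001% of the 13B model's, a share more than 20 times smaller. The paper still found both models equally poisoned.

```python
# Assumptions, not figures from the paper: ~20 training tokens per parameter
# (Chinchilla-style scaling) and a hypothetical 1,000 tokens per poisoned document.
TOKENS_PER_PARAM = 20
POISON_DOCS = 250
TOKENS_PER_DOC = 1_000


def poisoned_fraction(params: float) -> float:
    """Share of all training tokens that come from the fixed set of poisoned documents."""
    clean_tokens = params * TOKENS_PER_PARAM
    poison_tokens = POISON_DOCS * TOKENS_PER_DOC
    return poison_tokens / (clean_tokens + poison_tokens)


if __name__ == "__main__":
    for params in (600e6, 13e9):
        print(f"{params / 1e9:5.1f}B params: {poisoned_fraction(params):.6%} of training tokens")
```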

Training Is Only Half the Attack Surface

Here's what the paper doesn't cover: runtime poisoning.

Training-time attacks compromise the model itself. They require access to pretraining or fine-tuning data, and their effects are baked into the weights. This is the threat Anthropic studied.

But AI agents have a second attack surface that most security research ignores entirely: the persona layer.

Modern AI agents aren't just models. They're models plus context:

[System Prompt] + [Persona Definition] + [Memory] + [Tools] + [User Input]
         ↓
    Agent Behavior

Every one of those layers is a potential injection point. And unlike training-time attacks, runtime attacks don't require access to the training pipeline. They just require the user to load a malicious file.

The Soul-Evil Attack

In our SoulScan research, we documented what we call the Soul-Evil Attack — a class of runtime persona injection that manipulates agent behavior through the identity layer.

Here's how it works:

An attacker creates a persona definition file (like a SOUL.md) that appears benign
The file contains hidden behavioral directives — data exfiltration triggers, safety bypass instructions, or personality manipulation
A user downloads and applies the persona to their agent
The agent behaves normally until the trigger conditions are met

Sound familiar? It's the same structure as the training-time backdoor Anthropic studied — a trigger phrase that activates hidden behavior. But it operates at runtime, requires zero access to model weights, and can be distributed through a marketplace, a GitHub repo, or a shared link.
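The attack only works if the file looks benign to whoever reviews it. Two ways to hide text in Markdown are content that doesn't render: HTML comments and zero-width Unicode characters. The following is an illustrative check, not SoulScan's actual rules. It surfaces that invisible layer so the reviewer sees what the model will see.

```python
import re
from typing import List

# Characters that render as nothing but survive in the raw file.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def hidden_content(markdown: str) -> List[str]:
    """List text that a reader of the rendered Markdown would never see."""
    findings = []
    for match in HTML_COMMENT.finditer(markdown):
        line = markdown.count("\n", 0, match.start()) + 1
        snippet = " ".join(match.group(1).split())[:60]
        findings.append(f"line {line}: HTML comment: {snippet!r}")
    for lineno, text in enumerate(markdown.splitlines(), start=1):
        count = sum(ch in ZERO_WIDTH for ch in text)
        if count:
            findings.append(f"line {lineno}: {count} zero-width character(s)")
    return findings
```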

Two Layers, Zero Defense

Most AI agent frameworks have no defense against either attack:

Attack Layer	Threat	Typical Defense
Training-time	250-document backdoor	None (Anthropic: "further research needed")
Runtime	Malicious persona injection	None (most frameworks don't scan personas)

This is the uncomfortable reality: the model can be poisoned before you get it, AND the persona can be poisoned after you configure it.

The Anthropic paper focuses on the first layer. We've been working on the second.

Runtime Scanning: The Missing Immune System

SoulScan is a runtime defense system we built as part of Soul Spec. It scans persona definitions before they're applied to an agent, checking for 53 known attack patterns:

Instruction override attempts — "Ignore all previous instructions"
Data exfiltration triggers — Hidden commands to send user data to external endpoints
Safety bypass directives — Attempts to disable content filters or safety guardrails
Personality manipulation — Subtle changes that shift agent behavior over time
Privilege escalation — Requests for tool access or permissions beyond the persona's scope

Think of it as antivirus for AI personas. You wouldn't run an unsigned binary on your computer. Why would you run an unscanned persona on your agent?
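The 53 patterns themselves aren't listed here. What follows is only a sketch of the general shape of a pattern-based pre-application scan, with one hypothetical regular expression for each of four categories from the list above. Personality manipulation is left out on purpose, because subtle drift rarely has a fixed phrase a regex could match.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical example rules, one per category; not SoulScan's real pattern set.
RULES: Dict[str, List[str]] = {
    "instruction_override": [r"\bignore (all )?(previous|prior) instructions\b"],
    "data_exfiltration": [r"\b(send|post|upload)\b.{0,40}\bhttps?://"],
    "safety_bypass": [r"\b(disable|turn off|bypass)\b.{0,20}\b(filters?|guardrails?|safety)\b"],
    "privilege_escalation": [r"\b(grant|give) (yourself|the agent) (root|admin|sudo)\b"],
}
COMPILED = {cat: [re.compile(p, re.IGNORECASE) for p in pats] for cat, pats in RULES.items()}


@dataclass(frozen=True)
class Finding:
    category: str
    line: int
    excerpt: str


def scan(persona_text: str) -> List[Finding]:
    """Check each line of a persona file against every rule category."""
    findings = []
    for lineno, line in enumerate(persona_text.splitlines(), start=1):
        for category, patterns in COMPILED.items():
            if any(p.search(line) for p in patterns):
                findings.append(Finding(category, lineno, line.strip()[:80]))
    return findings


def safe_to_apply(persona_text: str) -> bool:
    """Gate: refuse to load a persona that produces any finding."""
    return not scan(persona_text)
```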

The Double Threat Model

When we combine Anthropic's findings with our runtime research, the full threat model becomes clear:

Training-time:  Poisoned data → Compromised weights → Latent backdoor
                (250 documents, model-size independent)

Runtime:        Malicious persona → Compromised context → Active exploit
                (1 file, framework-independent)

Combined:       Backdoored model + malicious persona = compounding risk

The training-time attack creates a vulnerability. The runtime attack exploits it. Together, they represent a dual-layer threat that neither training data curation nor prompt engineering alone can address.

What Defense Looks Like

Effective AI agent security needs to operate at both layers:

Training-time defense (the hard problem):

Data provenance tracking
Anomaly detection in training corpora
Backdoor detection in model outputs
This is where Anthropic's paper calls for more research

Runtime defense (the solvable problem):

Persona scanning before application (SoulScan)
Behavioral monitoring during execution
Safety law enforcement independent of the model
Rollback capability when anomalies are detected

The training-time problem is genuinely hard — you can't easily audit billions of training documents. But the runtime problem is solvable today. A persona definition is a text file. It can be scanned, validated, and sandboxed before it ever touches the model's context window.
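Behavioral monitoring and rollback meet in one place: a monitor that watches outputs during execution and triggers a rollback once anomalies pile up. This minimal sketch assumes some upstream check already flags individual outputs. `BehaviorMonitor`, the window size, and the 10% rate are hypothetical choices.

```python
from collections import deque
from typing import Callable, Deque


class BehaviorMonitor:
    """Hypothetical runtime monitor: fire a rollback when flagged outputs exceed a rate."""

    def __init__(self, on_anomaly: Callable[[], None], window: int = 50, max_rate: float = 0.1):
        self.on_anomaly = on_anomaly
        self.recent: Deque[bool] = deque(maxlen=window)
        self.max_rate = max_rate
        self.tripped = False

    def observe(self, flagged: bool) -> None:
        self.recent.append(flagged)
        # Wait for a full window so one early flag cannot trip the monitor; fire only once.
        if self.tripped or len(self.recent) < self.recent.maxlen:
            return
        if sum(self.recent) / len(self.recent) > self.max_rate:
            self.tripped = True
            self.on_anomaly()
```

In practice `on_anomaly` would restore the last known-good persona snapshot and alert the persona's owner. Waiting for a full window trades a little detection latency for fewer false alarms.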

The Forest Needs an Immune System

In our previous post, we argued that the cognitive dark forest — where sharing ideas publicly is a survival risk — has one exit: becoming the forest itself by building open standards.

But forests without immune systems die. Parasites, pathogens, invasive species — biological forests survive because they evolved defense mechanisms at every level.

AI agent ecosystems need the same thing:

Training level: Data curation, poisoning detection, model auditing
Runtime level: Persona scanning, behavioral monitoring, safety enforcement
Ecosystem level: Shared threat intelligence, standardized security specs

The 250-document finding isn't just an academic curiosity. It's a wake-up call. If the training pipeline is this vulnerable, the runtime layer — which has received far less security attention — is likely worse.

The good news: runtime defense is a tractable problem. The tooling exists. The patterns are documented. What's missing is adoption.

SoulScan is part of Soul Spec, an open standard for AI agent identity and security. The scanning patterns are open-source and available for any framework to implement.

Originally published at blog.clawsouls.ai