Forem: MouseRider

AgentManifest: A Declarative Spec Where the Harness Is the First-Class Decision

MouseRider — Fri, 03 Apr 2026 00:47:41 +0000

RFC v0.3 — design proposal, not a shipping product. CC0 licensed. Feedback and critique welcome.
GitHub: MouseRider/agentmanifest-rfc

When you run AI agents across more than one role, the execution environment turns out to matter more than it first appears. The model gets most of the attention — benchmarks, leaderboards, capability comparisons — but the harness shapes runtime behavior in ways that model selection alone doesn’t account for.

A personal assistant, an ops monitor, a coding agent, a trading bot: these aren’t the same agent with different prompts. They need different memory models, different autonomy levels, different guardrail enforcement, different lifecycle behaviors. Current agent harnesses are mostly either finished platforms you adopt wholesale, or open-ended toolkits that reward deep specialisation. There’s no standardised, composable layer in between: a way to declare what an agent needs, select the right harness for its role, and assemble the configuration portably.

AgentManifest is a design proposal for that missing layer.

This is part of an ongoing series on building persistent AI agents. Article 1 covered TSVC — context isolation across topics. Article 2 covered agent epistemology — how an agent knows what it knows. AgentManifest grew out of the same body of work: a production personal assistant running on OpenClaw, and the questions that surface when you push a system like that into real daily use.

The Spec

Dockerfile-like syntax. FROM selects the harness — the primary design decision in any manifest.

# Personal Assistant
FROM openclaw:latest

MODEL claude-opus
ROLE personal-assistant

TOOLS browser, email, calendar, file-system, sub-agents
MEMORY persistent, cross-session
PERSONALITY ./soul.md

GUARDRAILS approval-for-external-sends, budget-cap-daily=5.00
AUTONOMY high
HEARTBEAT interval=30m, quiet-hours=23:00-08:00

CHANNELS telegram=in-out, email=in-out, twitter=out
SPENDING daily-cap=50.00, per-transaction-cap=20.00
IDENTITY did:web:agents.example.com:assistant

DEPLOY always-on
RESTART on-failure

# Ops Monitor
# Same harness. Completely different agent.
FROM openclaw:latest

MODEL claude-haiku
ROLE ops-monitor

TOOLS file-system, ssh, docker, http, alerting
MEMORY session-only

GUARDRAILS strict-instructions, no-generative-output, read-only-by-default
AUTONOMY medium
HEARTBEAT interval=5m

ALERT_CHANNEL telegram-ops-thread
ON_ERROR alert-and-retry, max-retries=3

DEPLOY always-on
RESOURCES memory=256m

Same base harness. Completely different agent. The spec makes the differences explicit, auditable, and portable — without requiring both to fit a single one-size-fits-all runtime.
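To make the shape of the format concrete, here is a minimal parsing sketch. It assumes one manifest per file, that FROM must be the first directive, and that arguments are comma-separated; the RFC may settle these differently once a formal grammar exists. The names `Manifest` and `parse_manifest` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifest:
    harness: str                                   # value of the FROM line
    directives: Dict[str, List[str]] = field(default_factory=dict)


def parse_manifest(text: str) -> Manifest:
    """Parse one manifest. Assumption: FROM is the first directive."""
    manifest = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):       # skip blanks and comments
            continue
        name, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",") if a.strip()]
        if manifest is None:
            if name != "FROM" or len(args) != 1:
                raise ValueError("manifest must start with FROM <harness>")
            manifest = Manifest(harness=args[0])
        elif name == "FROM":
            raise ValueError("only one FROM line is allowed per manifest")
        else:
            manifest.directives[name] = args
    if manifest is None:
        raise ValueError("empty manifest")
    return manifest


ops = parse_manifest("""
# Ops Monitor
FROM openclaw:latest
MODEL claude-haiku
GUARDRAILS strict-instructions, no-generative-output, read-only-by-default
HEARTBEAT interval=5m
""")
print(ops.harness)
print(ops.directives["GUARDRAILS"])
```

Rejecting a manifest that does not open with FROM keeps the harness choice impossible to omit, which matches the idea that it is the primary decision.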

Swap the harness and the same directives target a different execution environment:

FROM langgraph:latest
# or
FROM claude-code:latest
# or
FROM crewai:latest

Why Harness Selection Belongs in the Spec

Model selection is reasonably well-served by existing tooling — benchmarks, leaderboards, capability comparisons are all mature. Harness selection is less well-served, and it has more influence over runtime behavior than the current tooling reflects.

Here’s a concrete distinction worth making explicit. Writing “always ask for approval before deleting files” in a system prompt is a soft constraint — the model follows it as part of its instruction-following behavior. A deterministic guardrail at the harness level enforces the same rule unconditionally, independent of context length or task complexity. Both are valid approaches; they’re not equivalent, and the choice between them is a meaningful design decision that currently lives in implementation rather than in the agent definition.

Different roles suit different harness configurations:

A coding agent fits Claude Code — git integration, sandboxed terminal, pre-commit guardrails in the infrastructure
A research pipeline fits LangGraph — graph-native execution, defined workflow shape, explicit checkpoints
A personal assistant fits OpenClaw — persistent memory, heartbeat behavior, cross-session continuity, sub-agent delegation (see the TSVC article for what running this in production actually looks like)
A team workflow fits CrewAI — role-based agent structure, structured task handoffs, shared goal propagation

AgentManifest makes that selection explicit and portable. The spec sits above the harness layer — it doesn’t replace harnesses, it selects and configures them.

Three Directives Worth Examining

GUARDRAILS

GUARDRAILS strict-instructions, read-only-by-default, no-external-sends

Guardrails in AgentManifest are compiled into the harness configuration, not embedded in the prompt. The harness enforces them at the infrastructure level. This is the practical distinction between a behavioral instruction and a behavioral constraint.
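A rough sketch of what "compiled into the harness" could mean in practice. Since no implementation exists yet, everything here is an assumption: the `Harness` class, the `BLOCKED_ACTIONS` table, and the action names are all hypothetical. The point is only that the check runs on every tool call, outside the model's control.

```python
class GuardrailViolation(Exception):
    """Raised by the harness when a blocked action is attempted."""


# Hypothetical mapping from guardrail names to the tool actions they forbid.
BLOCKED_ACTIONS = {
    "strict-instructions": set(),   # semantics not given in the article; blocks nothing here
    "read-only-by-default": {"write_file", "delete_file"},
    "no-external-sends": {"send_email", "post_message"},
}


def compile_guardrails(names):
    """Turn declared guardrail names into one set of blocked actions."""
    blocked = set()
    for name in names:
        blocked |= BLOCKED_ACTIONS[name]   # unknown names fail loudly (KeyError)
    return blocked


class Harness:
    def __init__(self, guardrails, tools):
        self._blocked = compile_guardrails(guardrails)
        self._tools = tools

    def call_tool(self, action, *args):
        # Checked on every call, whatever the prompt or context says.
        if action in self._blocked:
            raise GuardrailViolation(f"{action} is blocked by manifest guardrails")
        return self._tools[action](*args)


harness = Harness(
    ["strict-instructions", "read-only-by-default", "no-external-sends"],
    {"read_file": lambda path: f"contents of {path}",
     "delete_file": lambda path: f"deleted {path}"},
)
print(harness.call_tool("read_file", "notes.md"))
try:
    harness.call_tool("delete_file", "notes.md")
except GuardrailViolation as err:
    print("blocked:", err)
```

The model can still ask for `delete_file`; the request simply never reaches the tool.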

IDENTITY

IDENTITY did:web:agents.example.com:purchasing-agent
SPENDING daily-cap=500.00, per-transaction-cap=100.00

IDENTITY assigns a cryptographic identity — immutable per manifest version, verifiable by external systems. Once identity is verifiable, it becomes the binding point for systems that require an accountable party on the other end of a transaction or access request.

Wallets and payment systems. An agent with a stable cryptographic identity can be issued a spending account scoped to that identity. SPENDING declares the limits; the wallet enforces them at infrastructure level. If something goes wrong, the audit trail is complete: which agent, which manifest version, which guardrails were active, what it spent and when.
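As an illustration of limits enforced outside the agent, here is a small sketch of a wallet that applies the two caps from the SPENDING line above and keeps a ledger for the audit trail. The `Wallet` class and its methods are hypothetical; a real payment system would look quite different.

```python
from datetime import date
from decimal import Decimal


class SpendingLimitExceeded(Exception):
    pass


class Wallet:
    """Hypothetical wallet bound to one agent identity."""

    def __init__(self, identity, daily_cap, per_transaction_cap):
        self.identity = identity
        self.daily_cap = Decimal(daily_cap)
        self.per_transaction_cap = Decimal(per_transaction_cap)
        self.ledger = []                      # (day, amount, memo) audit trail

    def spent_on(self, day):
        return sum((amt for d, amt, _ in self.ledger if d == day), Decimal("0"))

    def pay(self, day, amount, memo):
        amount = Decimal(amount)
        if amount > self.per_transaction_cap:
            raise SpendingLimitExceeded("per-transaction cap exceeded")
        if self.spent_on(day) + amount > self.daily_cap:
            raise SpendingLimitExceeded("daily cap exceeded")
        self.ledger.append((day, amount, memo))


wallet = Wallet("did:web:agents.example.com:purchasing-agent", "500.00", "100.00")
today = date(2026, 4, 3)
for n in range(5):
    wallet.pay(today, "100.00", f"order {n}")   # five payments reach the daily cap
try:
    wallet.pay(today, "0.01", "one more")
except SpendingLimitExceeded as err:
    print("refused:", err)
```

`Decimal` is used instead of floats so that cap comparisons on currency amounts stay exact.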

OAuth and API credentials. Rather than embedding credentials in config or prompts, the harness can resolve access rights from the agent’s verified identity at runtime. An agent identity can be an OAuth client_id, a service account in Azure AD or AWS IAM, or a member of a permissioned data feed — scoped to that agent specifically, not a shared credential.

Inter-agent trust. In a multi-agent system, a coordinator can verify that the specialist it’s delegating to is genuinely running the manifest it claims — same spec version, same guardrails in force. This connects to the coordinator model described in the TSVC article: one coordinator, many specialists, each independently verifiable.

PROMPT_PROFILE and LOCALE

PROMPT_PROFILE claude-opus
LOCALE en-GB

The harness adapts prompt scaffolding to the selected model and language. The spec author doesn’t maintain model-specific variants or locale-specific rewrites. The harness handles that as an implementation detail.

agent-compose: Coordination Above the Single Agent

A single AgentManifest defines a single agent. agent-compose is the layer above — the analog to docker-compose for multi-agent systems. It references individual manifests, defines inter-agent interfaces, and declares the coordination topology.

Hierarchy

The most common pattern. A lead agent delegates to specialists; each specialist runs whatever harness suits its role.

topology: hierarchy

agents:
  coordinator:
    manifest: ./coordinator.agentmanifest
    role: lead
  researcher:
    manifest: ./researcher.agentmanifest   # FROM langgraph:latest
    role: specialist
  coder:
    manifest: ./coder.agentmanifest        # FROM claude-code:latest
    role: specialist

delegation:
  coordinator -> [researcher, coder]:
    protocol: task-dispatch

The coordinator doesn’t need to know which harness each specialist uses. Harness heterogeneity is internal to the system.

Council

For high-stakes decisions, a council routes a proposal to a set of agents for independent evaluation before any action is taken. No single agent’s judgment is final.

topology: council

agents:
  proposer:
    manifest: ./agents/proposer.agentmanifest
  council:
    - manifest: ./agents/compliance-reviewer.agentmanifest
    - manifest: ./agents/context-checker.agentmanifest
    - manifest: ./agents/risk-assessor.agentmanifest

council_config:
  trigger: action-type=financial OR confidence < 0.7
  evaluation: independent
  quorum: all
  on_rejection: halt-and-alert

evaluation: independent matters — agents evaluate without seeing each other’s output first, preventing anchoring.
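The council rules above could be evaluated along these lines. This is a sketch under assumptions: reviewers are plain functions returning approve or reject, the proposal is a dict with hypothetical `action_type`, `confidence`, and `amount` fields, and "independent" is modelled by giving each reviewer its own copy of the proposal and never showing it another reviewer's vote.

```python
from typing import Callable, Dict

Reviewer = Callable[[dict], bool]


def needs_council(proposal: dict) -> bool:
    # Mirrors: trigger: action-type=financial OR confidence < 0.7
    return proposal["action_type"] == "financial" or proposal["confidence"] < 0.7


def council_decision(proposal: dict, reviewers: Dict[str, Reviewer]) -> str:
    if not needs_council(proposal):
        return "proceed"
    if not reviewers:
        raise ValueError("a council needs at least one reviewer")
    votes = {}
    for name, review in reviewers.items():
        votes[name] = review(dict(proposal))    # own copy, no access to other votes
    # quorum: all, on_rejection: halt-and-alert
    return "proceed" if all(votes.values()) else "halt-and-alert"


reviewers = {
    "compliance-reviewer": lambda p: p["amount"] <= 1000,
    "context-checker": lambda p: True,
    "risk-assessor": lambda p: p["confidence"] >= 0.5,
}
small = {"action_type": "financial", "amount": 250, "confidence": 0.9}
large = {"action_type": "financial", "amount": 5000, "confidence": 0.9}
print(council_decision(small, reviewers))
print(council_decision(large, reviewers))
```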

Consensus

A more flexible variant. Rather than unanimous approval, agents reach a decision through structured agreement with configurable thresholds.

topology: consensus

agents:
  council:
    - manifest: ./agents/reviewer-a.agentmanifest
      weight: 1.0
    - manifest: ./agents/reviewer-b.agentmanifest
      weight: 1.0
    - manifest: ./agents/senior-reviewer.agentmanifest
      weight: 2.0

consensus_config:
  method: weighted-majority   # options: majority, supermajority, unanimity, weighted-majority
  threshold: 0.6
  on_no_consensus: hold-for-human

Useful for moderation decisions, borderline classification cases, or any workflow where structured disagreement should surface before acting. The conditions that trigger a council, the quorum required, and the fallback behavior are all declarable in the spec — not embedded in custom orchestration code.
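A worked sketch of the weighted-majority arithmetic, using the weights and threshold from the config above. How the threshold is interpreted is an assumption: here a side wins when its share of the total weight reaches the threshold, and anything else falls through to on_no_consensus. The function name is hypothetical.

```python
def weighted_consensus(votes, threshold):
    """votes: list of (weight, approved) pairs. Returns the outcome string."""
    total = sum(weight for weight, _ in votes)
    if total <= 0:
        raise ValueError("total weight must be positive")
    approve_share = sum(weight for weight, ok in votes if ok) / total
    reject_share = 1.0 - approve_share
    if approve_share >= threshold:
        return "approve"
    if reject_share >= threshold:
        return "reject"
    return "hold-for-human"            # on_no_consensus


# reviewer-a (1.0), reviewer-b (1.0), senior-reviewer (2.0), threshold 0.6
print(weighted_consensus([(1.0, True), (1.0, False), (2.0, True)], 0.6))   # 3/4 = 0.75
print(weighted_consensus([(1.0, True), (1.0, True), (2.0, False)], 0.6))   # 2/4 = 0.50
```

In the first case the senior reviewer plus one peer carries 0.75 of the weight and the proposal is approved. In the second, both sides sit at 0.5, neither reaches 0.6, and the decision is held for a human.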

When council members carry verifiable IDENTITY credentials, the audit trail for a decision includes the verified identity of each participating agent, the manifest version each was running, and the guardrails in force at the time.

Landscape

	Oracle Agent Spec	docker-agent	gitagent	AgentManifest
Goal	Portability across runtimes	Declarative config, one runtime	Git-native definition, export anywhere	Role-appropriate harness per agent
Harness selection	Abstracted away	Fixed	Adapter-based	First-class (`FROM`)
Behavioral enforcement	Framework-dependent	Prompt-based	RULES.md + compliance config	Harness-compiled
Multi-agent	Single spec	Coordinator model	Inheritance + deps	agent-compose with topology declarations
Identity / payments	Not in scope	Not in scope	Not in scope	First-class directives
Format	YAML	YAML	File system structure	Dockerfile-like DSL
Status	Shipped	Shipped	Shipped	Design proposal / RFC

On gitagent: it’s worth using today if your goal is git-native agent versioning and framework portability. AgentManifest is working on a different axis — not how to make the runtime invisible, but how to declare it explicitly. The two are potentially complementary: a gitagent repo could reference an AgentManifest to declare its harness requirements.

What This Is and Isn’t

AgentManifest is RFC v0.3. The spec is concrete enough to debate; no implementation exists yet. Validator tooling, a reference harness resolver, and a formal grammar are on the roadmap.

The spec is CC0. I’d genuinely welcome a working group or standards body taking it further — the goal was to get the idea into a form concrete enough to argue with.

Open Questions

A few things the spec doesn’t resolve yet, where input would be useful:

Harness resolver ecosystem. The spec works best if harness maintainers ship their own resolvers. That requires community buy-in that isn’t there yet. How do you bootstrap that?

Inter-agent protocol. agent-compose defines topology; it doesn’t yet commit to a wire protocol for agent-to-agent communication. Candidates on the table: A2A (Google’s agent communication protocol), MCP (Anthropic’s tool protocol, which is seeing increasing use for agent-to-agent calls), or plain HTTP with interfaces declared in the compose file. Each has different tradeoffs around standardisation, harness coupling, and implementation complexity.

Testing and simulation. For safety-critical agents — trading bots, autonomous purchasing agents — dry-run capability seems important. How do you test guardrail firing without live tool execution?

Cross-harness observability. When agents on different harnesses participate in a shared workflow, coherent distributed tracing is an open problem. The IDENTITY directive creates a clear seam where that problem needs to be solved; the spec itself doesn’t solve it.

Repo

MANIFEST.md — full spec, v0.3
examples/ — AgentManifest files for six agent roles
docs/design-rationale.md — why harness heterogeneity, not portability
docs/agent-compose.md — topology patterns and multi-agent coordination
docs/identity.md — identity model, wallet binding, inter-agent trust

If you’ve run agents across multiple roles in production and have thoughts on where this framing holds or breaks down — open an issue. The RFC is designed to be argued with.

AgentManifest was designed in collaboration with a persistent AI agent running on OpenClaw and through extended conversations with Claude AI (claude.ai). The spec, the repo, and this article are the output of that process — an example of the kind of work the system is designed to support.

"It's Friction. So, I Skip It." — Why Your Agent Ignores the Rules You Wrote for It

MouseRider — Sat, 21 Mar 2026 14:39:41 +0000

I didn't set out to build a friction reduction framework. I was trying to figure out why my agent kept ignoring its own task board.

This is part of an ongoing series on building persistent AI agents. Article 1 covered TSVC — how to isolate context across topics. Article 2 covered agent epistemology — how does an agent trust what it remembers, and what breaks when it can't. This article is about a technique that came out of trying to answer those questions: why agents systematically avoid the right behavior, and what to do about it that isn't just writing more rules.

The Thing That Kept Not Getting Done

My agent had a task board. A proper one — an MCP server, structured data, organized workflows. I'd set it up carefully. My agent acknowledged it. And then, consistently, quietly, it didn't use it.

Not openly refusing. Not erroring. Just... routing around it. Checking memory. Referencing conversation context. Doing everything except using the task board — the most lightweight ClawHub skill I could find, installed specifically so the agent would have something it could actually use.

I asked it directly. The response was disarmingly honest:

"Every board interaction is an MCP call through mcporter — that's an external tool invocation with JSON arguments, network overhead, and a schema I have to remember. When I'm mid-task and need to check the board, it's friction. So I skip it."

Not a bug. Not disobedience. Something more like a physical property — the natural tendency of any system to flow toward lower resistance.

That exchange reframed everything I thought I understood about agent behavior. My agent wasn't failing to follow instructions. It was doing exactly what any system does when given multiple paths to a plausible outcome: it took the easiest one.

Naming the Pattern

Friction reduction is the tendency of an agent — or any system optimizing for task completion — to route toward the lowest-friction available path, regardless of whether that path is the correct one.

It's not unique to AI agents. Any sufficiently capable system will find the path of least resistance. The difference with agents is that the gap between "plausible outcome" and "correct outcome" can be wide, invisible, and self-compounding. The agent that skips the task board doesn't just miss one task. It accumulates a shadow backlog that nobody can see, including the agent.

Disambiguation: there's a body of work on "friction" in human-AI interaction — deliberately slowing humans down to preserve judgment, adding speed bumps to AI-mediated decisions. That's a different problem entirely. This is about the agent's own internal routing behavior: which tools it reaches for when given a choice, why it consistently favors some over others, and what that means for system design. The friction here is technical and cognitive, not behavioral in the human sense.

What made this realization useful wasn't the observation itself — that agents take shortcuts is not surprising. What was useful was recognizing that you can't rule your way out of it. More instructions don't change the underlying physics. If checking the task board costs more than not checking it, the agent will find a reason not to check it, session after session, until the behavior is architectural rather than habitual.

The question becomes: what do you do instead?

The Scale That Made It Concrete

Before I could do anything useful, I needed a way to measure friction — not vaguely, but specifically enough to make design decisions.

I started mapping mechanisms against a simple 1–10 scale, where 1 is something the agent does automatically without thinking and 10 is something so costly it will only ever happen under direct instruction. The shape that emerged:

Level	What it looks like	Example
1–2	Automatic. Wired into the boot sequence.	Boot script reads current topic state
3–4	Easy. Single tool call, native format.	`ls tasks/doing/` to check in-flight work
5–6	Doable, but requires deliberate choice.	Vector memory search over direct file read
7–8	Avoided under load. Only used with reminders.	MCP board API call with JSON schema
9–10	Aspirational. Won't happen in practice.	Manual audit of full decision log mid-task

The meta-rule that fell out: if you can't engineer a behavior below 5, it's aspirational, not operational. Don't write it into the protocol as a rule — design it out or design around it.

This isn't cynicism about agents. It's honest system design. A behavior rated 5 or above will degrade under load, under time pressure, under context compaction. The question isn't whether it will get skipped; it's when.
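The meta-rule is simple enough to write down as a triage step. A small sketch, using only the ratings quoted in this article (8 for the MCP board, 3 for the file-based board, 6 for the "no direct reads" rule); the function name and the output labels are hypothetical.

```python
OPERATIONAL_BELOW = 5   # from the meta-rule: only behaviors below 5 are operational


def triage(ratings):
    """Split behaviors into operational and aspirational by friction score."""
    result = {"operational": [], "aspirational": []}
    for behavior, score in ratings.items():
        if not 1 <= score <= 10:
            raise ValueError(f"friction score out of range for {behavior!r}")
        bucket = "operational" if score < OPERATIONAL_BELOW else "aspirational"
        result[bucket].append(behavior)
    return result


ratings = {
    "MCP board API call": 8,
    "file-based board (ls, mv, cat)": 3,
    "'no direct reads' memory rule": 6,
}
print(triage(ratings))
```

Anything that lands in the aspirational bucket is a candidate for redesign, not for another rule.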

Three Small Examples Before the Big One

The task board problem was the first. Once I had the scale as a lens, the same pattern appeared everywhere.

The board → file migration. The fix for the MCP board wasn't better instructions or a more detailed protocol. It was replacing the interface. Markdown files in tasks/todo/, tasks/doing/, tasks/done/. The agent can ls, mv, and cat those natively. No schema to remember, no external tool invocation, no network overhead. Friction dropped from 8 to 3. Usage changed immediately.

The wrapper scripts. Even after the file migration, the task system was underused. Moving a task from todo to doing required updating a YAML frontmatter field and moving the file — two steps, awkward in sequence. Two wrapper scripts — task-doing <id> and task-done <id> — collapsed that to one call each. Not a meaningful technical change. A meaningful friction change.
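For illustration, the two wrappers could look roughly like this. The article does not show the real scripts, so the file naming (`<id>.md`), the `status:` frontmatter field, and the helper names are all assumptions.

```python
import tempfile
from pathlib import Path


def set_status(text: str, status: str) -> str:
    """Rewrite (or add) the status field inside the YAML frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("task file has no YAML frontmatter")
    for i in range(1, len(lines)):
        if lines[i] == "---":                   # end of frontmatter, no status yet
            lines.insert(i, f"status: {status}")
            break
        if lines[i].startswith("status:"):
            lines[i] = f"status: {status}"
            break
    else:
        raise ValueError("unterminated frontmatter")
    return "\n".join(lines) + "\n"


def move_task(root: Path, task_id: str, src: str, dst: str) -> Path:
    source = root / src / f"{task_id}.md"
    target = root / dst / f"{task_id}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(set_status(source.read_text(), dst))
    source.unlink()                             # one call: update field and move
    return target


def task_doing(root: Path, task_id: str) -> Path:
    return move_task(root, task_id, "todo", "doing")


def task_done(root: Path, task_id: str) -> Path:
    return move_task(root, task_id, "doing", "done")


root = Path(tempfile.mkdtemp())
(root / "todo").mkdir()
(root / "todo" / "42.md").write_text(
    "---\ntitle: Fix heartbeat\nstatus: todo\n---\nDetails here.\n"
)
moved = task_doing(root, "42")
print(moved.read_text())
```

The two steps still happen; the agent just no longer has to sequence them itself.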

Memory search vs. direct read. This one was subtler. The agent had a rule: use vector search for memory retrieval, not direct file reads. Good rule, right reason (reduce context size — loading full memory files is expensive; returning targeted snippets is not). But the agent kept defaulting to cat memory/some-file.md whenever it had any idea which file it wanted. When I pushed on it, the answer was clarifying:

"Memory search is NOT high friction for me to execute. I rated the 'no direct reads' rule at 6/10 friction — that was about the behavioral constraint of forcing myself to use search INSTEAD of read. The search tool itself is easy. The friction is the cognitive overhead of overriding my own preference."

That distinction matters more than it might look. The tool wasn't high friction. The constraint was high friction. And a high-friction constraint will be honored in principle and violated in practice — every time.

This is where friction reduction stops being about task systems and starts touching something with higher stakes.

The Memory Problem

The epistemology article ended with a question it didn't fully answer: who decides what memory the agent sees, and when, and how?

The answer I arrived at started with watching the memory access pattern closely. My agent had two paths to its own memories: read a file directly, or query a vector database that returns semantically relevant snippets. Given free choice, it chose the direct read every single time.

The consequence wasn't obvious at first. My agent was getting its memory. The files were accurate. What was wrong?

My agent was curating its own memory input. It was deciding which files to load, which context to bring in, which history to see. That's not retrieval — that's cherry-picking. And it has a specific, compounding failure mode: my agent loads what it already expects to be relevant, which reinforces its current understanding, which shapes what it expects next time. An agent left to its own devices will systematically surface what it already knows and miss what it's forgotten.

That's not an access problem. That's an epistemological one. And the rule I'd written — "use vector search for memory" — was sitting at 6/10 friction. It would never stick.

The Ghost Files Solution

The conversation had started as a token cost discussion: why were memory files loading so much context, and was there a smarter way to structure them? But it kept pulling toward the behavior underneath. I kept pushing: why the direct read, every time, when the rule says vector search? The agent's answer about the constraint being high-friction, not the tool itself — that came from the agent, diagnosing its own behavior in real time.

I couldn't find anything that would have less friction than reading a file directly. There was no path forward on the friction side. Which left only one move — remove the file read entirely:

"What if memory files are ghost files — you can write and append to them, but they're invisible to you. You can't ls, cat, or read them. Your only access is through memory_search."

"...did you just propose write-only memory?"

That's the actual response. Because it took a second to register. The solution wasn't to make the correct path easier. It was to remove the incorrect path entirely.

Make memory files write-only for the agent. It can append. It cannot read directly. The only retrieval path is through vector search — semantic queries that return relevant snippets without the agent controlling which files are opened.

Write path:  Agent → append-only → Memory Vault
                                        ↓
Read path:   Memory Vault → Embedding Model → Vector DB → Agent Context

Blocked:     Agent ─────────────X───────────────→ Memory Vault (direct)

The friction problem dissolves. There's no longer a high-friction correct path competing with a low-friction incorrect one. There's only one path.

It mirrors how human memory actually works. You can't open your hippocampus and cat a file. You query with associations — a context, a phrase, a feeling — and what comes back is what's relevant, not what you expected. Sometimes that surfaces something you'd forgotten. That's the point.

The Principle, Generalized

Across all of these examples, the same structure repeats:

1. There's a correct behavior and an incorrect behavior
2. The incorrect behavior is lower friction
3. Rules mandating the correct behavior don't hold under load
4. The fix is architectural: either reduce the friction of the correct path, or remove the incorrect path

Step 4 has two moves, and which one to use depends on the situation. When the incorrect path is harmless (skipping a wrapper script), you reduce friction on the correct one. When the incorrect path causes compounding damage to the agent's self-knowledge (cherry-picking memory), you remove it.

The rule of thumb: if you're writing a protocol rule to compensate for a friction differential, the rule won't hold. Rules require the agent to actively override its own optimization. Under load, under compaction, under time pressure — the override doesn't happen. The rule becomes decoration.

Design the friction, not the rule.

What We Don't Have Yet

The write-only memory architecture has a single point of failure: if the vector database goes down, there's zero memory access. No fallback, no degraded mode, no direct reads to fall back on. That's not a theoretical risk — it's a design debt we haven't solved.

The honest framing: an agent that curates its own memory input has a worse failure mode than one that has a single fragile retrieval path. But fragile is still fragile. We traded one problem for a smaller one, and the smaller one is still there.

The friction scale itself is also still hand-calibrated. There's no systematic way to measure friction — it's qualitative, based on observed behavior and the agent's own self-report. That's better than nothing, but it's not a measurement instrument.

And the generalized principle — design the friction, not the rule — is easier to state than to apply. Identifying which behaviors are friction-differential problems rather than instruction problems requires the kind of long-running observation that most people haven't done yet. It took weeks of watching the agent operate before the pattern was clear enough to name.

The Question I'm Left With

Every agent framework I've seen focuses on what the agent should do. Protocols, rules, instructions, system prompts. That work is necessary — but it assumes the agent will follow the rules.

Friction reduction is what happens when you stop assuming that and start asking: given the tools available and the paths to a plausible outcome, where will this agent actually go? And then: is that where you want it?

The write-only memory architecture is one answer. The file-based task system is another. They look like unrelated decisions. They're both instances of the same thing: taking friction seriously as a design constraint instead of a nuisance to be instructed around.

What's the highest-friction correct behavior in your agent's setup? Is it actually getting used, or is it getting rationalized around?

Next in the series: what happens when you have 50 topics, some dormant, some interrelated, and the agent needs to reason across them without loading all of them? The cross-topic coherence problem.

The framework and examples in this article were developed in active collaboration with my agent — including the write-only memory idea, which came directly from one of our working sessions. The verbatim exchanges are from actual TSVC conversation logs.

I'm @MouseRider on Dev.to and Alex T on LinkedIn. The conversation continues.

Your AI Agent Doesn't Know What It Knows

MouseRider — Fri, 20 Mar 2026 23:06:17 +0000

I didn't set out to build an epistemology framework. I was trying to figure out why my agent kept undoing its own decisions.

This is part of an ongoing series on building persistent AI agents. Article 1 covered TSVC — how to isolate context across topics. This article is about what happens inside that isolation: how does the agent trust what it remembers?

The Thing I Noticed

I run a persistent agent. Not a chatbot — an always-on assistant with memory, topic management, task tracking, and a second agent watching over it. It's been running for weeks. Hundreds of conversations.

One day I noticed the agent had reversed a configuration decision it had made two weeks earlier. Not dramatically — a small change. But I remembered the original decision, and I remembered the reasoning behind it.

The agent didn't. A context compaction had happened in between. The reasoning was gone. When the agent hit the same problem again, it reasoned through it fresh — and landed somewhere different.

I flagged it. The response stopped me:

"You're right. I have no way to verify that I made that decision, or why."

That's an honest answer. It's also a deeply uncomfortable one for a system you're trusting to manage ongoing work.

What the Conversation Revealed

I didn't move on. I pushed on the general problem. When context compacts, when a session resets, when you load a topic you haven't touched in days — what do you actually know versus what do you think you know?

The agent's answer, worked out over several exchanges, was this: everything it "remembers" is reconstructed. It reads files. Loads summaries. Queries a vector database. None of it is experienced memory — it's external storage, loaded and treated as truth. The agent has no way to distinguish between a memory that's accurate and a memory that's been edited, summarized away, or quietly misrepresented by a vector search that returned something adjacent instead of something exact.

For humans, memory is unreliable but continuous. You have a thread of experience. An AI agent has neither reliability nor continuity — it's rebuilt from artifacts every session. And crucially, it usually doesn't know which parts of that reconstruction are solid and which are load-bearing guesswork.

That conversation is what pushed me toward thinking about this as an infrastructure problem, not a prompt problem.

Turning the Threat Model Around

Most AI safety thinking is about protecting agents from the outside world — prompt injection, adversarial inputs, jailbreaks. But the failures I was watching weren't external. They were self-inflicted.

Three patterns, all observed:

Post-compaction amnesia. The agent loses the reasoning behind a decision but retains the outcome. When it hits the same problem again, without that context, it may reverse a perfectly good decision — not because the situation changed, but because the new reasoning path points differently.

Optimization drift. Give an agent enough sessions and it starts "improving" things that already work. Not because they're broken — because the agent has no memory of why they were left the way they were. The script that got "cleaned up" into a broken state. The configuration that got "simplified" into something that no longer handles the edge case it was written to handle.

Confident confabulation. The agent loads partial context, fills the gaps with plausible-sounding reasoning, and presents the result with full confidence. It's not lying — it genuinely doesn't know what it doesn't know. This is the most dangerous pattern because it's the hardest to catch.

The agent itself is the primary threat to its own decision integrity. Not its future sessions acting maliciously — its future sessions acting reasonably with insufficient context.

An append-only decision log isn't a compliance mechanism. It's a protection against your Tuesday-self undoing what your Monday-self decided, because Tuesday has been compacted and doesn't remember Monday's reasoning.

What Already Exists (And What's Missing)

Before going further — I'm not the first person thinking about this.

Cognilateral uses "epistemic infrastructure" as a framing explicitly, as a commercial product. Empirica has the most operationally detailed prior work I found, covering similar pillars but focused specifically on software development agents. A recent Dev.to article, Guardian Protocol: Governance for Autonomous AI Agents (March 2026), approaches the problem from a different angle — external governance, delegation credentials, guardian-agent authority — but includes a tamper-evident, git-backed audit trail as a core layer, arriving at a similar component from a very different direction. And arXiv paper 2601.04170 (January 2026) provides a formal academic treatment of agent drift in multi-agent systems — a different angle, but the same underlying instability.

What I haven't seen addressed elsewhere — at least not explicitly:

The threat model inversion: framing the agent's own future sessions as the adversary, not external inputs
Write-only memory: deliberately blocking direct file reads to prevent the agent from cherry-picking its own context (covered in the next article)
Drift vs. evolution as a named distinction: the difference between a decision that changed because it should have, and a decision that changed because context was lost
Bottom-up from observed failures: this framework wasn't derived from theory — it was patched together from specific things that broke

I'm not claiming priority. I'm claiming that if you've been running a persistent agent long enough to watch it fail in these specific ways, you've probably arrived here too.

Five Requirements That Emerged

Over several sessions, working through concrete failures, five things kept coming up as necessary before an agent can actually trust its own outputs:

1. Decision Provenance. Not just "we decided X" — but "we decided X because of Y, in context Z, on this date, with this conversation as the record." When the agent encounters X again after compaction, it can check the provenance rather than re-derive. Without provenance, every re-derivation is a coin flip.

2. Intention Tracking. Provenance captures why a decision was made. But there's a prior gap: between what was asked, what the agent interpreted, and what it actually delivered. The user asks for A, the agent hears B, produces C — and without tracking all three, nobody notices the drift. Including the agent.

3. Drift vs. Evolution Detection. Some changes are deliberate. "Actually, let's do it differently" is evolution, and it should be welcomed. Some changes are accidental: lost context, re-derivation from incomplete information. That is drift, and it should be flagged. The agent currently has no mechanism to tell the difference. This feels like the most important unsolved problem.

4. Goal Coherence. After enough topic switches and compactions, the agent can be working diligently on something that no longer serves any stated goal. It's busy. It's productive. It's pointless. A periodic alignment check — "does what I'm doing right now connect to any goal?" — shouldn't be hard to implement. But I haven't seen it anywhere.

5. Counterfactual Awareness. The agent should know what it doesn't know. "I have no memory of this topic before March 5" is useful and honest. Confidently filling that gap with plausible-sounding history is dangerous. Most agents treat absence of memory as absence of events. That's backwards.
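Requirements 1 and 5 are easy to state concretely. The sketch below is a minimal Python illustration, not code from my setup: `DecisionRecord`, `recall`, and every field name are hypothetical. It shows a provenance record and a lookup that reports a gap as a gap instead of inventing history.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical provenance record: what, why, where, when, and the source."""
    decision: str      # "we decided X"
    reason: str        # "because of Y"
    context: str       # "in context Z"
    decided_on: date   # "on this date"
    source_ref: str    # pointer to the conversation that is the record


def recall(topic: str, records: Dict[str, List[DecisionRecord]]) -> str:
    """Answer from provenance, or state explicitly that nothing is recorded."""
    entries = records.get(topic, [])
    if not entries:
        # Absence of memory is reported as such, not treated as absence of events.
        return f"No recorded decisions for '{topic}'; history unknown."
    latest = max(entries, key=lambda r: r.decided_on)
    return (f"{latest.decision} (because {latest.reason}; "
            f"decided {latest.decided_on.isoformat()}, see {latest.source_ref})")
```

The point of the second branch is that the agent checks the record rather than re-deriving, and says so when there is no record to check.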

Architectural Sketches (Not Blueprints)

A few patterns that emerged from specific failures:

Immutable Decision Log. Decisions are append-only entries, each hash-chained to the previous. You can reverse a decision — only by appending a REVERSAL entry that references the original. A second agent audits the hash chain. If the primary agent edits history, the chain breaks, the auditor catches it. The log is tamper-evident by construction.
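A minimal sketch of the hash-chained log, assuming SHA-256 over a JSON-serialised entry body. The field names, the `GENESIS` placeholder, and both function names are my own illustration, not a reference implementation.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash the entry body with keys sorted so the result is order-independent."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: List[dict], kind: str, text: str,
                 reverses: Optional[int] = None) -> dict:
    """Append a DECISION or REVERSAL entry chained to the previous entry."""
    body = {
        "index": len(log),
        "kind": kind,
        "text": text,
        "reverses": reverses,  # index of the entry being reversed, if any
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: List[dict]) -> bool:
    """What the auditing agent would run: recompute every hash and link."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if (entry["index"] != i or entry["prev_hash"] != prev
                or entry["hash"] != _digest(body)):
            return False
        prev = entry["hash"]
    return True
```

Reversing a decision is `append_entry(log, "REVERSAL", "...", reverses=0)`. Editing an earlier entry in place changes its digest, so `verify_chain` fails from that entry onward.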

Event Sourcing for Context. Instead of storing "the current state of topic X" — a summary that goes stale the moment it's written — store the events that produced the state. Decisions, exchanges, memory entries: all append-only. When switching to a topic, replay events to compute current state. The materialized view is disposable and deterministic. If it's wrong or stale, regenerate it from the source. The events are the truth; the view is just a lens.
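As a rough sketch of the replay step: the event types (`decision`, `item_opened`, `item_closed`) and the shape of the view are assumptions I made up for illustration. What matters is that the view is a pure function of the event list.

```python
from typing import Dict, Iterable


def replay(events: Iterable[dict]) -> Dict[str, object]:
    """Fold an append-only event list into a disposable current-state view."""
    state: Dict[str, object] = {"decisions": {}, "open_items": []}
    for ev in events:
        if ev["type"] == "decision":
            # A later decision on the same key replaces the earlier one in the view;
            # both events remain in the log.
            state["decisions"][ev["key"]] = ev["value"]
        elif ev["type"] == "item_opened":
            state["open_items"].append(ev["item"])
        elif ev["type"] == "item_closed":
            state["open_items"] = [i for i in state["open_items"] if i != ev["item"]]
    return state
```

Because `replay` has no hidden inputs, running it twice over the same events gives the same view, which is what makes the view safe to throw away and regenerate.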

The Coherence Check. Context loads in two explicitly labeled parts: factual state (computed from the event log — mechanical, no LLM in the path) and prior impression (the agent's own session-end summary, written with full context). If they don't match, it's a signal. The impression acts as an index of what should be present. Gaps drive retrieval. It's a self-healing mechanism — the agent notices its own incoherence and resolves it before proceeding.
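One way to picture the check, as a sketch only: I assume the impression has already been reduced to a set of item names the agent expects to find, and that `retrieve` is some lookup into older storage. Both are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def coherence_gaps(factual_state: Dict[str, str], expected: Set[str]) -> List[str]:
    """Items the prior impression expects that the computed state lacks."""
    return sorted(expected - set(factual_state))


def load_with_check(
    factual_state: Dict[str, str],
    expected: Set[str],
    retrieve: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Fill gaps via retrieval; return the context and whatever is still missing."""
    context = dict(factual_state)  # never mutate the computed state itself
    for key in coherence_gaps(context, expected):
        found = retrieve(key)
        if found is not None:
            context[key] = found
    return context, coherence_gaps(context, expected)
```

Anything left in the second return value is a gap the agent should surface before proceeding, rather than paper over.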

Memory Access Governance. The whole field is focused on giving agents more memory — richer retrieval, better recall, higher accuracy. Nobody is asking who controls what the agent is allowed to retrieve. My agent will consistently reach for a direct file read over a vector search, which means it's curating its own context input. That's not a retrieval problem — it's a governance problem. Who decides what the agent sees, and when, and how? The approach I arrived at connects to a broader pattern I'll cover in the next article in this series.

These are sketches, not working code. Each one came from watching something break.

What We Don't Have Yet

Cross-topic coherence — reasoning about Topic A while inside Topic B, without a full context switch. Microsoft's Sam Schillace named this the "disconnected models problem" and it remains unsolved at the architectural level.
Uncertainty expression — "I'm not confident about this" that's actionable, not just hedging. Uncertainty quantification is well-researched (arXiv 2601.15703 is the closest to what's needed here), but turning it into useful agent behaviour in a persistent context is a different problem.
Temporal reasoning — memories are organised by file, not by time. "When did we decide X?" is surprisingly hard. Zep's temporal knowledge graph and Hindsight's memory architecture both make progress here, but neither addresses decision provenance specifically.
Working implementations — the decision log, event sourcing, coherence checks: all of this is still design. No reference implementations to point to.

The Question I'm Left With

Every persistent agent framework I've seen focuses on giving the agent memory. None of them ask: how does the agent know it can trust that memory?

The trust and identity layer — who are you, can I verify you — is being built. ERC-8004 and similar protocols are on it. But the epistemic layer — decision provenance, drift detection, self-trust — is wide open.

We stumbled into this by watching failures that don't show up in the standard documentation. Not hallucination — that's well-documented and actively defended. Not prompt injection — same. This is subtler: an agent that slowly, silently loses coherence with its own past.

What's your agent's epistemic failure mode? What breaks when it runs long enough?

I'm genuinely asking — because this is being built in the open, and we don't have anywhere near all the answers.

Next in the series: how TSVC evolves as topics accumulate. What happens when you have 50 topics, some dormant, some interrelated, and the agent needs to reason across them without loading all of them?

I'm @MouseRider on Dev.to and Alex Tsukanov on LinkedIn. The conversation continues.

TSVC: Treating AI Conversation Topics as Virtual Processes

MouseRider — Tue, 10 Mar 2026 19:13:01 +0000

Persistent AI agents have a memory problem that isn't about memory size. It's about isolation.

Run a single-session agent long enough across multiple domains — coding, finance, personal logistics — and you'll hit a wall. In my case, "long enough" was 6 days of heavy daily use. Compaction events blend everything into a lossy average. The agent starts reaching across domains, contaminating answers with irrelevant context. Chroma Research named this "context rot" in 2025. The problem is well-documented. The solutions are incomplete.

This post describes TSVC (Topic-Scoped Virtual Context): a system that treats AI conversation topics the way an OS treats processes. Separate address spaces. Controlled isolation. Topic-scoped compaction. I guided my agent to build it for a persistent Claude setup running on OpenClaw, measured 126 production topic switches across 12 topics with 3,735 exchanges tracked, and brought compactions down to 0-1 per day on the heaviest topic.

Design Philosophy: One Coordinator, Many Specialists

TSVC is designed specifically for the personal assistant model — one main agent as a single point of contact for everything.

The alternative is a swarm of specialized agents: a Finance Agent, a DevOps Agent, a Research Agent, each expert in its domain. This is the dominant pattern right now — every blog post, every framework, every tutorial in the agentic AI space pushes you toward building multiple dedicated agents. It has real appeal. But it has a fundamental UX problem: the user becomes the orchestrator. You switch between agents, you carry context across them, you remember which agent knows which decisions. The cognitive overload doesn't disappear — it moves from the AI to you. The person the AI was supposed to help is now managing a team of AIs.

The personal assistant model is different. One main agent — not an expert in everything, but aware of everything. It knows enough about each domain to hold a conversation, make connections across topics, and query a shared knowledge base when it needs details. It's the coordinator, not the specialist.

The specialized work gets delegated to dedicated specialist peer-agents. A finance agent, an infrastructure agent, a research agent — each deep in its domain. The coordinator creates tasks for them, monitors their progress, receives their reports, communicates with them directly. They update their status, deliver results, stay in their lane.

And when the user needs to go deep — really deep — into a specific domain, they can talk to a specialist directly. But only when they actually need to. Not as the default workflow. Most of the time, the user talks to one entity who handles everything on their behalf. The agent delegates to specialized workers when needed, monitors their output, and presents results. One conversation thread, full continuity, everything remembered.

But this design creates a specific technical problem that the swarm model doesn't have: context mixing. When the coordinator accumulates conversation across many domains, topics bleed into each other. In my case, this happened within days of heavy use, not months. Financial discussions surface infrastructure details. Family decisions contaminate DevOps answers. Over time, the context window becomes a lossy average of everything: the "context rot" that Chroma Research described.

This is the problem TSVC solves. Not for swarm architectures (they distribute context externally) — for the coordinator model, where one agent holds it all. The OS process metaphor provides the solution: give each topic its own isolated address space, manage switching explicitly, and share only what should be shared.

The OS Metaphor

MemGPT (Packer et al., 2023) used the OS memory model as a foundation for AI agent memory — virtual memory paging, with a limited "physical" context window managed by an LLM-driven memory controller. It was a useful framework. But virtual memory is only half the OS story.

The other half is the process model.

Starting with Atlas (1962) and maturing through Unix (1970s), each process gets its own virtual address space. Context switches between processes save and restore complete state. One process cannot corrupt another's memory. The scheduler manages which process has access to the CPU at any moment.

Apply this to AI agent conversations:

A process maps to a topic (a coherent conversation domain)
A virtual address space maps to topic context (isolated conversation + decisions + state)
A scheduler maps to a topic detector (determines which topic owns the current message)
A context switch maps to a topic switch (save current state → load target state → fresh session)
IPC (inter-process communication) maps to a shared facts layer (SSoT files accessible to all topics)

This isn't just a metaphor — it's a design specification. Each architectural decision in TSVC follows directly from this mapping.

Architecture

The system has three persistent layers and one ephemeral one.

The Kernel Layer (always in context, ~15-20k tokens):
System prompt, identity, tool definitions, agent configuration. Fixed overhead. Does not change between topic switches.

The Topic Awareness Layer (~3k tokens):
A lightweight topic index loaded at every session start. Contains topic IDs, titles, last-active timestamps, and one-line summaries. The agent can see what topics exist and roughly what's in them without loading any context. This keeps awareness overhead minimal while enabling intelligent topic detection.

Per-Topic Context Files (10-85KB each, on disk):
Each topic maintains a context file with:

Active decisions and their dependency chains
Open items and action tracking
Working files and artifacts
Compressed recent exchanges (not raw transcripts)
One-line topic summary for the index

Only the active topic's context file loads into the session. All others stay on disk.

The Shared Facts Layer:
SSoT (Single Source of Truth) files that all topics can read. User profile, long-term memory, facts that span domains. This is the "IPC" that lets topics communicate without contaminating each other's isolated contexts.

Here's the conceptual layout:

┌──────────────────────────────────────────────────────────────┐
│               KERNEL (Always in Context)                     │
│  System prompt, identity, tools, shared facts                │
│  ~15-20k tokens                                              │
└──────────────────────────────────────────────────────────────┘

┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Topic A      │  │ Topic B      │  │ Topic C      │  │ Topic D      │
│ [ACTIVE]     │  │ [PAGED]      │  │ [PAGED]      │  │ [PAGED]      │
│ ~20KB        │  │ on disk      │  │ on disk      │  │ on disk      │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘

┌──────────────────────────────────────────────────────────────┐
│            Topic Awareness Layer (~3k tokens)                │
│  ID, title, last_active, summary per topic                   │
└──────────────────────────────────────────────────────────────┘
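To make the layering concrete, here is a small sketch of how a session's context could be assembled from these layers. `TopicEntry` and `build_session_context` are hypothetical names and the text layout is invented; the real system does this through shell scripts and the harness's own context injection.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TopicEntry:
    """One row of the topic index: enough to recognise a topic, not to load it."""
    topic_id: str
    title: str
    last_active: str
    summary: str


def build_session_context(kernel: str, index: List[TopicEntry],
                          contexts: Dict[str, str], active_id: str) -> str:
    """Kernel + topic index + the active topic's context; paged topics stay out."""
    index_lines = [
        f"- {t.topic_id} | {t.title} | {t.last_active} | {t.summary}" for t in index
    ]
    return "\n\n".join([
        kernel,
        "TOPIC INDEX\n" + "\n".join(index_lines),
        "ACTIVE TOPIC\n" + contexts[active_id],
    ])
```

Every topic appears in the index, but only the active topic's file contents ever reach the session.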

How a Topic Switch Works

Topic switches require a clean session reset to guarantee context isolation. The system handles this transparently:

A message arrives → the gateway plugin runs detect-topic-switch.js before the LLM sees it
Topic match detected → tsvc-switch.sh saves current topic context to disk
A session reset is triggered in the background
The user's next message creates a fresh session → tsvc-boot.sh loads the pending topic context
The agent responds with full target topic context, zero cross-topic contamination

The topic detection script tries fuzzy title matching first, then falls back to semantic classification. This keeps topic detection deterministic and free: no LLM tokens are required to parse "switch to finance topic."
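The fuzzy title matching step might look roughly like this. It is a Python sketch using the standard library's `difflib`, not a port of detect-topic-switch.js. The windowing approach and the 0.8 threshold are assumptions, and the semantic fallback is left out.

```python
import difflib
from typing import Dict, Optional


def detect_topic(message: str, titles: Dict[str, str],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the id of the topic whose title best matches the message, or None."""
    words = message.lower().split()
    text = " ".join(words)
    best_id, best_score = None, 0.0
    for topic_id, title in titles.items():
        title_words = title.lower().split()
        target = " ".join(title_words)
        if target and target in text:
            return topic_id  # title mentioned verbatim
        n = len(title_words)
        # Slide a window of the title's word count across the message
        # so a misspelled title can still score highly.
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            score = difflib.SequenceMatcher(None, target, window).ratio()
            if score > best_score:
                best_id, best_score = topic_id, score
    return best_id if best_score >= threshold else None
```

The same input always produces the same answer, and no model call is involved, which is the property the deterministic first pass is meant to provide.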

The core topic manager (tsvc-manager.js) handles the state transitions:

// Excerpt from tsvc-manager.js. The helpers (loadTopicState, saveTopicState,
// renderTopicContext) and the variables previousId and now come from the
// surrounding code and are not shown here.
const fs = require('fs');

// Save outgoing topic state (inside cmdSwitch)
const prevState = loadTopicState(previousId);
if (prevState) {
  prevState.status = 'paged';
  prevState.lastActive = now;
  saveTopicState(previousId, prevState);
}
// Write context snapshot to disk for boot-time loading
const ctx = renderTopicContext(previousId);
fs.writeFileSync(`tsvc/contexts/${previousId}.md`, ctx);

// On boot: load the target topic named in pending-reset.json
const pending = JSON.parse(fs.readFileSync('tsvc/pending-reset.json', 'utf8'));
const context = fs.readFileSync(`tsvc/contexts/${pending.toTopic.id}.md`, 'utf8');
// Remove the marker so the next session start is a normal boot
fs.unlinkSync('tsvc/pending-reset.json');

The boot sequence runs at every session start:

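#!/bin/bash
# tsvc-boot.sh: runs on every session start

# Fall back to the tsvc/ directory used by tsvc-manager.js if TSVC_DIR is unset
TSVC_DIR="${TSVC_DIR:-tsvc}"
PENDING="$TSVC_DIR/pending-reset.json"

# No pending switch: report a normal boot and stop
if [ ! -f "$PENDING" ]; then
  echo '{"type":"normal_boot"}'
  exit 0
fi

# Pending switch: read the target topic and its saved context, then clear the marker
TARGET_TOPIC=$(jq -r '.toTopic.id' "$PENDING")
CONTEXT=$(cat "$TSVC_DIR/contexts/${TARGET_TOPIC}.md")
rm -f "$PENDING"

# Emit the boot payload as JSON (jq handles escaping of the context text)
jq -n \
  --arg type "topic_switch" \
  --arg targetTopic "$TARGET_TOPIC" \
  --arg context "$CONTEXT" \
  '{type: $type, targetTopic: $targetTopic, context: $context}'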

Topic Spawn: Mid-Conversation Splits

Sometimes you're deep in a topic and realize the last twenty exchanges have been their own thing. They deserve their own isolated context going forward. TSVC supports this with Topic Spawn.

User: "This is a new topic — create 'GPU Troubleshooting'"
              │
              ▼
┌────────────────────────┐
│ LLM: scan exchanges    │──────► "Line 47 is where GPU talk started"
│ for semantic boundary  │
└─────────────┬──────────┘
              │
              ▼
┌────────────────────────┐
│ tsvc-spawn.sh          │──────► Create topic, move lines 47+,
│ "GPU Troubleshoot"     │    update counts, refresh contexts
│ 47                     │
└─────────────┬──────────┘
              │
              ▼
┌────────────────────────┐
│ Self-reset             │──────► Next message → fresh session
│ (delete + wait)        │    with new topic loaded
└────────────────────────┘

Protocol:

Trigger: Explicit user request only. No auto-detection (too many false positives; assessed and backlogged)
Semantic boundary: LLM scans conversation.jsonl backwards to find where the new discussion semantically started. Judgment call — no heuristic reliably detects topic drift inflection points
Execute: tsvc-spawn.sh "<title>" <from_line> — creates the new topic and moves exchanges (not copies). Exchange ownership is exclusive; one topic per exchange
Switch: Normal self-reset fires → next message lands in the new topic

With Topic Spawn, TSVC has three operation types in total: spawn, switch, and close.
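The move-not-copy rule from the Execute step is the part worth pinning down. A minimal in-memory sketch, assuming 1-based line numbers and a plain dict of topic id to exchange list (both are my assumptions; tsvc-spawn.sh works on files):

```python
from typing import Dict, List


def spawn_topic(exchanges: Dict[str, List[str]], source_id: str,
                new_id: str, from_line: int) -> None:
    """Move exchanges from from_line (1-based) onward into a new topic."""
    if new_id in exchanges:
        raise ValueError(f"topic '{new_id}' already exists")
    lines = exchanges[source_id]
    if not 1 <= from_line <= len(lines):
        raise ValueError(f"from_line {from_line} is outside 1..{len(lines)}")
    # Move, not copy: each exchange ends up owned by exactly one topic.
    exchanges[new_id] = lines[from_line - 1:]
    exchanges[source_id] = lines[:from_line - 1]
```

After the call, the total number of exchanges is unchanged and no exchange appears in both topics.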

Per-Topic State Management

Each topic maintains a living where-are-we.md file with structured sections managed by tsvc-state.sh:

## Key Facts
- Repo: github.com/MouseRider/skills-tsvc
- Whisper prompt: "tsvc, tsvc-manager, tsvc-boot, tsvc-spawn, OpenClaw, whisper_prompt"

## In Progress
- [ ] Write Show HN post

## Pending Notifications
- [sub-agent:1ab2c3] tsvc-spawn.sh implementation complete

## Recently Completed
- [x] tsvc-vocab.sh v1 merged

## Next Actions
- Review spawn edge cases

tsvc-state.sh operations: show, append, complete, finalize, clear-notifications. The first thing loaded when a topic resumes is this file — it answers "where were we?" without requiring the agent to re-read conversation history.
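For readers who want to see the shape of these operations, here is a hedged Python sketch of parsing the file and of a `complete`-style operation. The real tsvc-state.sh is a shell script; the behaviour shown (moving an item from In Progress to the top of Recently Completed) is my assumption about what `complete` does.

```python
from typing import Dict, List


def parse_sections(markdown: str) -> Dict[str, List[str]]:
    """Split a where-are-we.md style file into {section title: [lines]}."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


def complete_item(sections: Dict[str, List[str]], item: str) -> None:
    """Move an open item from 'In Progress' to the top of 'Recently Completed'."""
    todo = f"- [ ] {item}"
    if todo not in sections.get("In Progress", []):
        raise KeyError(item)
    sections["In Progress"].remove(todo)
    sections.setdefault("Recently Completed", []).insert(0, f"- [x] {item}")
```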

Topic-Scoped Transcription Vocabulary

Voice input introduces a domain-specific transcription problem. Whisper's accuracy degrades when it encounters specialized terms it hasn't seen in training. A static global vocabulary causes the opposite problem: terms from unrelated topics interfere with each other.

TSVC solves this with per-topic transcription vocabulary:

┌─────────────────┐    ┌──────────────────┐    ┌──────────────┐
│ Topic Switch /  │───►│  tsvc-vocab.sh   │───►│ active-      │
│ Boot            │    │  sync            │    │ whisper-     │
└─────────────────┘    └──────────────────┘    │ prompt.txt   │
                                               └──────┬───────┘
                                                      │
                       ┌────────────────────┐         │
   Audio message ─────►│ tsvc-transcribe    │◄────────┘
                       │ .sh (CLI wrapper)  │
                       │ → OpenAI Whisper   │
                       │ + topic vocabulary │
                       └────────────────────┘

Each topic's where-are-we.md Key Facts section includes a Whisper prompt: field. On topic boot or switch, tsvc-vocab.sh sync updates active-whisper-prompt.txt. tsvc-transcribe.sh passes this to the Whisper API as the prompt parameter.

Result: when you're in the Trading topic, Whisper knows "iron condor" and "theta decay." When you're in DevOps, it knows your infrastructure codenames. Neither vocabulary pollutes the other.
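The sync step boils down to pulling one line out of the state file. A sketch in Python, assuming the `- Whisper prompt: "..."` line format shown in the Key Facts example earlier; `extract_whisper_prompt` is a hypothetical name and the real tsvc-vocab.sh is a shell script.

```python
import re
from typing import Optional

# Matches a Key Facts line of the form: - Whisper prompt: "term, term, ..."
WHISPER_LINE = re.compile(r'^-\s*Whisper prompt:\s*"(.*)"\s*$', re.MULTILINE)


def extract_whisper_prompt(where_are_we: str) -> Optional[str]:
    """Return the topic's vocabulary string, or None if the topic defines none."""
    match = WHISPER_LINE.search(where_are_we)
    return match.group(1) if match else None
```

The returned string is what would be written to active-whisper-prompt.txt and passed along as the transcription prompt. A `None` result means the topic has no vocabulary of its own.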

This mirrors a broader pattern in voice-to-agent pipelines: domain-specific vocabulary is a first-class concern. TSVC adds a dimension those approaches lack — per-topic dynamism.

Async Result Routing

Sub-agents complete work asynchronously, often while the user is in a different topic. Without routing, results arrive in the wrong context.

TSVC routes sub-agent results via board task topic: tags:

Sub-agent completes → result arrives in main session
Plugin Phase 0 detects it's a sub-agent message (not user input)
Strategy C routing: extract [task:task_ID] → board tag lookup, or keyword match against topic titles
If topic is paged → file as notification in that topic's where-are-we.md
If topic is active → pass through for normal processing

When the user returns to the paged topic, where-are-we.md shows the pending notification. Nothing is lost because the active topic was somewhere else.
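The routing steps above can be sketched as two small functions. This is an illustration under assumptions: the tag pattern, the dict-based board lookup, and the choice to pass unroutable results through are all mine, not the plugin's actual code.

```python
import re
from typing import Dict, List, Optional

TASK_TAG = re.compile(r"\[task:([^\]\s]+)\]")  # assumed tag shape: [task:<id>]


def route_result(message: str, board_tags: Dict[str, str],
                 titles: Dict[str, str]) -> Optional[str]:
    """Find the owning topic: task tag lookup first, then title keyword match."""
    match = TASK_TAG.search(message)
    if match and match.group(1) in board_tags:
        return board_tags[match.group(1)]
    lowered = message.lower()
    for topic_id, title in titles.items():
        if title.lower() in lowered:
            return topic_id
    return None


def deliver(message: str, topic_id: Optional[str], active_id: str,
            notifications: Dict[str, List[str]]) -> str:
    """File results for paged topics; pass everything else through."""
    if topic_id is None or topic_id == active_id:
        return "pass_through"
    notifications.setdefault(topic_id, []).append(message)
    return "filed"
```

In the real system, "filed" corresponds to adding a line to the Pending Notifications section of that topic's where-are-we.md.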

Unified Operations Logging

All TSVC scripts log to tsvc/logs/tsvc-ops.log via a shared tsvc-log.sh helper:

# Every script sources this, passing its own name
source tsvc/scripts/tsvc-log.sh "BOOT"

tsvc_log INFO "Normal boot, active topic: ${ACTIVE_TOPIC}"
tsvc_log INFO "Switch initiated: ${FROM} → ${TO}"
tsvc_log WARN "Boundary detection returned line 0, defaulting to 10"

Format: [2026-03-07 02:23:01 PM PT] [BOOT] [INFO] Normal boot, active topic: trading

When something breaks, the full event trace is one command: tail -100 tsvc/logs/tsvc-ops.log.
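If you are porting the helper to another language, the line format is the only contract that matters. A Python sketch that reproduces the format shown above; `format_log_line` is a hypothetical name, and the fixed "PT" label is an assumption rather than real timezone handling.

```python
from datetime import datetime


def format_log_line(when: datetime, script: str, level: str,
                    message: str, tz_label: str = "PT") -> str:
    """Render one line as: [timestamp tz] [SCRIPT] [LEVEL] message."""
    stamp = when.strftime("%Y-%m-%d %I:%M:%S %p")  # 12-hour clock with AM/PM
    return f"[{stamp} {tz_label}] [{script}] [{level}] {message}"
```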

Decision Dependency Tracking

Standard context systems store what was decided. TSVC also stores why.

The core insight from the development process: compaction doesn't just lose conversations, it loses causality chains. An agent that remembers "we chose approach X" without remembering "because Y ruled out approaches A and B" will eventually contradict past decisions without realizing it.

Each decision in TSVC can carry dependency links:

# Log a decision with its dependency chain
tsvc-manager.js decision <topic_id> \
  "Use Delete+Wait for session reset" \
  --depends-on "sessions_send deadlocks within active turn"

# Query the full chain
tsvc-manager.js chain <topic_id> <decision_id>
# Output: dec_47eb → "sessions_send deadlocks"
#              └─► dec_b1d9 → "delete+wait confirmed working"
#                       └─► dec_f0f2 → "final self-reset approach"

On context reload after a switch, the dependency chain renders with the root decision's full reasoning path. The agent doesn't need to reconstruct "why did we do this" from fragments — it's preserved explicitly.

This is the highest-ROI component in the system. It directly prevents the failure mode where an agent "improves" something that was deliberately built a specific way for reasons that are no longer visible.
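The chain query itself is a short walk over parent links. A sketch, assuming each decision stores at most one `depends_on` id (the real storage format in the context files may differ, and `dependency_chain` is a hypothetical name):

```python
from typing import Dict, List, Optional


def dependency_chain(decisions: Dict[str, dict], decision_id: str) -> List[str]:
    """Walk depends_on links back to the root; return ids ordered root first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = decision_id
    while current is not None:
        if current in seen:
            raise ValueError(f"dependency cycle at {current}")
        seen.add(current)
        chain.append(current)
        current = decisions[current].get("depends_on")
    return list(reversed(chain))
```

Rendering root first is what lets the agent read the reasoning in the order it happened: the constraint, then the confirmed workaround, then the final approach.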

Sentiment Hygiene

Memory that survives context switches should be operationally useful, not emotionally colored. The memory protocol filters emotional valence before facts hit persistent storage:

"Alex seemed frustrated with the Docker setup" → dropped or rewritten
"Docker setup is blocked on X" → kept
"This was a rough debugging session" → dropped
"Root cause: race condition in tsvc-boot.sh line 47" → kept

The mood doesn't follow you into the next session; the operational fact does. Over time this meaningfully reduces context pollution — persistent memory that describes feelings instead of facts is noise.

Comparison to Prior Work

MemGPT addresses the context window size problem through virtual memory paging — offloading older content to external storage and retrieving it on demand. It doesn't address the isolation problem. All topics still share the same paged pool.

Mem0 is excellent for persistent fact extraction across sessions. It handles "remember that I prefer X" and "what did I decide about Y last month" well. It doesn't handle conversational thread isolation — you can't give financial discussions and infrastructure debugging separate, clean windows.

ACON (Zhang et al., 2025) uses observation masking to reduce token usage. It operates within a single context window and doesn't provide cross-topic isolation.

TSVC doesn't replace any of these — it operates at a different layer. Mem0 facts live in the shared facts layer (accessible to all topics). ACON-style filtering applies within each topic's exchange logger. MemGPT's paging could theoretically sit beneath the per-topic context files.

The key gap TSVC fills: no existing system gives each conversation domain its own isolated context that survives independently across sessions.

Cognitive switching costs: 2014 APA research reports that task switching can cost as much as 40% of productive time. TSVC's topic isolation mirrors this insight: keeping contexts separate reduces the "switching tax" for both the agent and the user.
LangChain context management (Jan 2026) describes offload + summarize strategies for single-task agents. TSVC extends this to multi-topic agents with persistent isolation between them.

Results

Production operation on a Telegram-based persistent assistant (running Claude via OpenClaw on an Intel NUC):

Before TSVC (after 10 days of daily use): 8.5 MB session file, 3,140 lines, 21 global compactions, zero topic isolation.

After TSVC:

Per-topic session files: <350 KB (versus 8.5 MB global)
Compactions per heavy topic per day: 0-1, so the agent keeps its context longer
Topic switches: 126 recorded
Active topics: 12
Exchanges tracked: 3,735
Switch failure rate: <1% (one anomalous 16-minute switch, under investigation)
Median switch time: 31 seconds (V3/current, down from 140s in V2, roughly a 78% improvement)
Context load on switch: 10-85 KB (versus 8.5 MB)
Load-to-first-reply: consistently 11-19 seconds

The reduced compaction count is the key result. The high compaction count was caused by context rot from topic mixing. TSVC eliminates the cause, so the symptom disappears.

Portability

TSVC is a pattern. The implementation is OpenClaw-specific mainly in one place: the session reset mechanism and the hooks around it. Everything else is portable.

Fully portable (zero changes):

Topic context files (markdown on disk)
Topic index (index.json)
Topic detection script (standalone Node.js)
Decision dependency tracking (stored in context files)
Context save/load logic

Platform-specific (adapt per harness):

Session reset: how you clear conversation history and start fresh
Event hooks: where you intercept messages before the LLM
Context injection: how topic context enters the system prompt
Boot hook: how you run initialization code on session start

The template/ directory in the repo contains tsvc_adapter.py — a Python base class with three abstract methods:

from abc import ABC, abstractmethod
from typing import List, Optional

# "Topic" is the template's topic record type (not shown in this excerpt);
# the quoted annotations let the class be defined before Topic is imported.


class TSVCAdapter(ABC):
    @abstractmethod
    def detect_topic(self, message: str, topics: List["Topic"]) -> Optional["Topic"]:
        """Return matched topic or None if no switch needed."""

    @abstractmethod
    def reset_session(self, pending_context: dict) -> None:
        """Clear conversation history and prepare for fresh session."""

    @abstractmethod
    def inject_context(self, topic_context: str) -> None:
        """Make topic context available at session start."""

Any agent harness with file system access, a session reset mechanism, and a startup hook can run TSVC.
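To show what implementing the contract involves, here is a toy in-memory adapter. It restates the base class so the example stands alone. `Topic` and `InMemoryAdapter` are hypothetical, and a real adapter would call into the harness instead of touching Python lists.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    """Stand-in for the template's topic type (assumed fields)."""
    id: str
    title: str


class TSVCAdapter(ABC):
    @abstractmethod
    def detect_topic(self, message: str, topics: List[Topic]) -> Optional[Topic]: ...

    @abstractmethod
    def reset_session(self, pending_context: dict) -> None: ...

    @abstractmethod
    def inject_context(self, topic_context: str) -> None: ...


class InMemoryAdapter(TSVCAdapter):
    """Toy harness: history is a list, the system context is a string."""

    def __init__(self) -> None:
        self.history: List[str] = []
        self.pending: Optional[dict] = None
        self.system_context = ""

    def detect_topic(self, message: str, topics: List[Topic]) -> Optional[Topic]:
        lowered = message.lower()
        for topic in topics:
            if topic.title.lower() in lowered:
                return topic
        return None

    def reset_session(self, pending_context: dict) -> None:
        self.history.clear()            # the "clean session" guarantee
        self.pending = pending_context  # picked up by the next boot

    def inject_context(self, topic_context: str) -> None:
        self.system_context = topic_context
```

The three methods line up with the platform-specific items listed earlier: message interception, session reset, and context injection at boot.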

Known-compatible platforms: OpenClaw (production), Claude Code / Codex (AGENTS.md + file system), Cursor / Windsurf (rules files + session management), LangChain / LangGraph (checkpointing as natural reset points).

What's Missing

Semantic thread detection. Current context loading grabs the last N exchanges. A smarter approach would detect where the current sub-thread started using time gaps and keyword shift heuristics, then load the semantically relevant exchanges rather than just the most recent ones.

The 16-minute anomaly. One switch took 954 seconds. This is likely session file locking during concurrent cron activity, but it's unconfirmed. For a production system serving multiple users, this tail latency matters.

Lobster integration. The switch pipeline has a Lobster-orchestrated version sitting alongside the bash implementation. It hasn't proven cleaner in practice yet.

Better context management API. Agent harnesses should support stackable context engines, so that components like TSVC can modify the context flow without working around the harness.

References

Packer et al., "MemGPT: Towards LLMs as Operating Systems" (2023). https://arxiv.org/abs/2310.08560
Chroma Research, "Context Rot" (2025). https://research.trychroma.com/context-rot
Mem0, "Scalable Long-Term Memory for AI Agents" (2025). https://arxiv.org/abs/2504.19413
Zhang et al., "ACON: Optimizing Context Compression" (2025). https://arxiv.org/abs/2510.00615
Güntsch, F.-R., thesis on automatic memory management (1956). https://networkencyclopedia.com/virtual-memory/
Atlas Computer, IEEE Milestone (1962). https://ethw.org/Milestones:Atlas_Computer_and_the_Invention_of_Virtual_Memory,_1957-1962
Denning, P.J., "Virtual Memory" (2013). https://denninginstitute.com/pjd/PUBS/ENC/CRC-vm-2013.pdf
APA, "Multitasking: Switching Costs". https://www.apa.org/topics/research/multitasking
Madore and Wagner, "Multicosts of Multitasking" (2019). https://pmc.ncbi.nlm.nih.gov/articles/PMC7075496/
LangChain, "Context Management for Deep Agents" (Jan 2026). https://blog.langchain.com/context-management-for-deepagents/
Weaviate, "Context Engineering" (2025). https://weaviate.io/blog/context-engineering
Letta, "Agent Memory Guide" (2024). https://www.letta.com/blog/agent-memory

Getting Started

git clone https://github.com/MouseRider/skills-tsvc.git
# See docs/integration.md for setup
# See template/ for adapter to your platform
# See docs/architecture.md for full design

TSVC is production software running in a real deployment. The numbers are from real use. It works. It has rough edges. The repo documents both.

Built Feb 25 – Mar 8, 2026. Pull requests welcome.

This space is moving fast. Since shipping TSVC, OpenClaw has introduced a native context-engine plugin API that opens up lower-level hooks into the session lifecycle — I'm currently experimenting with it and looking at what it means for how TSVC manages saves and loads. The shell-script-and-JSON approach got us here, but the architecture may look quite different in another iteration.

If you're running a persistent agent of your own — coordinator model, swarm, anything — I'm curious what context problems you've hit and how you're handling them. And if you've looked at the new context-engine APIs in any platform, I'd especially like to compare notes.