Forem: Ziel

Aionis: AI Agents Don’t Have a Context Problem. They Have an Execution Memory Problem

Ziel — Mon, 16 Mar 2026 13:29:59 +0000

A lot of people think AI agents need more context.

I don’t think that’s the core issue.

I think the real issue is that most agents still don’t have a durable execution memory layer. They can generate. They can retrieve. They can sometimes act. But they still struggle to continue work reliably across sessions, interruptions, and handoffs.

That’s why I built Aionis.

Aionis is not another vector database wrapper. It is not a generic “memory plugin.” It is execution memory infrastructure for coding agents: a system for storing task state, structured handoff, replayable execution artifacts, and policy-linked memory that can actually help an agent continue work.

The reason I think this matters is not just conceptual. We already have public evidence that this layer changes system behavior in measurable ways.

The problem isn’t recall. It’s continuity.

The standard memory story in AI is still too shallow. Store some history, retrieve the relevant bits, inject them back into the prompt, and hope the model stays coherent.

That works for lightweight scenarios.

It breaks down in real ones.

In real coding workflows, the hard part is not “finding relevant text.” The hard part is preserving enough execution structure that the next session, the next runtime, or the next agent can actually pick up the task without re-deriving everything from scratch.

That’s the gap Aionis is designed to close.

And the public numbers already point in that direction.

Public evidence so far

On a continuation benchmark run against the pallets/click repository, Aionis reduced:

• input tokens by 30.03%
• output tokens by 77%
• total tokens by 33.24%

That matters because the gain did not come from making the model “smarter.” It came from avoiding repeated context reconstruction in follow-up sessions.

On handoff, the public evidence is even more direct:

• cross-runtime recovery improved from 33.33% to 100%
• real-repository handoff improved from 0% to 100%

That is the kind of result I care about: not whether an agent looks impressive in a single run, but whether it can reliably resume and continue when the session boundary changes.

On policy routing, public A/B evidence showed an improvement from 0% to 100%, with routing converging toward rg- and pytest-focused behavior in real repository workflows. That suggests memory isn’t just useful for recall. It can become part of the control loop.
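One way to picture memory feeding the control loop is a router that consults recorded outcomes before choosing a tool. This is a minimal sketch under stated assumptions, not the Aionis routing policy: PolicyMemory, its methods, and the "highest observed success rate wins" rule are hypothetical, and rg and grep are just example tools.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class PolicyMemory:
    """Hypothetical memory of which tool worked for which kind of task."""

    def __init__(self) -> None:
        # (task_kind, tool) -> [successes, attempts]
        self._stats: DefaultDict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_kind: str, tool: str, succeeded: bool) -> None:
        stats = self._stats[(task_kind, tool)]
        stats[0] += int(succeeded)
        stats[1] += 1

    def success_rate(self, task_kind: str, tool: str) -> float:
        successes, attempts = self._stats.get((task_kind, tool), (0, 0))
        return successes / attempts if attempts else 0.0

    def route(self, task_kind: str, candidates: List[str], default: str) -> str:
        # If none of the candidates has a recorded outcome, fall back to the default tool.
        observed = [t for t in candidates if (task_kind, t) in self._stats]
        if not observed:
            return default
        # Otherwise prefer the candidate with the best observed success rate.
        return max(observed, key=lambda t: self.success_rate(task_kind, t))


memory = PolicyMemory()
memory.record("search", "grep", succeeded=False)
memory.record("search", "rg", succeeded=True)
print(memory.route("search", ["grep", "rg"], default="grep"))  # rg
```

With one failed grep search and one successful rg search recorded for the same task kind, route returns "rg". A production policy would also need exploration and decay of stale evidence; the sketch only shows the path from memory into the next decision.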

On replay, strict replay on replay1 and replay2 ran with 0 model tokens. That’s important because it shows some execution paths can be recovered or reproduced without paying model cost again.

On the SDK side, Aionis ran a route-to-SDK audit on 2026-03-14 across 65 non-admin, non-control-plane routes and found no missing public SDK surface in either the TypeScript SDK or the Python SDK.
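A route-to-SDK audit of this kind is essentially a set difference between the public routes and the methods each SDK exposes. The sketch assumes a hypothetical naming convention (route /tasks/create maps to method tasks_create) and hypothetical excluded prefixes; the route names are placeholders, and the actual audit procedure is not described here.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical prefixes for admin and control-plane routes excluded from the audit.
EXCLUDED_PREFIXES = ("/admin/", "/control-plane/")


def route_to_method(route: str) -> str:
    """Hypothetical convention: "/tasks/create" -> "tasks_create"."""
    return route.strip("/").replace("/", "_").replace("-", "_")


def missing_sdk_surface(
    routes: Iterable[str],
    sdk_methods: Set[str],
    to_method: Callable[[str], str] = route_to_method,
) -> List[str]:
    """Return public routes with no matching SDK method (an empty list means full coverage)."""
    public = [r for r in routes if not r.startswith(EXCLUDED_PREFIXES)]
    return sorted(r for r in public if to_method(r) not in sdk_methods)


routes = ["/tasks/create", "/tasks/get", "/handoff/store", "/admin/reset"]
typescript_sdk = {"tasks_create", "tasks_get", "handoff_store"}
python_sdk = {"tasks_create", "tasks_get"}
print(missing_sdk_surface(routes, typescript_sdk))  # []
print(missing_sdk_surface(routes, python_sdk))      # ['/handoff/store']
```

Running the same check against each SDK's method set is what makes a claim like "no missing public SDK surface in either SDK" verifiable and repeatable.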

So this is not just an idea anymore. There is already a public implementation and public evidence across continuation, handoff, replay, policy, and SDK coverage.

Why this is different from “more context”

Bigger context windows help. Retrieval helps. But neither one is the same as execution memory.

Context is what the model can see right now.
Execution memory is what the system can rely on later.

That difference matters a lot.

A system can retrieve the right text and still fail to resume a task properly. It can remember facts but lose decisions. It can preserve notes but fail to preserve execution state.

Aionis is built around the idea that agents need memory of work, not just memory of content.

That includes:

• what happened
• what artifacts were produced
• what state the task is in
• what handoff is available
• what can be replayed
• what policy signals should influence the next step

That’s why I increasingly think “execution memory” is the right category. It is more precise than “agent memory,” and much more useful than just saying “context management.”
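As a rough illustration of "memory of work", the six items listed above could map onto a single serializable record. This is a minimal sketch under assumptions: ExecutionRecord, TaskState, and every field name are hypothetical, not the Aionis schema, and the JSON round trip stands in for a handoff between sessions or runtimes.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class TaskState(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    DONE = "done"


@dataclass
class ExecutionRecord:
    """Hypothetical record of work, not just content (field names are illustrative)."""

    task_id: str
    state: TaskState                                         # what state the task is in
    events: List[str] = field(default_factory=list)          # what happened
    artifacts: Dict[str, str] = field(default_factory=dict)  # artifact name -> path or digest
    handoff: Optional[str] = None                            # what the next session or agent needs
    replay_steps: List[str] = field(default_factory=list)    # what can be replayed
    policy_signals: Dict[str, float] = field(default_factory=dict)  # hints for the next step

    def to_json(self) -> str:
        data = asdict(self)
        data["state"] = self.state.value  # store the enum as its plain string value
        return json.dumps(data)

    @classmethod
    def from_json(cls, raw: str) -> "ExecutionRecord":
        data = json.loads(raw)
        data["state"] = TaskState(data["state"])
        return cls(**data)
```

Round-tripping through JSON is what lets a different runtime or agent pick the task up from the same recorded state instead of re-deriving it from a transcript.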

My bet

My bet is simple:

The next wave of useful agents won’t be defined by who can stuff the most context into a prompt. They’ll be defined by who can build systems that preserve state, enforce control, and carry execution forward reliably.

That means memory has to become infrastructure.

Not memory in the loose sense of “things the model once saw.”
Memory in the hard sense of “state the runtime can operate on.”

That’s what Aionis is trying to build.

And if the current public results keep holding, I think this layer is going to matter much more than most people expect.

Replayable Execution Memory for AI Agents: Building Aionis

Ziel — Fri, 06 Mar 2026 15:00:47 +0000

Aionis: Replayable Execution Memory for AI Agents

Large language models are getting extremely good at reasoning.

Agents built on top of them can plan tasks, call tools, and automate workflows. But there is still a major limitation in most agent systems today:

Agents don’t remember how work gets done.

They remember conversations.
They remember embeddings.
But they rarely remember execution.

Each time an agent performs a task, it typically has to reason through the entire workflow again.

This leads to:
• high token usage
• slow execution
• unstable results

We kept running into the same problem while building agent workflows. Even after an agent successfully completed a task, the next run still required the model to re-plan everything.

So we built something different.

From Conversation Memory to Execution Memory

Most agent memory systems store text.

Typical examples include:
• chat history
• vector embeddings
• entity memory
• preference storage

These help agents recall information, but they don’t help agents reuse workflows.

Aionis focuses on a different kind of memory:

execution memory.

Instead of storing what the agent said, Aionis records how the agent completed a task.

The lifecycle looks like this:

Agent Run
↓
Execution Trace
↓
Compile Playbook
↓
Replay Workflow

Once a workflow succeeds, it can be replayed later.

Instead of asking the model to reason again, the system executes the compiled workflow.
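The lifecycle can be sketched in a few lines. This is a hypothetical illustration, not the Aionis implementation: Step, Trace, compile_playbook, and replay are invented names, and dropping failed attempts when compiling is a simplifying assumption about what a playbook keeps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    tool: str
    args: Tuple[str, ...]


@dataclass
class Trace:
    """What the agent actually did during one run."""

    steps: List[Tuple[Step, bool]]  # each step paired with whether it succeeded
    succeeded: bool                 # did the run as a whole reach its goal?


def compile_playbook(trace: Trace) -> List[Step]:
    """Compile a successful trace into the ordered steps that worked."""
    if not trace.succeeded:
        raise ValueError("only successful runs are compiled into playbooks")
    # Simplification: failed attempts (e.g. a retry that was later fixed) are dropped.
    return [step for step, ok in trace.steps if ok]


def replay(playbook: List[Step], tools: Dict[str, Callable[..., bool]]) -> bool:
    """Run the playbook's steps directly against the tools; no model is called."""
    # all() short-circuits, so replay stops at the first step that returns False.
    return all(tools[step.tool](*step.args) for step in playbook)
```

Given a successful trace that contains a failed attempt, compile_playbook keeps only the steps that worked, and replay executes them through a tool registry without any model call.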

Why This Matters

Consider a typical agent task like setting up a development environment.

Without execution memory, every run looks like this:

User request
↓
LLM reasoning
↓
Tool planning
↓
Execution

This happens even when the exact same task has already succeeded.

With execution memory, the process becomes:

First run:
LLM reasoning → execution trace → playbook

Later runs:
Replay playbook

Repeat runs skip the reasoning and planning steps entirely, which cuts overhead and improves stability.
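The two paths amount to a lookup before planning. A minimal sketch, assuming an in-memory store keyed by an exact task key (a real system would also need to match on environment and inputs); PlaybookStore, run_task, plan_with_llm, and execute are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Playbook = List[str]


class PlaybookStore:
    """Hypothetical store mapping a task key to a playbook that already succeeded."""

    def __init__(self) -> None:
        self._playbooks: Dict[str, Playbook] = {}

    def get(self, key: str) -> Optional[Playbook]:
        return self._playbooks.get(key)

    def put(self, key: str, playbook: Playbook) -> None:
        self._playbooks[key] = playbook


def run_task(
    key: str,
    store: PlaybookStore,
    plan_with_llm: Callable[[str], Playbook],
    execute: Callable[[Playbook], bool],
) -> str:
    playbook = store.get(key)
    if playbook is not None:
        # Later runs: replay the stored playbook and skip model reasoning entirely.
        return "replayed" if execute(playbook) else "failed"
    # First run: the model reasons and plans; only a plan that succeeds is kept.
    playbook = plan_with_llm(key)
    if execute(playbook):
        store.put(key, playbook)
        return "planned"
    return "failed"
```

Calling run_task twice with the same key invokes the planner once: the first call returns "planned" and the second returns "replayed".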

What Replay Actually Means

Replay in Aionis is not token replay.

It does not attempt to reproduce the exact LLM token sequence.

Instead, Aionis replays actions and artifacts.

Typical replay steps include:
• shell commands
• tool invocations
• file operations
• environment changes

This makes the replayed sequence of steps deterministic and avoids the complexity of reproducing model reasoning.

The goal is not to replay thoughts.

The goal is to replay execution.
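Replaying actions rather than tokens means dispatching each recorded step to something that can re-apply it. The sketch assumes a hypothetical action schema with the four kinds listed above; Action and replay_actions are illustrative names, and only standard-library calls (subprocess.run, Path.write_text, os.environ) do the actual work.

```python
import os
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    kind: str  # hypothetical schema: "shell", "tool", "file", or "env"
    payload: Dict[str, Any] = field(default_factory=dict)


def _shell(p: Dict[str, Any]) -> None:
    # check=True raises CalledProcessError on a non-zero exit, which stops the replay.
    subprocess.run(p["argv"], check=True)


def _file(p: Dict[str, Any]) -> None:
    Path(p["path"]).write_text(p["content"])


def _env(p: Dict[str, Any]) -> None:
    os.environ[p["name"]] = p["value"]


def replay_actions(actions: List[Action], tools: Dict[str, Callable[..., Any]]) -> None:
    """Re-apply recorded actions in order; nothing is regenerated by a model."""
    handlers: Dict[str, Callable[[Dict[str, Any]], Any]] = {
        "shell": _shell,
        "file": _file,
        "env": _env,
        # Tool invocation: call a registered tool with its recorded keyword arguments.
        "tool": lambda p: tools[p["name"]](**p.get("args", {})),
    }
    for action in actions:
        handlers[action.kind](action.payload)
```

Any exception raised by a handler propagates out of the loop, so a failing step halts the replay instead of being silently skipped.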

Replay Modes

Aionis provides three replay modes depending on how strict you want execution to be.

simulate

Simulation mode performs validation without executing commands.

This checks:
• environment readiness
• dependencies
• preconditions

This is useful for auditing workflows.

strict

Strict mode executes the workflow exactly as recorded.

If any step fails, execution stops immediately.

This is useful for deterministic automation.

guided

Guided mode executes the workflow and, when a step fails, can suggest repairs.

Repair patches can be generated through:
• heuristics
• external synthesis services
• optional LLM assistance

However, repairs are not automatically applied.
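The three modes differ only in what happens at each step. A minimal sketch under assumptions: ReplayStep, ReplayReport, Mode, and suggest_repair are hypothetical, and stopping guided runs at the first failure is a simplification consistent with repairs not being applied automatically.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Mode(Enum):
    SIMULATE = "simulate"
    STRICT = "strict"
    GUIDED = "guided"


@dataclass
class ReplayStep:
    name: str
    precondition: Callable[[], bool]  # e.g. "is the dependency installed?"
    run: Callable[[], bool]           # returns True on success


@dataclass
class ReplayReport:
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None
    unmet_preconditions: List[str] = field(default_factory=list)
    repair_suggestions: List[str] = field(default_factory=list)  # proposed, never applied here


def replay(
    steps: List[ReplayStep],
    mode: Mode,
    suggest_repair: Optional[Callable[[str], str]] = None,
) -> ReplayReport:
    report = ReplayReport()
    for step in steps:
        if mode is Mode.SIMULATE:
            # Validate only: check preconditions and execute nothing.
            if not step.precondition():
                report.unmet_preconditions.append(step.name)
            continue
        if step.run():
            report.completed.append(step.name)
            continue
        report.failed = step.name
        if mode is Mode.GUIDED and suggest_repair is not None:
            # Guided: record a suggested fix for review; it is not applied.
            report.repair_suggestions.append(suggest_repair(step.name))
        break  # strict and guided both stop at the failing step in this sketch
    return report
```

In simulate mode no run callable is invoked; strict mode stops at the first failing step; guided mode also stops but records a repair suggestion for someone to review.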

Governance and Safety

Automation systems require governance.

Aionis follows an audit-first design philosophy.

Typical repair flow:

guided run
↓
repair suggestion
↓
human review
↓
shadow validation
↓
promotion

By default:
• repairs require review
• validation happens in shadow mode
• workflows are not automatically promoted

This keeps automation systems observable and controllable.
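The default flow can be enforced as a small state machine in which promotion is reachable only through review and shadow validation. This is a hypothetical sketch, not the Aionis governance code; Repair, RepairStatus, and the method names are assumptions.

```python
from enum import Enum


class RepairStatus(Enum):
    SUGGESTED = 1
    APPROVED = 2
    SHADOW_VALIDATED = 3
    PROMOTED = 4
    REJECTED = 5


class Repair:
    """Hypothetical lifecycle: suggestion -> human review -> shadow validation -> promotion."""

    def __init__(self, patch: str) -> None:
        self.patch = patch
        self.status = RepairStatus.SUGGESTED

    def review(self, approved: bool) -> None:
        self._require(RepairStatus.SUGGESTED)
        self.status = RepairStatus.APPROVED if approved else RepairStatus.REJECTED

    def shadow_validate(self, passed: bool) -> None:
        # Validation runs in shadow mode; the live workflow is not modified either way.
        self._require(RepairStatus.APPROVED)
        self.status = RepairStatus.SHADOW_VALIDATED if passed else RepairStatus.REJECTED

    def promote(self) -> None:
        # Promotion is always an explicit call; nothing here promotes automatically.
        self._require(RepairStatus.SHADOW_VALIDATED)
        self.status = RepairStatus.PROMOTED

    def _require(self, expected: RepairStatus) -> None:
        if self.status is not expected:
            raise RuntimeError(f"expected {expected.name}, got {self.status.name}")
```

Trying to promote a repair that skipped review or shadow validation raises an error, which mirrors the audit-first defaults listed above.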

Benchmark

In a simple workflow benchmark, we observed the following performance:

• Baseline reasoning execution: ≈ 2.3 seconds
• Replay execution: ≈ 0.27 seconds
• Warm replay: ≈ 0.11 seconds

This corresponds to a speed improvement of roughly 8× to 20×.

Replay stability across 100 runs was approximately 98%.

This is not a universal benchmark, but it shows how replayable execution can significantly reduce latency. Because replay does not call the model, it also avoids the token cost of re-planning.
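The 8× to 20× range follows from dividing the baseline latency by each replay latency:

```latex
\frac{2.3\ \text{s}}{0.27\ \text{s}} \approx 8.5\times
\qquad
\frac{2.3\ \text{s}}{0.11\ \text{s}} \approx 20.9\times
```

Cold replay is about 8.5× faster and warm replay about 20.9× faster than baseline reasoning execution, consistent with the stated range.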

Where Aionis Fits in the Agent Stack

Modern agent systems increasingly resemble layered architectures.

LLM
↓
Agent Planner
↓
Execution Memory
↓
Tools / Environment

LLMs provide reasoning.

Agent frameworks orchestrate tasks.

Aionis provides the execution memory layer that allows agents to reuse successful workflows.
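Because the execution memory layer sits between the planner and the tools in this stack, it is in a natural position to observe every tool call. A minimal sketch of that placement, assuming a simple Python tool-wrapping approach: ExecutionMemoryLayer and its trace format are hypothetical, and this is not necessarily how Aionis or the OpenClaw plugin records traces.

```python
import functools
from typing import Any, Callable, Dict, List


class ExecutionMemoryLayer:
    """Hypothetical layer between planner and tools that records every call passing through."""

    def __init__(self) -> None:
        self.trace: List[Dict[str, Any]] = []

    def wrap(self, tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            entry: Dict[str, Any] = {"tool": tool.__name__, "args": args, "kwargs": kwargs}
            try:
                result = tool(*args, **kwargs)
            except Exception as exc:
                entry.update(ok=False, error=repr(exc))
                self.trace.append(entry)
                raise  # the planner still sees the failure
            entry.update(ok=True)
            self.trace.append(entry)
            return result

        return wrapper
```

The planner calls wrapped tools exactly as before; the layer only adds a trace entry per call, which is the raw material a playbook would later be compiled from.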

OpenClaw Integration

We also built an OpenClaw plugin to integrate agents with Aionis.

Once installed, the agent automatically records execution traces and can replay workflows later.

Install:

openclaw plugins install @aionis/openclaw
openclaw aionis-memory bootstrap
openclaw aionis-memory selfcheck

Final Thoughts

Large context windows and powerful models are making agents increasingly capable.

But reasoning alone does not create reliable automation.

Execution memory allows agents to gradually accumulate procedural knowledge.

Instead of solving the same problem repeatedly, agents can reuse workflows that have already succeeded.

That’s the idea behind Aionis.

If you’re building agent systems, AI copilots, MCP tools, or LangGraph pipelines, we’d love to hear your feedback.

GitHub: https://github.com/Cognary/Aionis
Docs: https://doc.aionisos.com