<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Daniel Lenton</title>
    <description>The latest articles on Forem by Daniel Lenton (@danlenton).</description>
    <link>https://forem.com/danlenton</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1280924%2Fcf2069a1-cfac-4f19-bcbd-c2d8e3e07e25.JPG</url>
      <title>Forem: Daniel Lenton</title>
      <link>https://forem.com/danlenton</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/danlenton"/>
    <language>en</language>
    <item>
      <title>Why we built nested steerable tool loops for AI agents</title>
      <dc:creator>Daniel Lenton</dc:creator>
      <pubDate>Mon, 30 Mar 2026 20:12:10 +0000</pubDate>
      <link>https://forem.com/danlenton/why-we-built-nested-steerable-tool-loops-for-ai-agents-2f0n</link>
      <guid>https://forem.com/danlenton/why-we-built-nested-steerable-tool-loops-for-ai-agents-2f0n</guid>
      <description>&lt;p&gt;Here's something that's bothered us for the past year: the moment you ask an AI agent to do something, it disappears. You prompt, you wait, you get a result. If you realise halfway through that you forgot to mention something, you cancel and start over. If you want to know what it's doing, you can't. If something more urgent comes up, you can't pause it and come back later.&lt;/p&gt;

&lt;p&gt;This is a fundamental limitation of how agent frameworks are built. One LLM, one loop, one tool call at a time. The model picks a tool, calls it, reads the result, picks the next tool. There's no interface for the outside world to interact with the loop while it's running.&lt;/p&gt;

&lt;p&gt;We needed something different. We're building AI assistants that you onboard like new hires — share your screen, walk them through your tools, hop on a call. They need to be doing things &lt;em&gt;while you're talking to them&lt;/em&gt;. They need to handle "actually, also check train options" without starting over.&lt;/p&gt;

&lt;p&gt;So we built steerable tool loops. Today we're open-sourcing the engine under MIT: &lt;a href="https://github.com/unifyai/unity" rel="noopener noreferrer"&gt;github.com/unifyai/unity&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Every operation returns a handle
&lt;/h2&gt;

&lt;p&gt;This is the core idea. When you ask the assistant to do something, you don't get a promise that eventually resolves. You get a live handle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;act&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research flights to Tokyo and draft an itinerary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Twenty seconds later, while it's still working:
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;interject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Also check train options from Tokyo to Osaka&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Something urgent comes up:
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pause&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# ... deal with it ...
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resume&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Or just ask what's happening:
&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Have you found anything under $800?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ask&lt;/code&gt;, &lt;code&gt;interject&lt;/code&gt;, &lt;code&gt;pause&lt;/code&gt;, &lt;code&gt;resume&lt;/code&gt;, &lt;code&gt;stop&lt;/code&gt;. That's the interface. Every operation in the system returns one of these — from a simple contact lookup to a multi-hour task execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handles nest
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. The assistant isn't one loop. It's a hierarchy of them.&lt;/p&gt;

&lt;p&gt;The Actor receives your request and writes a Python program that calls typed primitives — &lt;code&gt;await primitives.contacts.ask(...)&lt;/code&gt;, &lt;code&gt;await primitives.knowledge.update(...)&lt;/code&gt;. Each of those calls starts its own LLM tool loop inside the relevant manager, which returns its own handle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handle.pause()
 │
 ▼
Actor (pauses)
 ├── ContactManager.ask (pauses)
 │    └── inner search operation (pauses)
 └── KnowledgeManager.update (pauses)
      └── inner write operation (pauses)

All layers pause. Resume propagates the same way. So does stop.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can steer a complex multi-step operation at any depth without knowing or caring about the internal structure. Pause the whole thing, or ask a specific sub-operation what it's doing.&lt;/p&gt;
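&lt;p&gt;The fan-out itself is conceptually simple: a parent handle keeps references to its children and forwards each control signal down the tree. Here's a toy model (names and structure are illustrative, not the unity implementation):&lt;/p&gt;

```python
import asyncio


class NestedHandle:
    """Illustrative parent handle that fans control signals out to children."""

    def __init__(self, name: str):
        self.name = name
        self.children: list["NestedHandle"] = []
        self.paused = False

    def spawn(self, name: str) -> "NestedHandle":
        """Start a nested operation and keep a reference to its handle."""
        child = NestedHandle(name)
        self.children.append(child)
        return child

    async def pause(self) -> None:
        self.paused = True
        # Propagate to every nested loop concurrently.
        await asyncio.gather(*(c.pause() for c in self.children))

    async def resume(self) -> None:
        self.paused = False
        await asyncio.gather(*(c.resume() for c in self.children))
```

&lt;p&gt;Because every child exposes the same interface, one &lt;code&gt;pause()&lt;/code&gt; at the root reaches an arbitrarily deep tree of operations.&lt;/p&gt;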

&lt;h2&gt;
  
  
  What this actually enables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Talk to your assistant while it works.&lt;/strong&gt; The system has a dual-brain architecture: a slow deliberation brain that sees the full picture and makes decisions, plus a fast real-time voice agent (on LiveKit) that handles the conversation at sub-second latency. They communicate over IPC. When the slow brain finishes a background task, it tells the fast brain to weave the results into whatever you're currently discussing. You never have to wait in silence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redirect mid-task.&lt;/strong&gt; "Actually, don't send that email — call them instead." The interject mechanism injects new instructions into the running loop between LLM turns. If an LLM call is already in flight, it's cancelled and restarted with the interjection included. No restart, no lost context.&lt;/p&gt;
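&lt;p&gt;A toy model of the between-turns mechanics: the loop drains a queue of interjections before each turn. (The real engine also cancels and restarts an in-flight LLM call so the interjection is never missed; this sketch omits that part.)&lt;/p&gt;

```python
import asyncio


async def tool_loop(steps, interjections: asyncio.Queue):
    """Toy loop: drain pending interjections into context before each turn."""
    context = []
    for step in steps:
        # Pick up anything the user said since the last turn.
        while not interjections.empty():
            context.append(("user_interjection", interjections.get_nowait()))
        context.append(("turn", step))
        await asyncio.sleep(0)  # yield point, analogous to an LLM turn boundary


    return context


async def demo():
    q: asyncio.Queue = asyncio.Queue()

    async def steer():
        # Arrives while the loop is mid-task.
        q.put_nowait("actually, don't send that email, call them instead")

    context, _ = await asyncio.gather(
        tool_loop(["draft email", "send email"], q), steer()
    )
    return context
```

&lt;p&gt;The interjection lands in the same context as the original instructions, so nothing the loop has already learned is thrown away.&lt;/p&gt;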

&lt;p&gt;&lt;strong&gt;Run multiple things at once.&lt;/strong&gt; The conversation manager tracks concurrent in-flight actions, each with its own steerable handle. You can say "how's the flight search going?" and it routes to the right handle's &lt;code&gt;ask()&lt;/code&gt; method, while the other operations keep running.&lt;/p&gt;
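&lt;p&gt;The bookkeeping for this is essentially a map from action to handle. A hypothetical sketch:&lt;/p&gt;

```python
import asyncio


class ConversationManager:
    """Toy tracker: one steerable handle per concurrent in-flight action."""

    def __init__(self):
        self.in_flight: dict[str, object] = {}

    def track(self, name: str, handle) -> None:
        self.in_flight[name] = handle

    async def ask_action(self, name: str, question: str) -> str:
        # Route the question to the matching handle; other actions keep running.
        return await self.in_flight[name].ask(question)
```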

&lt;p&gt;&lt;strong&gt;Memory that doesn't reset.&lt;/strong&gt; Every ~50 messages, a background process extracts contacts, relationships, domain knowledge, and task commitments into structured, queryable tables. After a month, the assistant has a working model of your world — not a chat log, but typed records it can filter, join, and search.&lt;/p&gt;
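&lt;p&gt;To make "typed records, not a chat log" concrete, here is a hypothetical minimal schema; the real tables are richer, but the point is that records can be filtered and joined:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    relationship: str


@dataclass
class Commitment:
    owner: str
    task: str
    due: str


# Typed records support the queries a raw transcript can't.
contacts = [Contact("Aiko", "travel agent"), Contact("Ben", "colleague")]
commitments = [Commitment("Ben", "send itinerary", "Friday")]

# "Join": which known contacts owe me something?
known = {c.name for c in contacts}
owed = [c for c in commitments if c.owner in known]
```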

&lt;h2&gt;
  
  
  The code
&lt;/h2&gt;

&lt;p&gt;The system has been in development for ~10 months. We're a YC company (Unify) and this powers our commercial product. The brain is the open-source part.&lt;/p&gt;

&lt;p&gt;If you want to see how it works, start here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/unifyai/unity/blob/staging/unity/common/async_tool_loop.py" rel="noopener noreferrer"&gt;&lt;code&gt;unity/common/async_tool_loop.py&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; — the &lt;code&gt;SteerableToolHandle&lt;/code&gt; protocol and &lt;code&gt;AsyncToolLoopHandle&lt;/code&gt; implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/unifyai/unity/blob/staging/unity/common/_async_tool/loop.py" rel="noopener noreferrer"&gt;&lt;code&gt;unity/common/_async_tool/loop.py&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; — the loop engine: interjections, pausing, parallel tool execution, context compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/unifyai/unity/blob/staging/ARCHITECTURE.md" rel="noopener noreferrer"&gt;&lt;code&gt;ARCHITECTURE.md&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; — the full technical walkthrough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'd genuinely appreciate feedback — what we got right, what seems over-engineered, what you'd do differently. This is a complex system and outside perspective is valuable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/unifyai/unity" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://youtu.be/qjSWiCd8Bq8" rel="noopener noreferrer"&gt;Launch video&lt;/a&gt; · &lt;a href="https://unify.ai" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>automation</category>
      <category>architecture</category>
      <category>agents</category>
    </item>
    <item>
      <title>We Built a Dynamic Router Improving LLM Quality, Cost and Speed ✨</title>
      <dc:creator>Daniel Lenton</dc:creator>
      <pubDate>Wed, 22 May 2024 15:10:58 +0000</pubDate>
      <link>https://forem.com/danlenton/we-built-a-dynamic-router-improving-llm-quality-cost-and-speed-4dlf</link>
      <guid>https://forem.com/danlenton/we-built-a-dynamic-router-improving-llm-quality-cost-and-speed-4dlf</guid>
      <description>&lt;p&gt;&lt;strong&gt;Are you also overwhelmed by all the LLM models and providers constantly coming onto the scene?&lt;/strong&gt; To me it sometimes feels like trying to drink from a firehose, especially when it comes to aligning with my own specific task and prompts. Chosing the wrong model for your task means slower, more expensive, and less competent models, which nobody wants 🫠&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F980rhznqxi6cno3488ge.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F980rhznqxi6cno3488ge.png" alt="Image description" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Common Dilemma
&lt;/h2&gt;

&lt;p&gt;The AI landscape is cluttered with options like Llama, Gemini, GPT, and Mistral, leading to a common scenario:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawb6tobutgmxn2bnbgi8.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawb6tobutgmxn2bnbgi8.jpeg" alt="Image description" width="577" height="433"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Dynamic Routing with Unify ✨
&lt;/h2&gt;

&lt;p&gt;Before you roll your eyes at yet another buzzword, let me &lt;em&gt;try&lt;/em&gt; to explain what we've built in a bit more detail. Basically, with Unify, you don't have to manually test each model against your requirements or juggle multiple accounts and API keys. All models are available with a single API key, and you can easily &lt;a href="https://console.unify.ai/dashboard"&gt;benchmark your prompts&lt;/a&gt; to assess which LLMs and providers are best for &lt;em&gt;your&lt;/em&gt; own task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1ykqq3582tq2so1rxdu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1ykqq3582tq2so1rxdu.gif" alt="Image description" width="1653" height="538"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unify can also automatically &lt;a href="https://unify.ai/chat?default=true"&gt;route your prompts&lt;/a&gt; to the most suitable LLM based on your preferences for quality, speed, and cost. This means you can focus on what truly matters - building your exceptional AI-driven applications 🔥&lt;/p&gt;

&lt;p&gt;Feel free to check out a more comprehensive &lt;a href="https://youtu.be/ZpY6SIkBosE"&gt;walkthrough&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxp0ktmmy82vplwszfva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxp0ktmmy82vplwszfva.png" alt="Image description" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  So, high-level, what does Unify bring to the table?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;⚙️ &lt;strong&gt;Control&lt;/strong&gt;: Choose which models and providers you want to route to and then adjust how important quality, cost, and latency are for you. That's it; now the performance of your LLM app is fully in your hands, not the providers!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📈 &lt;strong&gt;Self Improvement&lt;/strong&gt;: As each new model and provider comes onto the scene, sit back and watch your LLM application automatically improve over time. We quickly add support for the latest and greatest, ensuring your custom cost-quality-speed requirements are always fully optimized.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📊 &lt;strong&gt;Observability&lt;/strong&gt;: Don't want to route? No sweat. Quickly compare all models and providers, and see which are truly the best for your own needs, on your own prompts, for your own task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;⚖️ &lt;strong&gt;Impartiality&lt;/strong&gt;: We treat all models and providers equally, as we don't have a horse in the race. You can trust our benchmarks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🔑 &lt;strong&gt;Convenience&lt;/strong&gt;: The power of all models and providers behind a single endpoint, queryable individually or via the router, all with a single API key. 'pip install unifyai', and away you go!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🧑‍💻 &lt;strong&gt;Focus&lt;/strong&gt;: Don't stress updating the model and provider every few weeks. Just specify your performance needs and get back to building great AI products. We'll handle the rest for you!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4lsbezswfv7sabyf0iq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4lsbezswfv7sabyf0iq.png" alt="Image description" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started is a Breeze:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install unifyai

from unify import Unify

unify = Unify(
    api_key=("UNIFY_KEY"),
    endpoint="router@q:1",
)

response = unify.generate(user_prompt="Hello there")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's that simple 👌&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjdf2v4s15gs87vtc6g4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjdf2v4s15gs87vtc6g4.gif" alt="Image description" width="498" height="413"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Think You'll Like Unify:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;🎨 &lt;strong&gt;Focus on Development&lt;/strong&gt;: Spend more time creating and less time worrying about finding the most appropriate LLM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;⚙️ &lt;strong&gt;Adaptive and Efficient&lt;/strong&gt;: Your app will self-improve as you automatically &lt;a href="https://console.unify.ai/dashboard"&gt;benchmark&lt;/a&gt; each new LLM on your own prompts and for your own task, enabling you to quickly integrate the latest and greatest LLMs into your workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;⚖️ &lt;strong&gt;Quality, Cost and Speed&lt;/strong&gt;: These are the three pillars for all LLMs. Unify's &lt;a href="//unify.ai/chat?default=true"&gt;router&lt;/a&gt; ensures you never have to compromise on any of them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://console.unify.ai/"&gt;Every signup comes with $50 free credit to get you started!&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76d2tbfjou30f226g0rt.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76d2tbfjou30f226g0rt.gif" alt="Image description" width="498" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
