<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sai Vishwak</title>
    <description>The latest articles on Forem by Sai Vishwak (@saivishwak).</description>
    <link>https://forem.com/saivishwak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1257047%2F17ba20cd-1321-419b-b98c-ace0b10996c8.jpeg</url>
      <title>Forem: Sai Vishwak</title>
      <link>https://forem.com/saivishwak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/saivishwak"/>
    <language>en</language>
    <item>
      <title>Portable Agents Are the Missing Abstraction in AI Infrastructure</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:38:22 +0000</pubDate>
      <link>https://forem.com/saivishwak/portable-agents-are-the-missing-abstraction-in-ai-infrastructure-341n</link>
      <guid>https://forem.com/saivishwak/portable-agents-are-the-missing-abstraction-in-ai-infrastructure-341n</guid>
      <description>&lt;p&gt;The AI agent ecosystem has a packaging problem.&lt;/p&gt;

&lt;p&gt;Frameworks for &lt;em&gt;building&lt;/em&gt; agents have exploded. You can wire up a ReAct loop in a dozen languages, connect it to vector stores, give it tools, and watch it reason. The "how to build an agent" question is largely answered.&lt;/p&gt;

&lt;p&gt;But ask a different question — &lt;em&gt;how do you ship one?&lt;/em&gt; — and the answers get vague fast.&lt;/p&gt;

&lt;p&gt;How do you hand an agent to another team and guarantee it behaves the same way? How do you version it, audit its permissions, constrain its filesystem access, and run it on a machine that has never seen your source code? How do you move it from a developer's laptop to a staging server to a CI pipeline without rewriting configuration at every step?&lt;/p&gt;

&lt;p&gt;These are not agent intelligence problems. They are &lt;strong&gt;agent infrastructure problems&lt;/strong&gt;. And they are exactly the problems that portable, bundle-first agent runtimes are built to solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  The State of Agent Deployment Today
&lt;/h2&gt;

&lt;p&gt;Most agent systems today are &lt;em&gt;application-embedded&lt;/em&gt;. The agent's prompt, model configuration, tool definitions, memory strategy, and security policy live scattered across application code, environment variables, config files, and framework-specific abstractions. To "deploy" an agent means deploying the entire application that contains it.&lt;/p&gt;

&lt;p&gt;This creates a set of familiar problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment coupling.&lt;/strong&gt; The agent works on the author's machine because the right API keys are set, the right files are mounted, and the right tools are available. Move it somewhere else and things break silently — wrong model, missing tools, different filesystem layout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No versioning boundary.&lt;/strong&gt; When the agent's behavior changes, what changed? The prompt? The model? A tool implementation? A permission policy? Without a clear artifact boundary, there is no meaningful way to version, diff, or roll back an agent independently of the application around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security as an afterthought.&lt;/strong&gt; Tool permissions, filesystem access, and network policy are typically enforced (if at all) at the application layer. There is no standard way to declare that an agent should only read files in a workspace, or that &lt;code&gt;Bash&lt;/code&gt; commands require human approval, or that outbound network access is restricted to specific hosts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No portability.&lt;/strong&gt; An agent built in one framework, with one team's conventions, cannot be handed to another team and run without understanding the full stack underneath it. There is no &lt;code&gt;docker pull&lt;/code&gt; equivalent for agents.&lt;/p&gt;

&lt;p&gt;These problems compound as organizations move from one experimental agent to dozens of production agents maintained by different teams. The lack of a standard packaging and execution model becomes a real operational bottleneck.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Container Analogy
&lt;/h2&gt;

&lt;p&gt;The container revolution solved an analogous problem for applications. Before Docker, deploying software meant managing dependencies, environment configuration, and runtime differences across machines. The container image became the unit of packaging — a portable, versioned artifact that could run anywhere a container runtime existed.&lt;/p&gt;

&lt;p&gt;Agents need the same thing: &lt;strong&gt;a portable artifact that encapsulates identity, behavior, tools, permissions, and runtime policy in a single versioned unit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the idea behind agent bundles.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an Agent Bundle Looks Like
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://github.com/liquidos-ai/odyssey" rel="noopener noreferrer"&gt;Odyssey&lt;/a&gt;, the open-source Rust agent runtime built by LiquidOS, an agent bundle is a directory with a small, well-defined structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-agent/
├── odyssey.bundle.json5    # Runtime policy
├── agent.yaml              # Agent identity and behavior
├── skills/                 # Reusable prompt extensions
│   └── code-review/
│       └── SKILL.md
└── resources/              # Bundle-local assets
    └── reference.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;bundle manifest&lt;/strong&gt; (&lt;code&gt;odyssey.bundle.json5&lt;/code&gt;) declares everything the runtime needs to execute the agent — no implicit dependencies, no environment assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  id: 'code-reviewer',
  version: '1.2.0',
  manifest_version: "odyssey.bundle/v1",
  agent_spec: 'agent.yaml',

  // Execution strategy
  executor: { type: 'prebuilt', id: 'react' },
  memory: { type: 'prebuilt', id: 'sliding_window', config: { max_window: 100 } },

  // Available tools
  tools: [
    { name: 'Read', source: 'builtin' },
    { name: 'Glob', source: 'builtin' },
    { name: 'Grep', source: 'builtin' },
    { name: 'Bash', source: 'builtin' }
  ],

  // Security boundary
  sandbox: {
    mode: 'read_only',
    permissions: {
      filesystem: {
        mounts: { read: ["."], write: [] }
      },
      network: ["api.openai.com"]
    },
    resources: { cpu: 1, memory_mb: 512 }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;agent spec&lt;/strong&gt; (&lt;code&gt;agent.yaml&lt;/code&gt;) defines identity, prompt, model, and tool-level permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;code-reviewer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Reviews pull requests for correctness and style&lt;/span&gt;
&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
  &lt;span class="s"&gt;You are a senior code reviewer. You read diffs carefully, check for&lt;/span&gt;
  &lt;span class="s"&gt;correctness, identify edge cases, and suggest improvements. You never&lt;/span&gt;
  &lt;span class="s"&gt;modify files directly — you only provide feedback.&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4.1-mini&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Read'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Glob'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Grep'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bash(git&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;diff:*)'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bash(git&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;log:*)'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
  &lt;span class="na"&gt;deny&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Write'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Edit'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bash'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read that &lt;code&gt;tools&lt;/code&gt; block carefully. This agent can read files and run &lt;code&gt;git diff&lt;/code&gt; and &lt;code&gt;git log&lt;/code&gt;, but it &lt;em&gt;cannot&lt;/em&gt; write files, edit files, or run arbitrary shell commands. That policy is not enforced by convention or application-level checks — it is part of the bundle definition and enforced by the runtime. The sandbox mode is &lt;code&gt;read_only&lt;/code&gt;, meaning even if a tool attempted a write, the kernel-level sandbox would block it.&lt;/p&gt;
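&lt;p&gt;To make those allow/deny semantics concrete, here is a small Python sketch of how such patterns could be evaluated. The prefix reading of &lt;code&gt;Bash(git diff:*)&lt;/code&gt; and the precedence rule (a specific, parenthesized allow wins over a bare deny) are our interpretation of the manifest above, not Odyssey's documented matcher:&lt;/p&gt;

```python
def rule_matches(rule, tool, arg):
    """True if a permission rule applies to this tool invocation.

    A bare rule like 'Bash' matches any use of that tool. A rule like
    'Bash(git diff:*)' is read as tool 'Bash' plus the argument prefix
    'git diff' (assumed interpretation, not Odyssey's spec).
    """
    if "(" in rule:
        name, spec = rule[:-1].split("(", 1)
        prefix = spec.split(":")[0]
        return name == tool and (arg or "").startswith(prefix)
    return rule == tool

def allowed(tool, arg, allow, deny):
    # Assumed precedence: a specific (parenthesized) allow wins over a
    # bare deny; otherwise any matching deny blocks; otherwise an allow
    # rule must match.
    if any(rule_matches(r, tool, arg) and "(" in r for r in allow):
        return True
    if any(rule_matches(r, tool, arg) for r in deny):
        return False
    return any(rule_matches(r, tool, arg) for r in allow)

allow = ["Read", "Glob", "Grep", "Bash(git diff:*)", "Bash(git log:*)"]
deny = ["Write", "Edit", "Bash"]

print(allowed("Bash", "git diff HEAD~1", allow, deny))  # True
print(allowed("Bash", "rm -rf /", allow, deny))         # False
print(allowed("Write", None, allow, deny))              # False
```

&lt;p&gt;The runtime enforces the real policy; this sketch only illustrates why the agent can run &lt;code&gt;git diff&lt;/code&gt; while bare &lt;code&gt;Bash&lt;/code&gt; stays denied.&lt;/p&gt;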

&lt;p&gt;This is what a portable, self-describing agent looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Lifecycle of a Portable Agent
&lt;/h2&gt;

&lt;p&gt;Once an agent is defined as a bundle, its lifecycle becomes operationally clean:&lt;/p&gt;

&lt;h3&gt;
  
  
  Author
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;odyssey-rs init ./code-reviewer
&lt;span class="c"&gt;# Edit the manifest, agent spec, and skills&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Build and install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;odyssey-rs build ./code-reviewer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The build step validates the manifest, resolves dependencies, and installs the bundle into the local bundle store (&lt;code&gt;~/.odyssey/bundles/&lt;/code&gt;). The agent is now runnable by reference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;odyssey-rs run code-reviewer@1.2.0 &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Review the latest commit"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime resolves the agent reference, loads the bundle, prepares the sandbox, assembles the execution context, and runs the agent loop. On a release build, this entire initialization — from CLI invocation to agent execution — takes &lt;strong&gt;under 200 microseconds&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distribute
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export to a portable archive&lt;/span&gt;
odyssey-rs &lt;span class="nb"&gt;export &lt;/span&gt;code-reviewer:1.2.0 &lt;span class="nt"&gt;--output&lt;/span&gt; ./dist

&lt;span class="c"&gt;# On another machine — import and run&lt;/span&gt;
odyssey-rs import ./dist/code-reviewer-1.2.0.odyssey
odyssey-rs run code-reviewer@1.2.0 &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Review this PR"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.odyssey&lt;/code&gt; archive is a self-contained artifact. No source code, no framework installation, no environment setup beyond the Odyssey binary itself. The agent runs identically on any machine with &lt;code&gt;odyssey-rs&lt;/code&gt; installed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serve remotely
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the runtime as an HTTP server&lt;/span&gt;
odyssey-rs serve &lt;span class="nt"&gt;--bind&lt;/span&gt; 0.0.0.0:8472

&lt;span class="c"&gt;# Run agents remotely&lt;/span&gt;
odyssey-rs &lt;span class="nt"&gt;--remote&lt;/span&gt; http://server:8472 run code-reviewer@1.2.0 &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Check main"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same bundle, the same runtime contract, accessible over HTTP. No separate deployment pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Portability Changes the Game
&lt;/h2&gt;

&lt;p&gt;When agents become portable artifacts, several things that were previously hard become straightforward:&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-agent teams without multi-repo chaos
&lt;/h3&gt;

&lt;p&gt;A platform team can author, version, and distribute specialized agents — a code reviewer, a test writer, a documentation generator, an incident responder — as independent bundles. Product teams consume them by reference. Updating an agent means publishing a new version, not coordinating a cross-team deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auditable security posture
&lt;/h3&gt;

&lt;p&gt;Every bundle explicitly declares what it can and cannot do. An agent with &lt;code&gt;sandbox.mode: read_only&lt;/code&gt; and &lt;code&gt;tools.deny: ['Bash']&lt;/code&gt; has a security posture you can read in ten seconds. Compliance teams can review bundle manifests without reading source code. Permission changes are version-controlled diffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reproducible behavior across environments
&lt;/h3&gt;

&lt;p&gt;The same &lt;code&gt;code-reviewer:1.2.0&lt;/code&gt; bundle produces the same execution context on a developer laptop, in CI, and on a production server. The prompt, model, tools, memory strategy, and sandbox policy are fixed by the bundle version. Environment-specific differences, such as API keys, are injected at runtime, not baked into the artifact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent-as-a-service without infrastructure overhead
&lt;/h3&gt;

&lt;p&gt;The Odyssey HTTP server exposes the full runtime over REST. Any bundle installed on the server is immediately available as an API endpoint. There is no per-agent deployment, no container orchestration, no function-as-a-service wrapper. Install a bundle, and it is servable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for the Industry
&lt;/h2&gt;

&lt;p&gt;The shift toward portable, bundle-first agents is not just a developer experience improvement. It represents a fundamental change in how organizations will operate AI agents at scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents become auditable artifacts.&lt;/strong&gt; When an agent's complete behavior is captured in a versioned bundle, security reviews, compliance audits, and incident investigations can work with concrete artifacts instead of reconstructing behavior from scattered code and configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent distribution becomes a solved problem.&lt;/strong&gt; The &lt;code&gt;.odyssey&lt;/code&gt; archive format and the planned hub push/pull workflow mean agents can be published, discovered, and installed the same way packages and container images are today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime and agent become independent concerns.&lt;/strong&gt; Teams that build agents do not need to understand or operate the runtime. Teams that operate the runtime do not need to understand agent internals. The bundle manifest is the contract between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-provider, multi-surface deployment becomes trivial.&lt;/strong&gt; The same bundle runs through CLI for scripting, HTTP for services, and TUI for interactive operation. Switching LLM providers is a configuration change, not an architectural decision.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you like the project, star the repository: &lt;a href="https://github.com/liquidos-ai/Odyssey" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/Odyssey&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Odyssey is built by &lt;a href="https://liquidos.ai" rel="noopener noreferrer"&gt;LiquidOS&lt;/a&gt;. We believe the next generation of AI infrastructure will be defined by portable, auditable, and operationally practical agent runtimes — not larger frameworks.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>Benchmarking AI Agent Frameworks in 2026: AutoAgents (Rust) vs LangChain, LangGraph, LlamaIndex, PydanticAI, and more</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Wed, 18 Feb 2026 22:16:50 +0000</pubDate>
      <link>https://forem.com/saivishwak/benchmarking-ai-agent-frameworks-in-2026-autoagents-rust-vs-langchain-langgraph-llamaindex-338f</link>
      <guid>https://forem.com/saivishwak/benchmarking-ai-agent-frameworks-in-2026-autoagents-rust-vs-langchain-langgraph-llamaindex-338f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s6xyztn0g76zttbiphm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s6xyztn0g76zttbiphm.png" alt=" " width="800" height="939"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why we ran this benchmark
&lt;/h3&gt;

&lt;p&gt;Every AI agent framework claims to be production-ready. Few of them tell you what "production" actually costs in CPU, RAM, and latency. We built AutoAgents — a Rust-native framework for building tool-using AI agents — and wanted an honest picture of how it performs against the established Python and Rust players under identical conditions.&lt;/p&gt;

&lt;p&gt;This post covers the methodology, the raw numbers, and what we think they mean (and don't mean).&lt;/p&gt;




&lt;h3&gt;
  
  
  The Task
&lt;/h3&gt;

&lt;p&gt;We picked a task that's representative of real-world agentic workloads: a &lt;strong&gt;ReAct-style agent&lt;/strong&gt; that receives a question, decides to call a tool, processes a parquet file to compute average trip duration, and returns a formatted answer.&lt;/p&gt;

&lt;p&gt;This tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM planning (tool selection)&lt;/li&gt;
&lt;li&gt;Tool execution (actual parquet parsing and computation)&lt;/li&gt;
&lt;li&gt;Result formatting and response generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not a toy "what's 2+2" benchmark, but it's also a single-step tool call — not a long-horizon multi-agent workflow. We note this limitation upfront.&lt;/p&gt;
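&lt;p&gt;The shape of that loop can be sketched in a few lines of Python, with a scripted stand-in for the model and a stubbed tool (the real benchmark parses an actual parquet file). The names &lt;code&gt;run_agent&lt;/code&gt;, &lt;code&gt;scripted_model&lt;/code&gt;, and &lt;code&gt;avg_trip_duration&lt;/code&gt; are illustrative, not any framework's API:&lt;/p&gt;

```python
def run_agent(question, model, tools, max_steps=5):
    """Minimal ReAct-style loop: at each step the model either calls a
    tool or produces a final answer."""
    transcript = [question]
    for _ in range(max_steps):
        action = model(transcript)            # LLM planning step
        if action["type"] == "final":
            return action["text"]             # response generation
        result = tools[action["tool"]](**action["args"])  # tool execution
        transcript.append(f"{action['tool']} returned {result}")
    raise RuntimeError("step budget exceeded")

def scripted_model(transcript):
    # Stand-in for the LLM: first plan a tool call, then format an answer.
    if len(transcript) == 1:
        return {"type": "tool", "tool": "avg_trip_duration",
                "args": {"path": "trips.parquet"}}
    return {"type": "final",
            "text": f"Average trip duration: {transcript[-1].split()[-1]} minutes"}

def avg_trip_duration(path):
    # Stub for the real tool, which reads trip durations from a parquet file.
    durations = [12.0, 8.5, 15.5]
    return round(sum(durations) / len(durations), 2)

answer = run_agent("What is the average trip duration?",
                   scripted_model, {"avg_trip_duration": avg_trip_duration})
print(answer)  # Average trip duration: 12.0 minutes
```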




&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model:&lt;/strong&gt; &lt;code&gt;gpt-5.1&lt;/code&gt; (same across all frameworks)&lt;br&gt;
&lt;strong&gt;Requests:&lt;/strong&gt; 50 total, 10 concurrent (higher concurrency hit the provider's tokens-per-minute rate limit, so we capped it here)&lt;br&gt;
&lt;strong&gt;Machine:&lt;/strong&gt; Same hardware for all runs, no process affinity pinning&lt;br&gt;
&lt;strong&gt;Measured:&lt;/strong&gt; end-to-end latency (P50, P95, P99), throughput (req/s), peak RSS memory (MB), CPU usage (%), cold-start time (ms), determinism rate (same output across runs)&lt;/p&gt;

&lt;p&gt;All frameworks achieved &lt;strong&gt;100% success rate&lt;/strong&gt; (50/50). CrewAI was excluded after it showed a 44% failure rate under the same conditions.&lt;/p&gt;

&lt;p&gt;Benchmark code and raw JSON are in the repo: &lt;a href="https://github.com/liquidos-ai/autoagents-bench" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/autoagents-bench&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P95 Latency&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Peak Memory&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;Cold Start&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AutoAgents&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;5,714 ms&lt;/td&gt;
&lt;td&gt;9,652 ms&lt;/td&gt;
&lt;td&gt;4.97 rps&lt;/td&gt;
&lt;td&gt;1,046 MB&lt;/td&gt;
&lt;td&gt;29.2%&lt;/td&gt;
&lt;td&gt;4 ms&lt;/td&gt;
&lt;td&gt;98.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rig&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;6,065 ms&lt;/td&gt;
&lt;td&gt;10,131 ms&lt;/td&gt;
&lt;td&gt;4.44 rps&lt;/td&gt;
&lt;td&gt;1,019 MB&lt;/td&gt;
&lt;td&gt;24.3%&lt;/td&gt;
&lt;td&gt;4 ms&lt;/td&gt;
&lt;td&gt;90.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;6,046 ms&lt;/td&gt;
&lt;td&gt;10,209 ms&lt;/td&gt;
&lt;td&gt;4.26 rps&lt;/td&gt;
&lt;td&gt;5,706 MB&lt;/td&gt;
&lt;td&gt;64.0%&lt;/td&gt;
&lt;td&gt;62 ms&lt;/td&gt;
&lt;td&gt;48.55&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PydanticAI&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;6,592 ms&lt;/td&gt;
&lt;td&gt;11,311 ms&lt;/td&gt;
&lt;td&gt;4.15 rps&lt;/td&gt;
&lt;td&gt;4,875 MB&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;56 ms&lt;/td&gt;
&lt;td&gt;48.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LlamaIndex&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;6,990 ms&lt;/td&gt;
&lt;td&gt;11,960 ms&lt;/td&gt;
&lt;td&gt;4.04 rps&lt;/td&gt;
&lt;td&gt;4,860 MB&lt;/td&gt;
&lt;td&gt;59.7%&lt;/td&gt;
&lt;td&gt;54 ms&lt;/td&gt;
&lt;td&gt;43.66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphBit&lt;/td&gt;
&lt;td&gt;JS/TS&lt;/td&gt;
&lt;td&gt;8,425 ms&lt;/td&gt;
&lt;td&gt;14,388 ms&lt;/td&gt;
&lt;td&gt;3.14 rps&lt;/td&gt;
&lt;td&gt;4,718 MB&lt;/td&gt;
&lt;td&gt;44.6%&lt;/td&gt;
&lt;td&gt;138 ms&lt;/td&gt;
&lt;td&gt;22.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;10,155 ms&lt;/td&gt;
&lt;td&gt;16,891 ms&lt;/td&gt;
&lt;td&gt;2.70 rps&lt;/td&gt;
&lt;td&gt;5,570 MB&lt;/td&gt;
&lt;td&gt;39.7%&lt;/td&gt;
&lt;td&gt;63 ms&lt;/td&gt;
&lt;td&gt;0.85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Composite score&lt;/strong&gt; is a weighted, min-max normalized aggregate across all dimensions (latency 27.8%, throughput 33.3%, memory 22.2%, CPU efficiency 16.7%).&lt;/p&gt;
&lt;h3&gt;
  
  
  Breaking Down the Numbers
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Memory: The Biggest Gap
&lt;/h4&gt;

&lt;p&gt;The most striking result isn't latency — it's memory.&lt;/p&gt;

&lt;p&gt;AutoAgents peaks at &lt;strong&gt;1,046 MB&lt;/strong&gt;. The non-Rust frameworks average a peak of &lt;strong&gt;5,146 MB&lt;/strong&gt;. That's a &lt;strong&gt;~5× difference&lt;/strong&gt; on a single-agent workload.&lt;/p&gt;

&lt;p&gt;At deployment scale (50 instances):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Total RAM needed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AutoAgents&lt;/td&gt;
&lt;td&gt;~51 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rig&lt;/td&gt;
&lt;td&gt;~50 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;~279 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;~272 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PydanticAI&lt;/td&gt;
&lt;td&gt;~238 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LlamaIndex&lt;/td&gt;
&lt;td&gt;~237 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphBit&lt;/td&gt;
&lt;td&gt;~230 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
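&lt;p&gt;The scale-up numbers follow directly from the per-instance peaks, so they are easy to sanity-check:&lt;/p&gt;

```python
# Per-instance peak RSS in MB, taken from the results table.
peaks_mb = {
    "AutoAgents": 1046, "Rig": 1019, "LangChain": 5706, "LangGraph": 5570,
    "PydanticAI": 4875, "LlamaIndex": 4860, "GraphBit": 4718,
}
instances = 50
totals_gb = {name: peak * instances / 1024 for name, peak in peaks_mb.items()}
for name, total in totals_gb.items():
    print(f"{name}: ~{total:.0f} GB")  # e.g. AutoAgents: ~51 GB
```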

&lt;p&gt;Python frameworks carry baseline weight you pay even when idle: interpreter, dependency tree, dynamic dispatch, GC. Rust's ownership model means memory is freed immediately when objects go out of scope — no GC heap to keep around.&lt;/p&gt;
&lt;h4&gt;
  
  
  Latency: Smaller Gap, Still Real
&lt;/h4&gt;

&lt;p&gt;Latency differences are more nuanced. The LLM network round-trip dominates, which is why all frameworks cluster between 5,700 and 7,000 ms. The outliers (GraphBit at 8,425 ms, LangGraph at 10,155 ms) reflect additional framework orchestration overhead.&lt;/p&gt;

&lt;p&gt;AutoAgents beats the &lt;strong&gt;non-Rust average by 25%&lt;/strong&gt; on latency, and beats LangGraph by &lt;strong&gt;43.7%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The P95 numbers diverge more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AutoAgents P95: 9,652 ms&lt;/li&gt;
&lt;li&gt;LangGraph P95: 16,891 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the tail end — the requests that matter most for user-perceived reliability — the gap widens significantly.&lt;/p&gt;
&lt;h4&gt;
  
  
  Throughput
&lt;/h4&gt;

&lt;p&gt;AutoAgents delivers &lt;strong&gt;4.97 rps&lt;/strong&gt; vs an average of &lt;strong&gt;3.66 rps&lt;/strong&gt; across the non-Rust frameworks — &lt;strong&gt;36% more throughput&lt;/strong&gt; under the same concurrency. Against LangGraph specifically, it's &lt;strong&gt;84% more throughput&lt;/strong&gt; (4.97 vs 2.70 rps).&lt;/p&gt;

&lt;p&gt;Higher throughput per instance means you need fewer instances to serve the same load.&lt;/p&gt;
&lt;h4&gt;
  
  
  Cold Start
&lt;/h4&gt;

&lt;p&gt;This is where Rust's near-zero initialization really shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AutoAgents: &lt;strong&gt;4 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;LangChain: &lt;strong&gt;62 ms&lt;/strong&gt; (15× slower)&lt;/li&gt;
&lt;li&gt;PydanticAI: &lt;strong&gt;56 ms&lt;/strong&gt; (14× slower)&lt;/li&gt;
&lt;li&gt;LlamaIndex: &lt;strong&gt;54 ms&lt;/strong&gt; (14× slower)&lt;/li&gt;
&lt;li&gt;GraphBit: &lt;strong&gt;138 ms&lt;/strong&gt; (34× slower)&lt;/li&gt;
&lt;li&gt;LangGraph: &lt;strong&gt;63 ms&lt;/strong&gt; (16× slower)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For serverless deployments or auto-scaling scenarios where instances spin up on demand, a 4 ms cold start vs 60–140 ms is a qualitative difference in user experience.&lt;/p&gt;
&lt;h4&gt;
  
  
  CPU Usage
&lt;/h4&gt;

&lt;p&gt;CPU usage separates the field further. Rig (Rust) is the most efficient at 24.3%, AutoAgents runs at 29.2%, and LangChain is the heaviest at 64.0%. High CPU means less headroom for burst traffic without throttling.&lt;/p&gt;

&lt;p&gt;The throughput-per-CPU efficiency ranking mirrors the composite score.&lt;/p&gt;
&lt;h3&gt;
  
  
  How We Scored Frameworks
&lt;/h3&gt;

&lt;p&gt;The composite score uses &lt;strong&gt;min-max normalization&lt;/strong&gt; so every dimension is on a consistent 0–1 scale (best = 1, worst = 0), regardless of unit or direction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score = mmLow(latency)     × 27.8%   # lower is better
      + mmLow(memory)      × 22.2%   # lower is better
      + mmHigh(throughput) × 33.3%   # higher is better
      + mmHigh(cpu_eff)    × 16.7%   # rps/cpu%, higher is better

where mmHigh(v, min, max) = (v - min) / (max - min)
      mmLow(v,  min, max) = (max - v) / (max - min)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Weights reflect what matters at production scale: throughput is the primary capacity driver (33.3%), latency is user-facing (27.8%), memory drives infrastructure cost (22.2%), and CPU efficiency determines burst headroom (16.7%).&lt;/p&gt;
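&lt;p&gt;Plugging the table values into this formula reproduces the published scores to within a few hundredths (the table values are themselves rounded, so exact agreement isn't expected):&lt;/p&gt;

```python
# name: (avg latency ms, throughput rps, peak memory MB, CPU percent)
rows = {
    "AutoAgents": (5714, 4.97, 1046, 29.2),
    "Rig":        (6065, 4.44, 1019, 24.3),
    "LangChain":  (6046, 4.26, 5706, 64.0),
    "PydanticAI": (6592, 4.15, 4875, 53.9),
    "LlamaIndex": (6990, 4.04, 4860, 59.7),
    "GraphBit":   (8425, 3.14, 4718, 44.6),
    "LangGraph":  (10155, 2.70, 5570, 39.7),
}

def mm_high(v, lo, hi):  # higher is better; best value maps to 1
    return (v - lo) / (hi - lo)

def mm_low(v, lo, hi):   # lower is better; best value maps to 1
    return (hi - v) / (hi - lo)

lat = [r[0] for r in rows.values()]
thr = [r[1] for r in rows.values()]
mem = [r[2] for r in rows.values()]
eff = [r[1] / r[3] for r in rows.values()]  # throughput per CPU percent

def score(name):
    l, t, m, c = rows[name]
    return 100 * (mm_low(l, min(lat), max(lat)) * 0.278
                  + mm_low(m, min(mem), max(mem)) * 0.222
                  + mm_high(t, min(thr), max(thr)) * 0.333
                  + mm_high(t / c, min(eff), max(eff)) * 0.167)

for name in rows:
    print(f"{name}: {score(name):.2f}")
```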




&lt;h3&gt;
  
  
  What This Benchmark Doesn't Cover
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-step agents&lt;/strong&gt;: We only benchmark single tool-call ReAct loops. Long-horizon planning with many LLM calls may change the picture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent systems&lt;/strong&gt;: Frameworks designed for agent orchestration (LangGraph, CrewAI) are arguably optimized for complexity we didn't measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer quality&lt;/strong&gt;: Determinism rate tracks whether the output is consistent, not whether it's correct by a human rubric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming&lt;/strong&gt;: All results are blocking responses. Streaming latency profiles differ.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different models&lt;/strong&gt;: These results are specific to &lt;code&gt;gpt-5.1&lt;/code&gt;. Models with different speeds and output lengths will shift the LLM-dominated portion of latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these gaps are important for your use case, we'd welcome contributions that extend the benchmark suite.&lt;/p&gt;




&lt;h3&gt;
  
  
  Takeaway
&lt;/h3&gt;

&lt;p&gt;If you're choosing an AI agent framework for a production system where infrastructure cost and reliability under load matter, the memory footprint of Python frameworks is a real constraint. AutoAgents and Rig both stay under 1.1 GB peak — all Python frameworks measured exceeded 4.7 GB.&lt;/p&gt;

&lt;p&gt;The throughput and latency advantages are meaningful but not dramatic for single-agent tasks. The memory advantage is 5×, and it's structural — not something you tune away with configuration.&lt;/p&gt;

&lt;p&gt;We're continuing to extend the benchmark with more task types, multi-step workflows, and streaming measurements. Issues and PRs welcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give us a star on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/AutoAgents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>rust</category>
    </item>
    <item>
      <title>Write Agents in Rust — Run Them Locally on Android</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Fri, 23 Jan 2026 13:49:17 +0000</pubDate>
      <link>https://forem.com/saivishwak/write-agents-in-rust-run-them-locally-on-android-4c4</link>
      <guid>https://forem.com/saivishwak/write-agents-in-rust-run-them-locally-on-android-4c4</guid>
      <description>&lt;p&gt;Imagine building AI agents that run entirely on your Android device, using local models, without sending any data to the cloud.&lt;/p&gt;

&lt;p&gt;Sounds futuristic? It’s here today.&lt;/p&gt;

&lt;p&gt;With Rust-powered &lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;AutoAgents&lt;/a&gt;, you can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write intelligent agents in Rust — fast, safe, and flexible&lt;/li&gt;
&lt;li&gt;Deploy them directly on Android — private, on-device, and offline&lt;/li&gt;
&lt;li&gt;Use local AI models — no cloud dependency, full control over data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Developers and Startups
&lt;/h3&gt;

&lt;p&gt;If you’re building AI-native apps, or thinking of launching a privacy-first AI product, this is your playground.&lt;/p&gt;

&lt;p&gt;We’re excited to see what developers build when they can write agents in Rust.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/JeGs2usZEn4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;AutoAgents is fully open source!&lt;/p&gt;

&lt;p&gt;Check it out, try it on your projects, and give feedback.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/AutoAgents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Android App Example: &lt;a href="https://github.com/liquidos-ai/AutoAgents-Android-Example" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/AutoAgents-Android-Example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy coding! 🚀&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>rust</category>
    </item>
    <item>
      <title>AutoAgents – a Rust-Based Multi-Agent Framework for LLM-Powered Intelligence</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Tue, 14 Oct 2025 18:09:21 +0000</pubDate>
      <link>https://forem.com/saivishwak/autoagents-a-rust-based-multi-agent-framework-for-llm-powered-intelligence-27h2</link>
      <guid>https://forem.com/saivishwak/autoagents-a-rust-based-multi-agent-framework-for-llm-powered-intelligence-27h2</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;AutoAgents&lt;/a&gt; is a multi-agent framework built in Rust, designed for performance, safety, and scalability. It enables the creation of intelligent, autonomous agents.&lt;br&gt;
With AutoAgents, you can build Cloud-Native Agents, Edge-Native Agents, or even Hybrid Models. The framework features a modular architecture with swappable components — memory layers, executors, and communication backends can be replaced with minimal effort.&lt;/p&gt;

&lt;p&gt;We’re actively developing AutoAgents and would love to get feedback, ideas, and collaborators from the community.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
