<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sai Vishwak</title>
    <description>The latest articles on Forem by Sai Vishwak (@saivishwak).</description>
    <link>https://forem.com/saivishwak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1257047%2F17ba20cd-1321-419b-b98c-ace0b10996c8.jpeg</url>
      <title>Forem: Sai Vishwak</title>
      <link>https://forem.com/saivishwak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/saivishwak"/>
    <language>en</language>
    <item>
      <title>Portable Agents Are the Missing Abstraction in AI Infrastructure</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:38:22 +0000</pubDate>
      <link>https://forem.com/saivishwak/portable-agents-are-the-missing-abstraction-in-ai-infrastructure-341n</link>
      <guid>https://forem.com/saivishwak/portable-agents-are-the-missing-abstraction-in-ai-infrastructure-341n</guid>
      <description>&lt;p&gt;The AI agent ecosystem has a packaging problem.&lt;/p&gt;

&lt;p&gt;Frameworks for &lt;em&gt;building&lt;/em&gt; agents have exploded. You can wire up a ReAct loop in a dozen languages, connect it to vector stores, give it tools, and watch it reason. The "how to build an agent" question is largely answered.&lt;/p&gt;

&lt;p&gt;But ask a different question — &lt;em&gt;how do you ship one?&lt;/em&gt; — and the answers get vague fast.&lt;/p&gt;

&lt;p&gt;How do you hand an agent to another team and guarantee it behaves the same way? How do you version it, audit its permissions, constrain its filesystem access, and run it on a machine that has never seen your source code? How do you move it from a developer's laptop to a staging server to a CI pipeline without rewriting configuration at every step?&lt;/p&gt;

&lt;p&gt;These are not agent intelligence problems. They are &lt;strong&gt;agent infrastructure problems&lt;/strong&gt;. And they are exactly the problems that portable, bundle-first agent runtimes are built to solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  The State of Agent Deployment Today
&lt;/h2&gt;

&lt;p&gt;Most agent systems today are &lt;em&gt;application-embedded&lt;/em&gt;. The agent's prompt, model configuration, tool definitions, memory strategy, and security policy live scattered across application code, environment variables, config files, and framework-specific abstractions. To "deploy" an agent means deploying the entire application that contains it.&lt;/p&gt;

&lt;p&gt;This creates a set of familiar problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment coupling.&lt;/strong&gt; The agent works on the author's machine because the right API keys are set, the right files are mounted, and the right tools are available. Move it somewhere else and things break silently — wrong model, missing tools, different filesystem layout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No versioning boundary.&lt;/strong&gt; When the agent's behavior changes, what changed? The prompt? The model? A tool implementation? A permission policy? Without a clear artifact boundary, there is no meaningful way to version, diff, or roll back an agent independently of the application around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security as an afterthought.&lt;/strong&gt; Tool permissions, filesystem access, and network policy are typically enforced (if at all) at the application layer. There is no standard way to declare that an agent should only read files in a workspace, or that &lt;code&gt;Bash&lt;/code&gt; commands require human approval, or that outbound network access is restricted to specific hosts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No portability.&lt;/strong&gt; An agent built in one framework, with one team's conventions, cannot be handed to another team and run without understanding the full stack underneath it. There is no &lt;code&gt;docker pull&lt;/code&gt; equivalent for agents.&lt;/p&gt;

&lt;p&gt;These problems compound as organizations move from one experimental agent to dozens of production agents maintained by different teams. The lack of a standard packaging and execution model becomes a real operational bottleneck.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Container Analogy
&lt;/h2&gt;

&lt;p&gt;The container revolution solved an analogous problem for applications. Before Docker, deploying software meant managing dependencies, environment configuration, and runtime differences across machines. The container image became the unit of packaging — a portable, versioned artifact that could run anywhere a container runtime existed.&lt;/p&gt;

&lt;p&gt;Agents need the same thing: &lt;strong&gt;a portable artifact that encapsulates identity, behavior, tools, permissions, and runtime policy in a single versioned unit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the idea behind agent bundles.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an Agent Bundle Looks Like
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://github.com/liquidos-ai/odyssey" rel="noopener noreferrer"&gt;Odyssey&lt;/a&gt;, the open-source Rust agent runtime built by LiquidOS, an agent bundle is a directory with a small, well-defined structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-agent/
├── odyssey.bundle.json5    # Runtime policy
├── agent.yaml              # Agent identity and behavior
├── skills/                 # Reusable prompt extensions
│   └── code-review/
│       └── SKILL.md
└── resources/              # Bundle-local assets
    └── reference.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;bundle manifest&lt;/strong&gt; (&lt;code&gt;odyssey.bundle.json5&lt;/code&gt;) declares everything the runtime needs to execute the agent — no implicit dependencies, no environment assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  id: 'code-reviewer',
  version: '1.2.0',
  manifest_version: "odyssey.bundle/v1",
  agent_spec: 'agent.yaml',

  // Execution strategy
  executor: { type: 'prebuilt', id: 'react' },
  memory: { type: 'prebuilt', id: 'sliding_window', config: { max_window: 100 } },

  // Available tools
  tools: [
    { name: 'Read', source: 'builtin' },
    { name: 'Glob', source: 'builtin' },
    { name: 'Grep', source: 'builtin' },
    { name: 'Bash', source: 'builtin' }
  ],

  // Security boundary
  sandbox: {
    mode: 'read_only',
    permissions: {
      filesystem: {
        mounts: { read: ["."], write: [] }
      },
      network: ["api.openai.com"]
    },
    resources: { cpu: 1, memory_mb: 512 }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;agent spec&lt;/strong&gt; (&lt;code&gt;agent.yaml&lt;/code&gt;) defines identity, prompt, model, and tool-level permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;code-reviewer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Reviews pull requests for correctness and style&lt;/span&gt;
&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
  &lt;span class="s"&gt;You are a senior code reviewer. You read diffs carefully, check for&lt;/span&gt;
  &lt;span class="s"&gt;correctness, identify edge cases, and suggest improvements. You never&lt;/span&gt;
  &lt;span class="s"&gt;modify files directly — you only provide feedback.&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4.1-mini&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Read'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Glob'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Grep'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bash(git&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;diff:*)'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bash(git&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;log:*)'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
  &lt;span class="na"&gt;deny&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Write'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Edit'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bash'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read that &lt;code&gt;tools&lt;/code&gt; block carefully. This agent can read files and run &lt;code&gt;git diff&lt;/code&gt; and &lt;code&gt;git log&lt;/code&gt;, but it &lt;em&gt;cannot&lt;/em&gt; write files, edit files, or run arbitrary shell commands. That policy is not enforced by convention or application-level checks — it is part of the bundle definition and enforced by the runtime. The sandbox mode is &lt;code&gt;read_only&lt;/code&gt;, meaning even if a tool attempted a write, the kernel-level sandbox would block it.&lt;/p&gt;
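&lt;p&gt;To make those allow/deny semantics concrete, here is a small Python sketch of how such patterns could be evaluated. The prefix reading of &lt;code&gt;Bash(git diff:*)&lt;/code&gt; and the precedence rule (a specific, parenthesized allow wins over a bare deny) are our interpretation of the manifest above, not Odyssey's documented matcher:&lt;/p&gt;

```python
def rule_matches(rule, tool, arg):
    """True if a permission rule applies to this tool invocation.

    A bare rule like 'Bash' matches any use of that tool. A rule like
    'Bash(git diff:*)' is read as tool 'Bash' plus the argument prefix
    'git diff' (assumed interpretation, not Odyssey's spec).
    """
    if "(" in rule:
        name, spec = rule[:-1].split("(", 1)
        prefix = spec.split(":")[0]
        return name == tool and (arg or "").startswith(prefix)
    return rule == tool

def allowed(tool, arg, allow, deny):
    # Assumed precedence: a specific (parenthesized) allow wins over a
    # bare deny; otherwise any matching deny blocks; otherwise an allow
    # rule must match.
    if any(rule_matches(r, tool, arg) and "(" in r for r in allow):
        return True
    if any(rule_matches(r, tool, arg) for r in deny):
        return False
    return any(rule_matches(r, tool, arg) for r in allow)

allow = ["Read", "Glob", "Grep", "Bash(git diff:*)", "Bash(git log:*)"]
deny = ["Write", "Edit", "Bash"]

print(allowed("Bash", "git diff HEAD~1", allow, deny))  # True
print(allowed("Bash", "rm -rf /", allow, deny))         # False
print(allowed("Write", None, allow, deny))              # False
```

&lt;p&gt;The runtime enforces the real policy; this sketch only illustrates why the agent can run &lt;code&gt;git diff&lt;/code&gt; while bare &lt;code&gt;Bash&lt;/code&gt; stays denied.&lt;/p&gt;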

&lt;p&gt;This is what a portable, self-describing agent looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Lifecycle of a Portable Agent
&lt;/h2&gt;

&lt;p&gt;Once an agent is defined as a bundle, its lifecycle becomes operationally clean:&lt;/p&gt;

&lt;h3&gt;
  
  
  Author
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;odyssey-rs init ./code-reviewer
&lt;span class="c"&gt;# Edit the manifest, agent spec, and skills&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Build and install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;odyssey-rs build ./code-reviewer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The build step validates the manifest, resolves dependencies, and installs the bundle into the local bundle store (&lt;code&gt;~/.odyssey/bundles/&lt;/code&gt;). The agent is now runnable by reference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;odyssey-rs run code-reviewer@1.2.0 &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Review the latest commit"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime resolves the agent reference, loads the bundle, prepares the sandbox, assembles the execution context, and runs the agent loop. On a release build, this entire initialization — from CLI invocation to agent execution — takes &lt;strong&gt;under 200 microseconds&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distribute
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export to a portable archive&lt;/span&gt;
odyssey-rs &lt;span class="nb"&gt;export &lt;/span&gt;code-reviewer:1.2.0 &lt;span class="nt"&gt;--output&lt;/span&gt; ./dist

&lt;span class="c"&gt;# On another machine — import and run&lt;/span&gt;
odyssey-rs import ./dist/code-reviewer-1.2.0.odyssey
odyssey-rs run code-reviewer@1.2.0 &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Review this PR"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.odyssey&lt;/code&gt; archive is a self-contained artifact. No source code, no framework installation, no environment setup beyond the Odyssey binary itself. The agent runs identically on any machine with &lt;code&gt;odyssey-rs&lt;/code&gt; installed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serve remotely
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the runtime as an HTTP server&lt;/span&gt;
odyssey-rs serve &lt;span class="nt"&gt;--bind&lt;/span&gt; 0.0.0.0:8472

&lt;span class="c"&gt;# Run agents remotely&lt;/span&gt;
odyssey-rs &lt;span class="nt"&gt;--remote&lt;/span&gt; http://server:8472 run code-reviewer@1.2.0 &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Check main"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same bundle, the same runtime contract, accessible over HTTP. No separate deployment pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Portability Changes the Game
&lt;/h2&gt;

&lt;p&gt;When agents become portable artifacts, several things that were previously hard become straightforward:&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-agent teams without multi-repo chaos
&lt;/h3&gt;

&lt;p&gt;A platform team can author, version, and distribute specialized agents — a code reviewer, a test writer, a documentation generator, an incident responder — as independent bundles. Product teams consume them by reference. Updating an agent means publishing a new version, not coordinating a cross-team deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auditable security posture
&lt;/h3&gt;

&lt;p&gt;Every bundle explicitly declares what it can and cannot do. An agent with &lt;code&gt;sandbox.mode: read_only&lt;/code&gt; and &lt;code&gt;tools.deny: ['Bash']&lt;/code&gt; has a security posture you can read in ten seconds. Compliance teams can review bundle manifests without reading source code. Permission changes are version-controlled diffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reproducible behavior across environments
&lt;/h3&gt;

&lt;p&gt;The same &lt;code&gt;code-reviewer:1.2.0&lt;/code&gt; bundle produces the same execution context on a developer laptop, in CI, and on a production server. The prompt, model, tools, memory strategy, and sandbox policy are fixed by the bundle version. Environment-specific differences, such as API keys, are injected at runtime, not baked into the artifact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent-as-a-service without infrastructure overhead
&lt;/h3&gt;

&lt;p&gt;The Odyssey HTTP server exposes the full runtime over REST. Any bundle installed on the server is immediately available as an API endpoint. There is no per-agent deployment, no container orchestration, no function-as-a-service wrapper. Install a bundle, and it is servable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for the Industry
&lt;/h2&gt;

&lt;p&gt;The shift toward portable, bundle-first agents is not just a developer experience improvement. It represents a fundamental change in how organizations will operate AI agents at scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents become auditable artifacts.&lt;/strong&gt; When an agent's complete behavior is captured in a versioned bundle, security reviews, compliance audits, and incident investigations can work with concrete artifacts instead of reconstructing behavior from scattered code and configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent distribution becomes a solved problem.&lt;/strong&gt; The &lt;code&gt;.odyssey&lt;/code&gt; archive format and the planned hub push/pull workflow mean agents can be published, discovered, and installed the same way packages and container images are today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime and agent become independent concerns.&lt;/strong&gt; Teams that build agents do not need to understand or operate the runtime. Teams that operate the runtime do not need to understand agent internals. The bundle manifest is the contract between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-provider, multi-surface deployment becomes trivial.&lt;/strong&gt; The same bundle runs through CLI for scripting, HTTP for services, and TUI for interactive operation. Switching LLM providers is a configuration change, not an architectural decision.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you like the project, star the repository: &lt;a href="https://github.com/liquidos-ai/Odyssey" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/Odyssey&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Odyssey is built by &lt;a href="https://liquidos.ai" rel="noopener noreferrer"&gt;LiquidOS&lt;/a&gt;. We believe the next generation of AI infrastructure will be defined by portable, auditable, and operationally practical agent runtimes — not larger frameworks.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>Benchmarking AI Agent Frameworks in 2026: AutoAgents (Rust) vs LangChain, LangGraph, LlamaIndex, PydanticAI, and more</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Wed, 18 Feb 2026 22:16:50 +0000</pubDate>
      <link>https://forem.com/saivishwak/benchmarking-ai-agent-frameworks-in-2026-autoagents-rust-vs-langchain-langgraph-llamaindex-338f</link>
      <guid>https://forem.com/saivishwak/benchmarking-ai-agent-frameworks-in-2026-autoagents-rust-vs-langchain-langgraph-llamaindex-338f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s6xyztn0g76zttbiphm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s6xyztn0g76zttbiphm.png" alt=" " width="800" height="939"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why we ran this benchmark
&lt;/h3&gt;

&lt;p&gt;Every AI agent framework claims to be production-ready. Few of them tell you what "production" actually costs in CPU, RAM, and latency. We built AutoAgents — a Rust-native framework for building tool-using AI agents — and wanted an honest picture of how it performs against the established Python and Rust players under identical conditions.&lt;/p&gt;

&lt;p&gt;This post covers the methodology, the raw numbers, and what we think they mean (and don't mean).&lt;/p&gt;




&lt;h3&gt;
  
  
  The Task
&lt;/h3&gt;

&lt;p&gt;We picked a task that's representative of real-world agentic workloads: a &lt;strong&gt;ReAct-style agent&lt;/strong&gt; that receives a question, decides to call a tool, processes a parquet file to compute average trip duration, and returns a formatted answer.&lt;/p&gt;

&lt;p&gt;This tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM planning (tool selection)&lt;/li&gt;
&lt;li&gt;Tool execution (actual parquet parsing and computation)&lt;/li&gt;
&lt;li&gt;Result formatting and response generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not a toy "what's 2+2" benchmark, but it's also a single-step tool call — not a long-horizon multi-agent workflow. We note this limitation upfront.&lt;/p&gt;
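&lt;p&gt;The shape of that loop can be sketched in a few lines of Python, with a scripted stand-in for the model and a stubbed tool (the real benchmark parses an actual parquet file). The names &lt;code&gt;run_agent&lt;/code&gt;, &lt;code&gt;scripted_model&lt;/code&gt;, and &lt;code&gt;avg_trip_duration&lt;/code&gt; are illustrative, not any framework's API:&lt;/p&gt;

```python
def run_agent(question, model, tools, max_steps=5):
    """Minimal ReAct-style loop: at each step the model either calls a
    tool or produces a final answer."""
    transcript = [question]
    for _ in range(max_steps):
        action = model(transcript)            # LLM planning step
        if action["type"] == "final":
            return action["text"]             # response generation
        result = tools[action["tool"]](**action["args"])  # tool execution
        transcript.append(f"{action['tool']} returned {result}")
    raise RuntimeError("step budget exceeded")

def scripted_model(transcript):
    # Stand-in for the LLM: first plan a tool call, then format an answer.
    if len(transcript) == 1:
        return {"type": "tool", "tool": "avg_trip_duration",
                "args": {"path": "trips.parquet"}}
    return {"type": "final",
            "text": f"Average trip duration: {transcript[-1].split()[-1]} minutes"}

def avg_trip_duration(path):
    # Stub for the real tool, which reads trip durations from a parquet file.
    durations = [12.0, 8.5, 15.5]
    return round(sum(durations) / len(durations), 2)

answer = run_agent("What is the average trip duration?",
                   scripted_model, {"avg_trip_duration": avg_trip_duration})
print(answer)  # Average trip duration: 12.0 minutes
```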




&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model:&lt;/strong&gt; &lt;code&gt;gpt-5.1&lt;/code&gt; (same across all frameworks)&lt;br&gt;
&lt;strong&gt;Requests:&lt;/strong&gt; 50 total, 10 concurrent (higher concurrency hit the provider's tokens-per-minute rate limit, so we capped it here)&lt;br&gt;
&lt;strong&gt;Machine:&lt;/strong&gt; Same hardware for all runs, no process affinity pinning&lt;br&gt;
&lt;strong&gt;Measured:&lt;/strong&gt; end-to-end latency (P50, P95, P99), throughput (req/s), peak RSS memory (MB), CPU usage (%), cold-start time (ms), determinism rate (same output across runs)&lt;/p&gt;

&lt;p&gt;All frameworks achieved &lt;strong&gt;100% success rate&lt;/strong&gt; (50/50). CrewAI was excluded after it showed a 44% failure rate under the same conditions.&lt;/p&gt;

&lt;p&gt;Benchmark code and raw JSON are in the repo: &lt;a href="https://github.com/liquidos-ai/autoagents-bench" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/autoagents-bench&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P95 Latency&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Peak Memory&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;Cold Start&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AutoAgents&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;5,714 ms&lt;/td&gt;
&lt;td&gt;9,652 ms&lt;/td&gt;
&lt;td&gt;4.97 rps&lt;/td&gt;
&lt;td&gt;1,046 MB&lt;/td&gt;
&lt;td&gt;29.2%&lt;/td&gt;
&lt;td&gt;4 ms&lt;/td&gt;
&lt;td&gt;98.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rig&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;6,065 ms&lt;/td&gt;
&lt;td&gt;10,131 ms&lt;/td&gt;
&lt;td&gt;4.44 rps&lt;/td&gt;
&lt;td&gt;1,019 MB&lt;/td&gt;
&lt;td&gt;24.3%&lt;/td&gt;
&lt;td&gt;4 ms&lt;/td&gt;
&lt;td&gt;90.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;6,046 ms&lt;/td&gt;
&lt;td&gt;10,209 ms&lt;/td&gt;
&lt;td&gt;4.26 rps&lt;/td&gt;
&lt;td&gt;5,706 MB&lt;/td&gt;
&lt;td&gt;64.0%&lt;/td&gt;
&lt;td&gt;62 ms&lt;/td&gt;
&lt;td&gt;48.55&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PydanticAI&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;6,592 ms&lt;/td&gt;
&lt;td&gt;11,311 ms&lt;/td&gt;
&lt;td&gt;4.15 rps&lt;/td&gt;
&lt;td&gt;4,875 MB&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;56 ms&lt;/td&gt;
&lt;td&gt;48.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LlamaIndex&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;6,990 ms&lt;/td&gt;
&lt;td&gt;11,960 ms&lt;/td&gt;
&lt;td&gt;4.04 rps&lt;/td&gt;
&lt;td&gt;4,860 MB&lt;/td&gt;
&lt;td&gt;59.7%&lt;/td&gt;
&lt;td&gt;54 ms&lt;/td&gt;
&lt;td&gt;43.66&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphBit&lt;/td&gt;
&lt;td&gt;JS/TS&lt;/td&gt;
&lt;td&gt;8,425 ms&lt;/td&gt;
&lt;td&gt;14,388 ms&lt;/td&gt;
&lt;td&gt;3.14 rps&lt;/td&gt;
&lt;td&gt;4,718 MB&lt;/td&gt;
&lt;td&gt;44.6%&lt;/td&gt;
&lt;td&gt;138 ms&lt;/td&gt;
&lt;td&gt;22.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;10,155 ms&lt;/td&gt;
&lt;td&gt;16,891 ms&lt;/td&gt;
&lt;td&gt;2.70 rps&lt;/td&gt;
&lt;td&gt;5,570 MB&lt;/td&gt;
&lt;td&gt;39.7%&lt;/td&gt;
&lt;td&gt;63 ms&lt;/td&gt;
&lt;td&gt;0.85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Composite score&lt;/strong&gt; is a weighted, min-max normalized aggregate across all dimensions (latency 27.8%, throughput 33.3%, memory 22.2%, CPU efficiency 16.7%).&lt;/p&gt;
&lt;h3&gt;
  
  
  Breaking Down the Numbers
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Memory: The Biggest Gap
&lt;/h4&gt;

&lt;p&gt;The most striking result isn't latency — it's memory.&lt;/p&gt;

&lt;p&gt;AutoAgents peaks at &lt;strong&gt;1,046 MB&lt;/strong&gt;. The non-Rust frameworks average a peak of &lt;strong&gt;5,146 MB&lt;/strong&gt;. That's a &lt;strong&gt;~5× difference&lt;/strong&gt; on a single-agent workload.&lt;/p&gt;

&lt;p&gt;At deployment scale (50 instances):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Total RAM needed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AutoAgents&lt;/td&gt;
&lt;td&gt;~51 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rig&lt;/td&gt;
&lt;td&gt;~50 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;~279 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;~272 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PydanticAI&lt;/td&gt;
&lt;td&gt;~238 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LlamaIndex&lt;/td&gt;
&lt;td&gt;~237 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphBit&lt;/td&gt;
&lt;td&gt;~230 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
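&lt;p&gt;The scale-up numbers follow directly from the per-instance peaks, so they are easy to sanity-check:&lt;/p&gt;

```python
# Per-instance peak RSS in MB, taken from the results table.
peaks_mb = {
    "AutoAgents": 1046, "Rig": 1019, "LangChain": 5706, "LangGraph": 5570,
    "PydanticAI": 4875, "LlamaIndex": 4860, "GraphBit": 4718,
}
instances = 50
totals_gb = {name: peak * instances / 1024 for name, peak in peaks_mb.items()}
for name, total in totals_gb.items():
    print(f"{name}: ~{total:.0f} GB")  # e.g. AutoAgents: ~51 GB
```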

&lt;p&gt;Python frameworks carry baseline weight you pay even when idle: interpreter, dependency tree, dynamic dispatch, GC. Rust's ownership model means memory is freed immediately when objects go out of scope — no GC heap to keep around.&lt;/p&gt;
&lt;h4&gt;
  
  
  Latency: Smaller Gap, Still Real
&lt;/h4&gt;

&lt;p&gt;Latency differences are more nuanced. The LLM network round-trip dominates, which is why all frameworks cluster between 5,700 and 7,000 ms. The outliers (GraphBit at 8,425 ms, LangGraph at 10,155 ms) reflect additional framework orchestration overhead.&lt;/p&gt;

&lt;p&gt;AutoAgents beats the &lt;strong&gt;non-Rust average by 25%&lt;/strong&gt; on latency, and beats LangGraph by &lt;strong&gt;43.7%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The P95 numbers diverge more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AutoAgents P95: 9,652 ms&lt;/li&gt;
&lt;li&gt;LangGraph P95: 16,891 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the tail end — the requests that matter most for user-perceived reliability — the gap widens significantly.&lt;/p&gt;
&lt;h4&gt;
  
  
  Throughput
&lt;/h4&gt;

&lt;p&gt;AutoAgents delivers &lt;strong&gt;4.97 rps&lt;/strong&gt; vs an average of &lt;strong&gt;3.66 rps&lt;/strong&gt; across the non-Rust frameworks — &lt;strong&gt;36% more throughput&lt;/strong&gt; under the same concurrency. Against LangGraph specifically, it's &lt;strong&gt;84% more throughput&lt;/strong&gt; (4.97 vs 2.70 rps).&lt;/p&gt;

&lt;p&gt;Higher throughput per instance means you need fewer instances to serve the same load.&lt;/p&gt;
&lt;h4&gt;
  
  
  Cold Start
&lt;/h4&gt;

&lt;p&gt;This is where Rust's near-zero initialization really shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AutoAgents: &lt;strong&gt;4 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;LangChain: &lt;strong&gt;62 ms&lt;/strong&gt; (15× slower)&lt;/li&gt;
&lt;li&gt;PydanticAI: &lt;strong&gt;56 ms&lt;/strong&gt; (14× slower)&lt;/li&gt;
&lt;li&gt;LlamaIndex: &lt;strong&gt;54 ms&lt;/strong&gt; (14× slower)&lt;/li&gt;
&lt;li&gt;GraphBit: &lt;strong&gt;138 ms&lt;/strong&gt; (34× slower)&lt;/li&gt;
&lt;li&gt;LangGraph: &lt;strong&gt;63 ms&lt;/strong&gt; (16× slower)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For serverless deployments or auto-scaling scenarios where instances spin up on demand, a 4 ms cold start vs 60–140 ms is a qualitative difference in user experience.&lt;/p&gt;
&lt;h4&gt;
  
  
  CPU Usage
&lt;/h4&gt;

&lt;p&gt;CPU usage separates the field further. Rig (Rust) is the most efficient at 24.3%, AutoAgents runs at 29.2%, and LangChain is the heaviest at 64.0%. High CPU means less headroom for burst traffic without throttling.&lt;/p&gt;

&lt;p&gt;The throughput-per-CPU efficiency ranking mirrors the composite score.&lt;/p&gt;
&lt;h3&gt;
  
  
  How We Scored Frameworks
&lt;/h3&gt;

&lt;p&gt;The composite score uses &lt;strong&gt;min-max normalization&lt;/strong&gt; so every dimension is on a consistent 0–1 scale (best = 1, worst = 0), regardless of unit or direction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score = mmLow(latency)     × 27.8%   # lower is better
      + mmLow(memory)      × 22.2%   # lower is better
      + mmHigh(throughput) × 33.3%   # higher is better
      + mmHigh(cpu_eff)    × 16.7%   # rps/cpu%, higher is better

where mmHigh(v, min, max) = (v - min) / (max - min)
      mmLow(v,  min, max) = (max - v) / (max - min)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Weights reflect what matters at production scale: throughput is the primary capacity driver (33.3%), latency is user-facing (27.8%), memory drives infrastructure cost (22.2%), and CPU efficiency determines burst headroom (16.7%).&lt;/p&gt;
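&lt;p&gt;Plugging the table values into this formula reproduces the published scores to within a few hundredths (the table values are themselves rounded, so exact agreement isn't expected):&lt;/p&gt;

```python
# name: (avg latency ms, throughput rps, peak memory MB, CPU percent)
rows = {
    "AutoAgents": (5714, 4.97, 1046, 29.2),
    "Rig":        (6065, 4.44, 1019, 24.3),
    "LangChain":  (6046, 4.26, 5706, 64.0),
    "PydanticAI": (6592, 4.15, 4875, 53.9),
    "LlamaIndex": (6990, 4.04, 4860, 59.7),
    "GraphBit":   (8425, 3.14, 4718, 44.6),
    "LangGraph":  (10155, 2.70, 5570, 39.7),
}

def mm_high(v, lo, hi):  # higher is better; best value maps to 1
    return (v - lo) / (hi - lo)

def mm_low(v, lo, hi):   # lower is better; best value maps to 1
    return (hi - v) / (hi - lo)

lat = [r[0] for r in rows.values()]
thr = [r[1] for r in rows.values()]
mem = [r[2] for r in rows.values()]
eff = [r[1] / r[3] for r in rows.values()]  # throughput per CPU percent

def score(name):
    l, t, m, c = rows[name]
    return 100 * (mm_low(l, min(lat), max(lat)) * 0.278
                  + mm_low(m, min(mem), max(mem)) * 0.222
                  + mm_high(t, min(thr), max(thr)) * 0.333
                  + mm_high(t / c, min(eff), max(eff)) * 0.167)

for name in rows:
    print(f"{name}: {score(name):.2f}")
```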




&lt;h3&gt;
  
  
  What This Benchmark Doesn't Cover
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-step agents&lt;/strong&gt;: We only benchmark single tool-call ReAct loops. Long-horizon planning with many LLM calls may change the picture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent systems&lt;/strong&gt;: Frameworks designed for agent orchestration (LangGraph, CrewAI) are arguably optimized for complexity we didn't measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer quality&lt;/strong&gt;: Determinism rate tracks whether the output is consistent, not whether it's correct by a human rubric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming&lt;/strong&gt;: All results are blocking responses. Streaming latency profiles differ.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different models&lt;/strong&gt;: These results are specific to &lt;code&gt;gpt-5.1&lt;/code&gt;. Models with different speeds and output lengths will shift the LLM-dominated portion of latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these gaps are important for your use case, we'd welcome contributions that extend the benchmark suite.&lt;/p&gt;




&lt;h3&gt;
  
  
  Takeaway
&lt;/h3&gt;

&lt;p&gt;If you're choosing an AI agent framework for a production system where infrastructure cost and reliability under load matter, the memory footprint of Python frameworks is a real constraint. AutoAgents and Rig both stay under 1.1 GB peak — all Python frameworks measured exceeded 4.7 GB.&lt;/p&gt;

&lt;p&gt;The throughput and latency advantages are meaningful but not dramatic for single-agent tasks. The memory advantage is 5×, and it's structural — not something you tune away with configuration.&lt;/p&gt;

&lt;p&gt;We're continuing to extend the benchmark with more task types, multi-step workflows, and streaming measurements. Issues and PRs welcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give us a star on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/AutoAgents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>rust</category>
    </item>
    <item>
      <title>Write Agents in Rust — Run Them Locally on Android</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Fri, 23 Jan 2026 13:49:17 +0000</pubDate>
      <link>https://forem.com/saivishwak/write-agents-in-rust-run-them-locally-on-android-4c4</link>
      <guid>https://forem.com/saivishwak/write-agents-in-rust-run-them-locally-on-android-4c4</guid>
      <description>&lt;p&gt;Imagine building AI agents that run entirely on your Android device, using local models, without sending any data to the cloud.&lt;/p&gt;

&lt;p&gt;Sounds futuristic? It’s here today.&lt;/p&gt;

&lt;p&gt;With Rust-powered &lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;AutoAgents&lt;/a&gt;, you can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write intelligent agents in Rust — fast, safe, and flexible&lt;/li&gt;
&lt;li&gt;Deploy them directly on Android — private, on-device, and offline&lt;/li&gt;
&lt;li&gt;Use local AI models — no cloud dependency, full control over data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Developers and Startups
&lt;/h3&gt;

&lt;p&gt;If you’re building AI-native apps, or thinking of launching a privacy-first AI product, this is your playground.&lt;/p&gt;

&lt;p&gt;We’re excited to see what developers build when they can write agents in Rust.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/JeGs2usZEn4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;AutoAgents is fully open source!&lt;/p&gt;

&lt;p&gt;Check it out, try it on your projects, and give feedback.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/AutoAgents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Android App Example: &lt;a href="https://github.com/liquidos-ai/AutoAgents-Android-Example" rel="noopener noreferrer"&gt;https://github.com/liquidos-ai/AutoAgents-Android-Example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy coding! 🚀&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>rust</category>
    </item>
    <item>
      <title>AutoAgents – a Rust-Based Multi-Agent Framework for LLM-Powered Intelligence</title>
      <dc:creator>Sai Vishwak</dc:creator>
      <pubDate>Tue, 14 Oct 2025 18:09:21 +0000</pubDate>
      <link>https://forem.com/saivishwak/autoagents-a-rust-based-multi-agent-framework-for-llm-powered-intelligence-27h2</link>
      <guid>https://forem.com/saivishwak/autoagents-a-rust-based-multi-agent-framework-for-llm-powered-intelligence-27h2</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/liquidos-ai/AutoAgents" rel="noopener noreferrer"&gt;AutoAgents&lt;/a&gt; is a multi-agent framework built in Rust, designed for performance, safety, and scalability. It enables the creation of intelligent, autonomous agents.&lt;br&gt;
With AutoAgents, you can build Cloud-Native Agents, Edge-Native Agents, or even Hybrid Models. The framework features a modular architecture with swappable components — memory layers, executors, and communication backends can be replaced with minimal effort.&lt;/p&gt;

&lt;p&gt;We’re actively developing AutoAgents and would love to get feedback, ideas, and collaborators from the community.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
