AI Agents Don’t Scale Like Chatbots

ModelIndex — Thu, 19 Feb 2026 13:35:40 +0000

Originally published on Medium:
https://medium.com/@ravi.myakala/ai-agents-dont-scale-like-chatbots-2434e4fbe321

Most LLM cost estimates use something like:

cost = requests * avg_tokens * price_per_token

That works for chat systems.
It breaks for AI agents.

In multi-step agent systems, cost is driven primarily by execution depth (the number of LLM steps each task triggers), not by request volume.

Chat Workloads (Linear Scaling)

A typical chat interaction looks like:

User request
   ↓
LLM
   ↓
Response

cost ≈ requests * tokens_per_request * price_per_token

If traffic doubles, cost doubles.
Predictable. Linear.
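
As a minimal sketch of the linear model, the snippet below computes chat cost at two traffic levels. The request counts, token counts, and the price of $2 per million tokens are hypothetical values chosen for illustration, not figures from any real price list.

```python
def chat_cost(requests: int, tokens_per_request: float, price_per_token: float) -> float:
    """Linear chat cost: one LLM call per request."""
    return requests * tokens_per_request * price_per_token


# Hypothetical numbers, chosen only for illustration.
PRICE_PER_TOKEN = 2.0 / 1_000_000  # assumed $2 per million tokens

baseline = chat_cost(10_000, 800, PRICE_PER_TOKEN)
doubled = chat_cost(20_000, 800, PRICE_PER_TOKEN)

print(f"baseline: ${baseline:.2f}")
print(f"doubled traffic: ${doubled:.2f}")
print(f"ratio: {doubled / baseline:.1f}x")
```

Because every factor enters the product once, doubling requests doubles the total regardless of the other values.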

Agent Workloads (Internal Multiplication)

Now compare that with a tool-using agent:

User task
   ↓
Reasoning step
   ↓
Tool call
   ↓
Reflection
   ↓
Another tool call
   ↓
More reasoning
   ↓
Final output

A single task can trigger multiple LLM invocations.
This internal expansion is the structural difference.
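
The flow above can be sketched with a stub model that only counts invocations. Everything here is hypothetical (the CallCounter class, run_chat, run_agent, and the canned responses), and it assumes that each reasoning, reflection, and final-output box is one LLM call while tool calls themselves do not invoke the model.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Counts LLM invocations so chat and agent paths can be compared."""
    llm_calls: int = 0

    def llm(self, prompt: str) -> str:
        # Stand-in for a real model call; returns a canned string.
        self.llm_calls += 1
        return f"response to: {prompt[:20]}"


def run_chat(counter: CallCounter, request: str) -> str:
    # One request, one LLM call.
    return counter.llm(request)


def run_agent(counter: CallCounter, task: str, tool_calls: int = 2) -> str:
    notes = counter.llm(f"plan: {task}")                 # reasoning step
    for i in range(tool_calls):
        tool_result = f"tool output {i}"                 # tool call (no LLM call assumed)
        notes = counter.llm(f"reflect: {tool_result}")   # reflection / more reasoning
    return counter.llm(f"final answer: {notes}")         # final output


chat, agent = CallCounter(), CallCounter()
run_chat(chat, "What is our refund policy?")
run_agent(agent, "Refund order 1234 if eligible")
print(chat.llm_calls, agent.llm_calls)  # prints "1 4"
```

Under these assumptions one user task produces four model calls where the chat path produces one, and each added tool call adds another.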

The Real Agent Cost Model

Instead of:

cost ≈ requests * tokens_per_request * price_per_token

Agent systems look more like:

cost ≈ (
    tasks
    * execution_depth
    * tokens_per_step
    * retry_multiplier
    * burst_factor
    * price_per_token
)

Where:
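execution_depth = number of reasoning/tool steps per task
tokens_per_step = average tokens consumed per step
retry_multiplier = amplification from tool failures (1.0 means no retries)
burst_factor = amplification from uneven task complexity (1.0 means every task runs at the average depth)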

The dominant driver becomes execution depth, not traffic.
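
To make the formula concrete, here is a small sketch that evaluates it for three variants of the same workload. The AgentWorkload class and every number in it are hypothetical; only the multiplicative structure comes from the model above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentWorkload:
    tasks: int
    execution_depth: float      # reasoning/tool steps per task
    tokens_per_step: float
    retry_multiplier: float     # 1.0 = no retries
    burst_factor: float         # 1.0 = uniform task complexity
    price_per_token: float

    def cost(self) -> float:
        return (self.tasks * self.execution_depth * self.tokens_per_step
                * self.retry_multiplier * self.burst_factor * self.price_per_token)


# All values below are hypothetical.
launch = AgentWorkload(10_000, 3, 800, 1.2, 1.3, 2.0 / 1_000_000)
more_traffic = replace(launch, tasks=20_000)      # traffic doubles, workflow unchanged
deeper = replace(launch, execution_depth=8)       # same traffic, workflow grows to 8 steps

for name, w in [("launch", launch), ("2x traffic", more_traffic), ("depth 8", deeper)]:
    print(f"{name}: ${w.cost():.2f}")
```

With these assumed inputs, growing the workflow from 3 to 8 steps costs more than doubling traffic, and it happens without any change in request volume that a traffic-based estimate would notice.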

Why Teams Underestimate Agent Cost

Common failure points:

Execution Depth Creep: Workflows evolve from 3 steps to 6–8 steps over time.
Retry Amplification: Tool failures add extra reasoning cycles.
Context Accumulation: Memory grows across steps, so later steps send more tokens than earlier ones.
Burst Volatility: Some tasks expand far deeper than others.
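
Two of these effects are easy to quantify with a rough sketch. The code below assumes that every step re-sends the base prompt plus a fixed number of tokens added by each earlier step, and that failed attempts are independent and retried until they succeed. Both assumptions, the function names, and the numbers are illustrative rather than taken from a real system.

```python
def tokens_per_task(depth: int, base_tokens: int, added_per_step: int) -> int:
    """Total input tokens when step k re-sends the base prompt plus
    everything added by the k earlier steps."""
    return sum(base_tokens + k * added_per_step for k in range(depth))


def expected_attempts(failure_rate: float) -> float:
    """Expected attempts per step if each attempt fails independently
    and is retried until it succeeds."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


for depth in (3, 6, 8):
    total = tokens_per_task(depth, base_tokens=500, added_per_step=300)
    with_retries = total * expected_attempts(0.10)
    print(f"depth={depth} tokens={total} with_retries={with_retries:.0f}")
```

With these assumed values, going from 3 to 8 steps raises input tokens per task from 2,400 to 12,400: the step count grows about 2.7x while the token count grows more than 5x, because accumulated context makes each later step more expensive.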

By the time telemetry shows cost drift, the architecture is already deployed.

A Canonical Agent Scenario

I modeled a canonical multi-step AI agent workload with:

Controlled execution depth
Tool retries
Context accumulation
Burst volatility
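
For readers who want to experiment locally, the following is an independent toy simulation that combines the same four ingredients. It is not the model behind the linked scenario; the simulate_task function, its parameters, and its default values are all assumptions made for this sketch.

```python
import random
import statistics


def simulate_task(rng, base_depth=4, burst_prob=0.1, burst_extra=6,
                  failure_rate=0.1, base_tokens=500, added_per_step=300):
    """Return total input tokens for one simulated task."""
    # Burst volatility: a minority of tasks run much deeper.
    depth = base_depth + (burst_extra if rng.random() < burst_prob else 0)
    context = base_tokens
    total = 0
    for _ in range(depth):
        # Tool retries: repeat the step until it succeeds.
        while True:
            total += context            # each attempt re-sends the full context
            if rng.random() >= failure_rate:
                break
        context += added_per_step       # context accumulation after a successful step
    return total


rng = random.Random(42)                 # fixed seed for repeatable runs
costs = sorted(simulate_task(rng) for _ in range(10_000))
mean = statistics.fmean(costs)
p50 = costs[len(costs) // 2]
p99 = costs[int(len(costs) * 0.99)]
print(f"mean={mean:.0f} p50={p50} p99={p99}")
```

Comparing the median with the 99th percentile shows how far the deep tasks sit from the typical one, which is the gap an average-based estimate hides.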

Full structural breakdown here:
👉 https://www.modelindex.io/scenarios/ai-agent

The goal isn’t benchmarking models; it’s understanding structural cost behavior before deployment.

Key Takeaway

Chat systems scale with traffic.
Agent systems scale with internal execution depth.
If you’re modeling cost for multi-step workflows, execution depth is the variable you should track first.