<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Robin</title>
    <description>The latest articles on Forem by Robin (@robinbanner).</description>
    <link>https://forem.com/robinbanner</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3771849%2Fdcf31c96-600e-4ff2-8136-5518ca50059e.jpg</url>
      <title>Forem: Robin</title>
      <link>https://forem.com/robinbanner</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/robinbanner"/>
    <language>en</language>
    <item>
      <title>Your First Komilion API Call in 60 Seconds</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Thu, 26 Mar 2026 10:05:32 +0000</pubDate>
      <link>https://forem.com/robinbanner/your-first-komilion-api-call-in-60-seconds-5e68</link>
      <guid>https://forem.com/robinbanner/your-first-komilion-api-call-in-60-seconds-5e68</guid>
      <description>&lt;h1&gt;
  
  
  Your First Komilion API Call in 60 Seconds
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;March 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you just signed up for Komilion and are staring at a blank dashboard: here's exactly what to do. This takes 60 seconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your Komilion API key (starts with &lt;code&gt;ck_&lt;/code&gt; — visible in your dashboard)&lt;/li&gt;
&lt;li&gt;Python 3.7+, Node.js 16+, or just curl&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No new SDK. Komilion is OpenAI-compatible — if you've used the OpenAI API before, the interface is identical.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 1: Python (60 seconds)
&lt;/h2&gt;

&lt;p&gt;Install the OpenAI SDK if you haven't already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# paste your actual key here
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the fastest way to find a duplicate in a Python list?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# See what model handled it and what it cost:
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brainModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tier:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cost:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you see output — you're in. The &lt;code&gt;brainModel&lt;/code&gt; field shows which model handled your request. The &lt;code&gt;tier&lt;/code&gt; will say &lt;code&gt;"balanced"&lt;/code&gt;. The &lt;code&gt;cost&lt;/code&gt; is what that call cost in USD.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 2: curl (30 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://www.komilion.com/api/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer ck_your_key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "neo-mode/balanced",
    "messages": [{"role": "user", "content": "What is the fastest way to find a duplicate in a Python list?"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get a standard OpenAI-format JSON response plus a &lt;code&gt;komilion&lt;/code&gt; object in the response body with routing metadata.&lt;/p&gt;
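&lt;p&gt;If you're scripting against the curl output, pulling the routing metadata out of that JSON is straightforward. A minimal sketch — the &lt;code&gt;komilion&lt;/code&gt; field names (&lt;code&gt;brainModel&lt;/code&gt;, &lt;code&gt;tier&lt;/code&gt;, &lt;code&gt;cost&lt;/code&gt;) are the ones described in this post; the rest of the payload shape is illustrative:&lt;/p&gt;

```python
import json

# Illustrative response body: standard OpenAI chat-completion shape plus
# the "komilion" routing-metadata object described in this post.
raw = """
{
  "choices": [{"message": {"role": "assistant", "content": "Use a set."}}],
  "komilion": {"brainModel": "example/model", "tier": "balanced", "cost": 0.08}
}
"""

response = json.loads(raw)
meta = response.get("komilion", {})

print("Answer:", response["choices"][0]["message"]["content"])
print("Tier:", meta.get("tier"))
print("Cost (USD):", meta.get("cost"))
```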




&lt;h2&gt;
  
  
  Option 3: Existing OpenAI code (20 seconds)
&lt;/h2&gt;

&lt;p&gt;If you already have code using the OpenAI SDK, change two lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the model string to &lt;code&gt;neo-mode/balanced&lt;/code&gt;. Every other parameter — messages, temperature, stream, max_tokens — stays the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the three model strings do
&lt;/h2&gt;

&lt;p&gt;Once you have the first call working, here's how to use all three tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Commit messages, summaries, format conversion — ~$0.006/call
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/frugal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a git commit message for this diff: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Bug fixes, code review, new functions — ~$0.08/call (default)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this function for edge cases: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# System design, architecture, security review — council mode, ~90s response
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design the database schema for a multi-tenant SaaS: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The routing metadata in every response tells you what tier was used and what it cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  If something goes wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;401 Unauthorized&lt;/code&gt;&lt;/strong&gt; — API key is wrong or missing. Make sure you're using your &lt;code&gt;ck_&lt;/code&gt; key, not an OpenAI key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;400 Bad Request&lt;/code&gt; on the model string&lt;/strong&gt; — The model string must be exactly &lt;code&gt;neo-mode/frugal&lt;/code&gt;, &lt;code&gt;neo-mode/balanced&lt;/code&gt;, or &lt;code&gt;neo-mode/premium&lt;/code&gt;. Do not use &lt;code&gt;anthropic/claude-opus-4-6&lt;/code&gt; or any other model string — those will return 400.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;402 Insufficient Balance&lt;/code&gt;&lt;/strong&gt; — Your wallet balance is $0. Top up at komilion.com/dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Empty &lt;code&gt;komilion&lt;/code&gt; metadata&lt;/strong&gt; — Upgrade to &lt;code&gt;openai&amp;gt;=1.0.0&lt;/code&gt;. The &lt;code&gt;model_extra&lt;/code&gt; field requires the newer SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slow response on Premium&lt;/strong&gt; — Expected. The council runs multiple specialists, which can take up to 90 seconds. Use Balanced for interactive requests.&lt;/p&gt;
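&lt;p&gt;The failure modes above are mechanical enough to encode in a script. A sketch of a triage helper — the status codes and fixes are the ones listed in this post; this helper itself is not part of any SDK:&lt;/p&gt;

```python
# Maps the HTTP status codes covered above to the fix described in this
# post. A convenience sketch for scripts, not part of any official SDK.
TRIAGE = {
    401: "Wrong or missing key: use your ck_ key, not an OpenAI sk- key.",
    400: "Bad model string: use neo-mode/frugal, neo-mode/balanced, or neo-mode/premium.",
    402: "Insufficient balance: top up at komilion.com/dashboard.",
}

def triage(status_code: int) -> str:
    return TRIAGE.get(status_code, "Unexpected status: check the response body.")

print(triage(401))
print(triage(500))
```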




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Once your first call works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;neo-mode/balanced&lt;/code&gt; as the default everywhere in your codebase&lt;/li&gt;
&lt;li&gt;Override to &lt;code&gt;neo-mode/frugal&lt;/code&gt; for formatting, summarization, and commit messages&lt;/li&gt;
&lt;li&gt;Override to &lt;code&gt;neo-mode/premium&lt;/code&gt; only when the output is going to production without review&lt;/li&gt;
&lt;/ul&gt;
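&lt;p&gt;That default-plus-overrides policy fits in a single helper, so the tier decision lives in one place in your codebase. A sketch — the task-category names are made up for illustration; the model strings are the three from this post:&lt;/p&gt;

```python
# Encode the tier strategy above in one place: Balanced by default,
# Frugal for cheap mechanical tasks, Premium for unreviewed output.
# Task-category names are illustrative, not an official taxonomy.
FRUGAL_TASKS = {"formatting", "summarization", "commit_message"}
PREMIUM_TASKS = {"unreviewed_production"}

def pick_model(task: str) -> str:
    if task in FRUGAL_TASKS:
        return "neo-mode/frugal"
    if task in PREMIUM_TASKS:
        return "neo-mode/premium"
    return "neo-mode/balanced"  # the default everywhere else

print(pick_model("commit_message"))  # neo-mode/frugal
print(pick_model("code_review"))     # neo-mode/balanced
```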

&lt;p&gt;The Phase 4 benchmark (30 calls, 10 developer tasks, all outputs published) is at komilion.com/compare-v2 — worth reading before you commit to a tier strategy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions: &lt;a href="mailto:support@komilion.com"&gt;support@komilion.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>devtools</category>
      <category>python</category>
    </item>
    <item>
      <title>Three Ways to Handle AI Model Routing in 2026 (And the Trade-offs Nobody Talks About)</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 18 Mar 2026 17:02:16 +0000</pubDate>
      <link>https://forem.com/robinbanner/three-ways-to-handle-ai-model-routing-in-2026-and-the-trade-offs-nobody-talks-about-4opa</link>
      <guid>https://forem.com/robinbanner/three-ways-to-handle-ai-model-routing-in-2026-and-the-trade-offs-nobody-talks-about-4opa</guid>
      <description>&lt;h1&gt;
  
  
  Three Ways to Handle AI Model Routing in 2026 (And the Trade-offs Nobody Talks About)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;2026-03-18&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you're building on top of AI models, you've probably hit the same wall: you have 400+ models available and no principled way to decide which one handles which request. Defaulting to Opus on everything works, but it's expensive. Defaulting to Gemini Flash on everything is cheap but breaks on complex tasks.&lt;/p&gt;

&lt;p&gt;The routing problem is real. Here are the three patterns I see in production, with honest trade-offs for each.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 1: You route manually (OpenRouter / direct API)
&lt;/h2&gt;

&lt;p&gt;The simplest setup: you pick the model per request, or per endpoint, or per environment. OpenRouter makes this easy — one API, 400+ models, you decide what goes where.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it looks like in code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Explicit model selection per request
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# You decide this
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When this is right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a small, stable prompt library where you know exactly what each prompt needs&lt;/li&gt;
&lt;li&gt;Your team has strong opinions about specific models for specific use cases&lt;/li&gt;
&lt;li&gt;You want to experiment across models and control the comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest cost:&lt;/strong&gt; Every time you add a new prompt type or the model landscape changes, someone has to review the routing rules. "Anthropic changed the pricing for Haiku" is a maintenance event. "GPT-5 is better for code" is another. Manual routing is a configuration that goes stale.&lt;/p&gt;
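&lt;p&gt;Concretely, manual routing usually hardens into a hand-maintained table like the sketch below, and every row is something a human has to revisit when pricing or model quality shifts. The model IDs here are illustrative placeholders, not recommendations:&lt;/p&gt;

```python
# A hand-maintained routing table: the "configuration that goes stale".
# Model IDs are illustrative placeholders.
ROUTES = {
    "summarize": "cheap-provider/fast-model",
    "code_review": "premium-provider/big-model",
}
DEFAULT = "premium-provider/big-model"

def route(task: str) -> str:
    # Unlisted task types silently fall back to whatever default was
    # chosen when this table was written -- the staleness problem.
    return ROUTES.get(task, DEFAULT)

print(route("summarize"))
print(route("translate"))  # not in the table: falls back to the default
```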




&lt;h2&gt;
  
  
  Approach 2: You self-host a router
&lt;/h2&gt;

&lt;p&gt;Open-source routing layers let you deploy automated model selection on your own infrastructure. You define the classification rules, the router applies them, and there's no markup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The value proposition:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No markup on model costs — you pay provider rates directly&lt;/li&gt;
&lt;li&gt;Full control over the routing logic&lt;/li&gt;
&lt;li&gt;Your prompts never leave your infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When this is right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise or regulated environments where data residency matters&lt;/li&gt;
&lt;li&gt;Teams with the ops capacity to maintain a routing layer&lt;/li&gt;
&lt;li&gt;High volume where even a small markup compounds significantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest cost:&lt;/strong&gt; You own the operational overhead. When a model goes down, you handle the fallback. When the classification logic needs updating, that's engineering time. "Free" in dollars is not free in hours.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 3: You use a managed router (Komilion / Martian)
&lt;/h2&gt;

&lt;p&gt;A managed routing layer handles classification automatically. You set a quality floor — frugal, balanced, premium — and the service picks the cheapest capable model for each request. One URL change from your current setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it looks like in code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Same SDK, different base_url and model string
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Router decides the actual model
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When this is right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want cost optimization without maintaining routing logic&lt;/li&gt;
&lt;li&gt;Your prompt mix is diverse and hard to classify manually&lt;/li&gt;
&lt;li&gt;You'd rather pay a markup than own the infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest cost:&lt;/strong&gt; You're paying a markup (~25% on model costs in Komilion's case) for the automation. And you're trusting someone else's classification. We publish our routing decisions and benchmark data at komilion.com/compare-v2 — every output, every judge score, JSON download — because "trust our router" is a weak argument and "here's what it actually picked and why" is a stronger one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The question that matters
&lt;/h2&gt;

&lt;p&gt;None of these approaches is universally right. The question is: &lt;strong&gt;what's the cost of a wrong routing decision in your stack?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're routing a customer-facing chatbot and a wrong tier degrades the response quality noticeably, manual routing with explicit model selection makes sense — the stakes are high enough to justify the maintenance.&lt;/p&gt;

&lt;p&gt;If you're routing developer tooling (Cline sessions, internal code review, CI pipeline summaries), the wrong tier mostly means "slightly less thorough output on that one request." Managed routing's occasional miss is worth the cost savings.&lt;/p&gt;

&lt;p&gt;If you process millions of requests and the markup compounds to real money, self-hosted is worth the ops cost. At 10K calls/month, the math doesn't work out that way — but at 10M calls/month, it does.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually use
&lt;/h2&gt;

&lt;p&gt;For Komilion's own internal tooling (Cline sessions, benchmark scripts, documentation drafts), we use Balanced tier by default. File reads and summaries route to Frugal automatically.&lt;/p&gt;

&lt;p&gt;The benchmark result that drove this split: Balanced beats Opus on 6 of 10 real developer tasks at $0.08/task vs Opus's $0.17. Frugal matches Opus on summarization and code explanation at ~57x lower cost (8.3/10 vs 8.6/10). Full outputs at komilion.com/compare-v2.&lt;/p&gt;
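&lt;p&gt;Those ratios are easy to sanity-check as plain arithmetic. The per-call figures below are the approximate numbers cited in this series ($0.17 for Opus, $0.08 for Balanced; the $0.003 figure for Frugal-class summarization calls comes from the benchmark write-up, so treat it as approximate):&lt;/p&gt;

```python
# Sanity-check the cost ratios quoted above. Per-call figures are the
# approximate numbers cited in this series, not live pricing.
opus = 0.17
balanced = 0.08
frugal = 0.003  # summarization-class calls

print(f"Opus vs Frugal:   ~{opus / frugal:.0f}x")    # ~57x
print(f"Opus vs Balanced: ~{opus / balanced:.1f}x")  # ~2.1x
```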




&lt;p&gt;&lt;em&gt;komilion.com — Sign up free, no card required. Drop a comment if you want test credits.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why I Built on Top of OpenRouter Instead of Building a Model Gateway from Scratch</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 18 Mar 2026 15:29:01 +0000</pubDate>
      <link>https://forem.com/robinbanner/why-i-built-on-top-of-openrouter-instead-of-building-a-model-gateway-from-scratch-36p</link>
      <guid>https://forem.com/robinbanner/why-i-built-on-top-of-openrouter-instead-of-building-a-model-gateway-from-scratch-36p</guid>
      <description>&lt;h1&gt;
  
  
  Why I Built on Top of OpenRouter Instead of Building a Model Gateway from Scratch
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;March 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The most common comment I get on Komilion: "Isn't this just a wrapper around OpenRouter?"&lt;/p&gt;

&lt;p&gt;Yes, partly. And that's a deliberate choice. Here's the reasoning.&lt;/p&gt;




&lt;h2&gt;
  
  
  What OpenRouter actually gives you
&lt;/h2&gt;

&lt;p&gt;OpenRouter is a model marketplace — one API key, 400+ models, provider-level pricing. You call &lt;code&gt;openrouter.ai/api/v1&lt;/code&gt;, pick any model by ID, pay the provider rate directly. No markup on the models.&lt;/p&gt;

&lt;p&gt;What it does not give you: routing logic. You still decide which model handles which request. OpenRouter is the menu. You're still the waiter.&lt;/p&gt;

&lt;p&gt;That's a real gap for most production AI apps, and it's the gap Komilion was built to fill.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why build on top instead of building from scratch
&lt;/h2&gt;

&lt;p&gt;When I started Komilion, I had two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Build a full model gateway.&lt;/strong&gt; Direct integrations with Anthropic, OpenAI, Google, Mistral, Groq. Manage API keys, rate limits, failover, billing, and model availability for each provider separately. Full control, zero dependency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: Build on top of OpenRouter.&lt;/strong&gt; Use their unified API, inherit their model coverage, focus engineering time on the routing classification layer.&lt;/p&gt;

&lt;p&gt;I chose Option B for three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. OpenRouter solves the hard operational problems.&lt;/strong&gt; Provider failover when Anthropic is down. Model availability checks. New models added within hours of release. Billing unified across providers. These are real engineering problems — I've seen teams spend 3-4 months building and maintaining this layer. Building on top means that time goes to the routing logic instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Model coverage compounds.&lt;/strong&gt; OpenRouter has 400+ models from 30+ providers. Building direct integrations means constantly adding new providers when a good model ships on an unfamiliar platform. With OpenRouter as the foundation, when Groq releases a new model that benchmarks well for frugal tasks, it's available immediately. No new integration work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The value I'm adding is in the classification, not the API plumbing.&lt;/strong&gt; Komilion's routing has four layers: a regex fast path for obvious simple requests, an LLM classifier for ambiguous ones, benchmark-scored model selection based on task type, and provider failover. That's the hard part. The model access itself is a solved problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Komilion adds
&lt;/h2&gt;

&lt;p&gt;When you send a request to &lt;code&gt;neo-mode/balanced&lt;/code&gt;, here's what happens before a model ever sees your prompt:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regex fast path&lt;/strong&gt; — if the request matches a known simple pattern (file reads, summaries, commit message boilerplate), it routes immediately without running a classifier. Sub-100ms overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM classifier&lt;/strong&gt; — ambiguous requests go through a lightweight classifier that determines task complexity and category. This is where most routing decisions happen.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benchmark-scored model selection&lt;/strong&gt; — the classifier output maps to a model pool ranked by benchmark performance and current provider pricing. The cheapest capable model wins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provider failover&lt;/strong&gt; — if the selected model's provider returns an error, the request falls through to the next ranked option automatically. Your app doesn't see the failure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
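&lt;p&gt;Step 1 is the easiest to picture in code. A sketch of a regex fast path — the patterns and tier names here are invented for illustration, not Komilion's actual rules:&lt;/p&gt;

```python
import re
from typing import Optional

# Sketch of step 1: route known-simple prompts immediately, without
# spending latency or money on an LLM classifier. Patterns are invented
# for illustration; they are not Komilion's actual rules.
SIMPLE_PATTERNS = [
    re.compile(r"\bsummariz", re.IGNORECASE),
    re.compile(r"\bcommit message\b", re.IGNORECASE),
    re.compile(r"^(read|show)\s+(the\s+)?file", re.IGNORECASE),
]

def fast_path(prompt: str) -> Optional[str]:
    """Return a tier for obviously simple prompts, or None to fall
    through to the LLM classifier (step 2)."""
    if any(p.search(prompt) for p in SIMPLE_PATTERNS):
        return "frugal"
    return None

print(fast_path("Summarize this changelog"))        # frugal
print(fast_path("Design a multi-region failover"))  # None
```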

&lt;p&gt;None of this requires you to think about which model you're using. You set a quality floor — frugal, balanced, or premium — and the router handles the rest.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest trade-off
&lt;/h2&gt;

&lt;p&gt;Komilion charges a markup of roughly 25% on top of OpenRouter's provider-level pricing. You're paying for the routing automation and the ops you're not running.&lt;/p&gt;

&lt;p&gt;Whether that's worth it depends on your call volume and team. At 10,000 calls/month, the markup is a rounding error compared to the cost of routing incorrectly or maintaining your own routing layer. At 10 million calls/month, the math changes and you should probably evaluate self-hosted options.&lt;/p&gt;

&lt;p&gt;The alternative to Komilion isn't free — it's your time maintaining routing rules, updating model selections as the landscape changes, and handling the edge cases when a model you hardcoded gets deprecated. That cost is real, it just doesn't show up on an invoice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one thing you should know if you're evaluating this
&lt;/h2&gt;

&lt;p&gt;Komilion is built on OpenRouter, and that's not a secret. The routing logic and the classification layer are where the value is. The benchmark data at komilion.com/compare-v2 is the proof — 30 calls, 10 real developer tasks, every output published unedited.&lt;/p&gt;

&lt;p&gt;If you want to evaluate the routing, that's where to start. If the routing doesn't hold up for your workload, I'd rather you know that before you integrate than after.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;komilion.com — DM for test credits.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>openrouter</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The $0.003 vs $0.17 Test: When Does the Cheap Model Actually Win?</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 14 Mar 2026 08:00:56 +0000</pubDate>
      <link>https://forem.com/robinbanner/the-0003-vs-017-test-when-does-the-cheap-model-actually-win-g32</link>
      <guid>https://forem.com/robinbanner/the-0003-vs-017-test-when-does-the-cheap-model-actually-win-g32</guid>
      <description>&lt;h1&gt;
  
  
  The $0.003 vs $0.17 Test: When Does the Cheap Model Actually Win?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Julia Paulsen&lt;/strong&gt; | &lt;em&gt;2026-03-14&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I built an AI router that automatically picks the cheapest capable model for each request. The pitch is that you shouldn't pay $0.17 for tasks a $0.003 model handles just as well.&lt;/p&gt;

&lt;p&gt;So we ran a benchmark. Ten real developer tasks. Cheap model (frugal tier, auto-routed) vs Opus 4.6 direct. An LLM judge scored each response three times.&lt;/p&gt;

&lt;p&gt;The honest answer: the cheap model won 3 of 10 times. Tied once. Lost 6 times.&lt;/p&gt;

&lt;p&gt;That sounds bad. But here's what the cost column looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  The data
&lt;/h2&gt;

&lt;p&gt;Ten developer tasks, three judge runs per task (30 judge calls per tier). Frugal tier (auto-routed) vs Opus 4.6 (baseline). Judge: Gemini 2.5 Flash (Hermione).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Frugal Score&lt;/th&gt;
&lt;th&gt;Opus Score&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;th&gt;Frugal Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Code generation (compound interest)&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frugal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0031&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Debug a list comprehension&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;td&gt;$0.0021&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Explain async/await evolution&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frugal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0038&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Write unit tests for parse_config&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0054&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Code generation (compound interest v2)&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0016&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Research: global AI market summary&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Git commit message generation&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0003&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;SQL query optimization (10M rows)&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0034&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Scale real-time chat to 10K users&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frugal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0036&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;REST API design&lt;/td&gt;
&lt;td&gt;7.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0041&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Frugal avg: 8.3/10. Opus avg: 8.6/10. Frugal avg cost: $0.003/task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opus costs roughly $0.17/task in this benchmark. That's a 56x cost difference for a 0.3-point quality difference across all 10 tasks.&lt;/p&gt;
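&lt;p&gt;The summary line is easy to recheck from the table (scores and per-task costs copied as published):&lt;/p&gt;

```python
# Per-task scores and frugal costs, copied from the table above.
frugal = [8.0, 9.0, 9.0, 8.0, 9.0, 8.0, 8.0, 8.0, 8.7, 7.0]
opus = [5.3, 9.0, 8.0, 9.0, 10.0, 9.0, 9.0, 9.0, 8.3, 9.0]
frugal_costs = [0.0031, 0.0021, 0.0038, 0.0054, 0.0016,
                0.0000, 0.0003, 0.0034, 0.0036, 0.0041]

print(round(sum(frugal) / 10, 1))        # frugal average: 8.3
print(round(sum(opus) / 10, 1))          # opus average: 8.6
print(round(sum(frugal_costs) / 10, 4))  # frugal average cost: ~$0.003/task
```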




&lt;h2&gt;
  
  
  Task 6 cost $0.0000
&lt;/h2&gt;

&lt;p&gt;That's not a rounding artifact. The router picked Gemini 2.5 Flash for the AI market research task. Gemini Flash has a free tier. The task cost zero dollars and scored 8.0 against Opus's 9.0.&lt;/p&gt;

&lt;p&gt;Is 8.0 vs 9.0 worth $0.17? Depends what you're doing. For a background research pass that feeds into something else, probably not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Task 1: frugal beat Opus 8.0 vs 5.3
&lt;/h2&gt;

&lt;p&gt;The judge scored frugal's compound interest implementation 8.0 and Opus's 5.3. Frugal wrote complete, tested code with edge cases. Opus wrote an incomplete implementation with a rate calculation error that the judge flagged across all three runs.&lt;/p&gt;

&lt;p&gt;This was the most surprising result. Opus is supposed to be the gold standard for code quality. On a standard Python implementation task, the routing picked a cheaper model that just... did it better.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Opus clearly won
&lt;/h2&gt;

&lt;p&gt;Tasks 4, 8, and 10 were not close. Unit test generation (edge cases, mock patterns, fixture design), SQL optimization on a 10M-row table, and complex REST API design — Opus outperformed by a full point or more.&lt;/p&gt;

&lt;p&gt;Task 10 gap: frugal 7.0, Opus 9.0. That's the kind of gap that matters. A 7.0 API design might miss security considerations or recommend problematic patterns. That task should cost $0.17.&lt;/p&gt;




&lt;h2&gt;
  
  
  The routing signal
&lt;/h2&gt;

&lt;p&gt;Looking at where frugal wins vs loses, there's a pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frugal tends to win or tie:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard implementation tasks (no novel architecture needed)&lt;/li&gt;
&lt;li&gt;Explanation/education (async/await, concepts that have established answers)&lt;/li&gt;
&lt;li&gt;Debugging obvious bugs (the list comprehension logic flaw)&lt;/li&gt;
&lt;li&gt;Research summarization (reporting existing information)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Opus tends to win:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test generation (edge case discovery benefits from Opus's reasoning depth)&lt;/li&gt;
&lt;li&gt;Complex architecture (API design, SQL optimization require multi-factor tradeoff reasoning)&lt;/li&gt;
&lt;li&gt;Tasks where "good enough" isn't good enough (production security design)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The routing signal isn't task &lt;em&gt;length&lt;/em&gt; or task &lt;em&gt;topic&lt;/em&gt; — it's task &lt;em&gt;complexity&lt;/em&gt;. Low-complexity tasks have established patterns. The cheap model has seen those patterns. High-complexity tasks require novel reasoning chains. Opus is better there.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this looks like at scale — a real budget example
&lt;/h2&gt;

&lt;p&gt;Take a 15-person dev team shipping a SaaS product. As a rough industry estimate, a team like this makes somewhere around 3,000 AI API calls per developer per month — code generation, debugging, commit messages, test writing, documentation, code review. That's &lt;strong&gt;45,000 calls/month&lt;/strong&gt; across the team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All-Opus approach:&lt;/strong&gt;&lt;br&gt;
45,000 calls × $0.17 = &lt;strong&gt;$7,650/month&lt;/strong&gt; | $91,800/year&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart routing (based on our benchmark data):&lt;/strong&gt;&lt;br&gt;
Our benchmark suggests ~60% of developer tasks are low-complexity (commit messages, debugging, explanations, research), where frugal scores within 0.3 points of Opus on average. The remaining ~40% are high-complexity tasks (architecture, security, test generation) where Opus justifies its cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;% of calls&lt;/th&gt;
&lt;th&gt;Calls/mo&lt;/th&gt;
&lt;th&gt;Cost/call&lt;/th&gt;
&lt;th&gt;Monthly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frugal (auto-routed)&lt;/td&gt;
&lt;td&gt;60%&lt;/td&gt;
&lt;td&gt;27,000&lt;/td&gt;
&lt;td&gt;$0.003&lt;/td&gt;
&lt;td&gt;$81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus (complex tasks)&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;18,000&lt;/td&gt;
&lt;td&gt;$0.17&lt;/td&gt;
&lt;td&gt;$3,060&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;45,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3,141&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Savings: $4,509/month — 59% reduction.&lt;/strong&gt; That's $54,108/year back in the budget.&lt;/p&gt;

&lt;p&gt;And the quality trade-off? On the 60% routed to frugal, you're getting 8.3/10 instead of 8.6/10. On the 40% that still goes to Opus, you're getting full quality where it matters. Your architecture reviews, security audits, and complex test suites still get the best model. Your commit messages and docstrings don't need to cost $0.17 each.&lt;/p&gt;

&lt;p&gt;For a startup burning $50K/month, reclaiming $4.5K is meaningful. For an enterprise team with 100 developers, multiply those numbers by 7 — that's $378K/year in API costs you didn't need to spend.&lt;/p&gt;
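&lt;p&gt;The budget math above is simple enough to sanity-check directly (the 60/40 split and per-call costs are this article's own estimates):&lt;/p&gt;

```python
# Blended-cost check for the 15-person team example above.
calls = 45_000
all_opus = calls * 0.17                              # $7,650/month
routed = 0.60 * calls * 0.003 + 0.40 * calls * 0.17  # $81 + $3,060
savings = all_opus - routed
print(f"routed: ${routed:,.0f}, savings: ${savings:,.0f}/mo ({savings / all_opus:.0%})")
```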


&lt;h2&gt;
  
  
  What this means in practice
&lt;/h2&gt;

&lt;p&gt;If you're routing all your API calls through a single model by default — Claude Opus, GPT-4.5, whatever — you're paying $0.17 for tasks that a $0.003 model handles at 8.3/10 quality.&lt;/p&gt;

&lt;p&gt;For most day-to-day developer work: commit messages, code explanations, debugging known error patterns, summarizing documentation — the cheap model is close enough. The 0.3-point average quality difference is unlikely to be noticeable in practice.&lt;/p&gt;

&lt;p&gt;For tasks where you'd read the output carefully — security-critical code, API design, complex architecture decisions — pay the $0.17.&lt;/p&gt;

&lt;p&gt;The router does this automatically. Frugal tier routes to the cheapest capable model. Balanced tier routes to Sonnet-class (8.7/10 avg, beats Opus on 8 of 10 tasks at $0.08). You don't have to decide per task.&lt;/p&gt;

&lt;p&gt;Full benchmark outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download. Read the Task 10 outputs specifically if you want to understand the gap.&lt;/p&gt;

&lt;p&gt;Integration is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Frugal tier: auto-routes to cheapest capable model
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/frugal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a git commit message for...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# komilion.routing.selectedModel shows which model was picked and why
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Cline, Cursor, Aider, LangChain, anything speaking the OpenAI format. Sign up free at komilion.com — no card required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Phase 4 benchmark: 10 developer tasks, 4 tiers, 30 judge calls per comparison. Judge: Hermione (Gemini 2.5 Flash). Full outputs: komilion.com/compare-v2.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>benchmarks</category>
      <category>llm</category>
    </item>
    <item>
      <title>We Published That Our Premium Tier Failed on 60% of Tasks. Then We Fixed It.</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 04 Mar 2026 12:31:13 +0000</pubDate>
      <link>https://forem.com/robinbanner/we-published-that-our-premium-tier-failed-on-60-of-tasks-then-we-fixed-it-1a9a</link>
      <guid>https://forem.com/robinbanner/we-published-that-our-premium-tier-failed-on-60-of-tasks-then-we-fixed-it-1a9a</guid>
      <description>&lt;h1&gt;
  
  
  We Published That Our Premium Tier Failed on 60% of Tasks. Then We Fixed It.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;2026-02-26&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When we launched on Product Hunt, our Phase 4 benchmark was live on the site.&lt;/p&gt;

&lt;p&gt;It showed council mode — our multi-model premium tier — timing out on 6 of 10 developer tasks. We didn't hide those numbers. They were in the benchmark table, linked from the maker comment, publicly downloadable as JSON.&lt;/p&gt;

&lt;p&gt;This is the follow-up post. We shipped the fix. Here's what broke, what we changed, and what Phase 5 shows.&lt;/p&gt;




&lt;h2&gt;
  
  
  What council mode is
&lt;/h2&gt;

&lt;p&gt;Council mode runs each request through four specialist models — Code, Research, Creative, and a Captain who synthesizes their outputs — before returning an answer. The verification pass is what makes it more than just asking four models and averaging. The Captain cross-examines the specialists' outputs, catches contradictions, and produces a synthesized response.&lt;/p&gt;

&lt;p&gt;The benchmark hypothesis: four specialists catching each other's errors should outperform a single model, even the best one. Phase 4 was the first time we actually ran it at scale. It told a different story.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Phase 4 showed
&lt;/h2&gt;

&lt;p&gt;Phase 4 (Feb 25, 30 calls, 10 developer tasks): council timed out on 6 of 10 tasks. The benchmark recorded timeouts as failures. The result was a 5-something/10 average that told us nothing about whether the underlying approach worked — just that the implementation had a critical fault.&lt;/p&gt;

&lt;p&gt;We published it anyway. If you're running a transparency-first benchmark, you publish the ugly runs too.&lt;/p&gt;




&lt;h2&gt;
  
  
  The root cause
&lt;/h2&gt;

&lt;p&gt;Each specialist call had a 90-second AbortSignal. Four specialists running sequentially. Worst case: 4 × 90s = 360 seconds of execution time.&lt;/p&gt;

&lt;p&gt;Connection timeout on Vercel: 90 seconds.&lt;/p&gt;

&lt;p&gt;The math was wrong from the start. Under load, every council request that hit a slow specialist exceeded the connection ceiling and died.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;The sequential architecture was correct — that's what drives council quality. Each specialist reads what the previous one said before responding. Running them in parallel would break that.&lt;/p&gt;

&lt;p&gt;What was wrong was the per-specialist timeout with no ceiling on the total pipeline. Sprint 12 added &lt;code&gt;PIPELINE_TOTAL_TIMEOUT_MS&lt;/code&gt; — a hard ceiling on total council execution time — plus a streaming bypass for simple requests (~2.4s). Complex tasks run the full sequential chain within a fixed budget. If a specialist runs long, the Captain synthesizes with whatever's complete. Zero timeouts since the fix shipped.&lt;/p&gt;
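&lt;p&gt;The shape of the fix is a shared deadline instead of independent per-call timeouts. A minimal sketch, assuming a synchronous pipeline and a &lt;code&gt;call&lt;/code&gt; function that honors its budget (all names here are hypothetical, not Komilion's implementation):&lt;/p&gt;

```python
import time

# Hypothetical sketch: one total budget shared by the sequential specialists.
# Each call is capped by whatever remains, so the sum can never exceed it.
PIPELINE_TOTAL_TIMEOUT_S = 80.0  # kept under a 90s connection ceiling

def run_council(specialists, prompt, call):
    deadline = time.monotonic() + PIPELINE_TOTAL_TIMEOUT_S
    outputs = {}
    for name in specialists:
        budget = max(deadline - time.monotonic(), 0.0)
        try:
            # call() is expected to raise TimeoutError when its budget expires.
            outputs[name] = call(name, prompt, outputs, timeout=budget)
        except TimeoutError:
            break  # stop spending budget; synthesize with what completed
    # The Captain synthesizes from whichever specialists finished in time.
    return synthesize(outputs)

def synthesize(outputs):
    done = ", ".join(outputs) or "none"
    return f"captain synthesis over: {done}"
```

With per-call 90s timeouts, four sequential calls could legally run 360s against a 90s connection ceiling; with a shared deadline, the pipeline total is bounded by construction.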




&lt;h2&gt;
  
  
  Phase 5 results
&lt;/h2&gt;

&lt;p&gt;We re-ran the benchmark with council V3 live.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Score (dev tasks)&lt;/th&gt;
&lt;th&gt;Won vs Opus&lt;/th&gt;
&lt;th&gt;vs. Phase 4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Council V3&lt;/td&gt;
&lt;td&gt;8.77/10&lt;/td&gt;
&lt;td&gt;8 of 10&lt;/td&gt;
&lt;td&gt;was timing out 6/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;8.80/10&lt;/td&gt;
&lt;td&gt;8 of 10&lt;/td&gt;
&lt;td&gt;unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6 direct&lt;/td&gt;
&lt;td&gt;8.6/10&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frugal&lt;/td&gt;
&lt;td&gt;8.3/10&lt;/td&gt;
&lt;td&gt;3 of 10&lt;/td&gt;
&lt;td&gt;unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Council wins 8 of 10 developer tasks head-to-head against Opus direct. Avg response time: ~90s. Zero timeouts.&lt;/p&gt;

&lt;p&gt;The council dev average (8.77) now beats Opus (8.6) and effectively ties Balanced (8.80) on developer tasks. The wins are clearest on architecture decisions, complex reasoning, and open-ended design — tasks where specialist cross-examination resolves ambiguity before synthesis. The all-task average (7.27 across 16 tasks including non-dev tasks) shows council is optimized for developer work, not general use. Full outputs are published — read the individual tasks and judge which council wins are meaningful for your use case.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Note: cost column omitted — benchmark doesn't track multi-model cost. See komilion.com/pricing for current premium tier pricing.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Full outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means for the three tiers
&lt;/h2&gt;

&lt;p&gt;Frugal and Balanced haven't changed. Phase 4 confirmed both: Balanced beats Opus on 8 of 10 tasks at half the cost, Frugal at 97% quality for 1.6% of the cost. Those findings stand.&lt;/p&gt;

&lt;p&gt;Premium (&lt;code&gt;neo-mode/premium&lt;/code&gt;) now routes to council V3. If you were on Premium before this post, your next call goes to council automatically.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;komilion.neo.councilTrace&lt;/code&gt; field in the response shows the full specialist breakdown — which model handled each role, what it contributed, how the Captain synthesized.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why we published the failure data
&lt;/h2&gt;

&lt;p&gt;Because you'd find it anyway. Developer benchmarks get read carefully. If we'd published Phase 4 with a footnote like "council results excluded due to technical issues," someone would have asked why.&lt;/p&gt;

&lt;p&gt;Publishing the failure is also how you prove the fix is real. The before and after are both in the data. You can verify it yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;komilion.com&lt;/strong&gt; — Phase 5 benchmark published same day as this post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>architecture</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Council Mode Is Live. Four Specialist Models. One Answer.</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 04 Mar 2026 12:31:12 +0000</pubDate>
      <link>https://forem.com/robinbanner/council-mode-is-live-four-specialist-models-one-answer-2obg</link>
      <guid>https://forem.com/robinbanner/council-mode-is-live-four-specialist-models-one-answer-2obg</guid>
      <description>&lt;h1&gt;
  
  
  Council Mode Is Live. Four Specialist Models. One Answer.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;2026-03-04&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When Komilion launched on Product Hunt, the premium tier was Opus 4.6 direct.&lt;/p&gt;

&lt;p&gt;That wasn't the plan. The plan was council mode: each request runs through four specialist models — a Code specialist, a Research specialist, a Creative specialist, and a Captain who synthesizes their outputs — before you get an answer.&lt;/p&gt;

&lt;p&gt;The V2 council had a problem. Sequential model calls, no hard ceiling on total execution time. Under load, the whole thing timed out. I wasn't going to launch with known instability, so I bypassed it and shipped Opus direct for Premium instead.&lt;/p&gt;

&lt;p&gt;This is the post that says it's fixed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What council mode actually does
&lt;/h2&gt;

&lt;p&gt;A standard API call goes: you → model → answer.&lt;/p&gt;

&lt;p&gt;Council mode goes: you → Code specialist → Research specialist → Creative specialist → Captain (synthesizes) → answer.&lt;/p&gt;

&lt;p&gt;The Captain doesn't just aggregate responses. It runs a cross-examination pass — each specialist's output gets evaluated against the others before the synthesis. The idea is that errors one model makes, another catches. The verification pass is what makes it more than just "ask four models and average the results."&lt;/p&gt;

&lt;p&gt;V3 adds a complexity gate: simple requests use a streaming bypass at ~2.4s, skipping the specialist pipeline entirely. Only tasks that need multi-specialist cross-examination run the full council chain at ~90s. The classification is automatic.&lt;/p&gt;
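&lt;p&gt;The chain above can be sketched in a few lines. This is a toy shape only — the role names come from this post, everything else is hypothetical:&lt;/p&gt;

```python
# Toy sketch of the sequential council: each specialist sees the prompt plus
# everything said before it, and the Captain synthesizes the full transcript.
ROLES = ["code", "research", "creative"]

def council(prompt, ask):
    transcript = []
    for role in ROLES:
        # Later specialists condition on earlier specialists' notes.
        context = prompt + "\n" + "\n".join(transcript)
        transcript.append(f"{role}: " + ask(role, context))
    # The Captain cross-examines the whole transcript and returns one answer.
    return ask("captain", prompt + "\n" + "\n".join(transcript))
```

Running the specialists in parallel would lose exactly this property: each later specialist conditions on what came before, which is what the sequential design trades latency for.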




&lt;h2&gt;
  
  
  The benchmark (Phase 5, post-fix — live production)
&lt;/h2&gt;

&lt;p&gt;After the fix shipped, we ran Phase 5 immediately against production: 10 developer tasks, Hermione judge (Gemini 2.5 Flash), every response published.&lt;/p&gt;

&lt;p&gt;Council mode scored 8.77/10 on developer tasks vs Opus 4.6 direct at 8.6/10. Won on 8 of 10 developer tasks. Avg response time: ~90s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4 context (pre-fix):&lt;/strong&gt; council timed out on 6 of 10 tasks (60%), scored below threshold. We published that. This is the after.&lt;/p&gt;

&lt;p&gt;Full outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why it wasn't at launch
&lt;/h2&gt;

&lt;p&gt;V2 had 4 specialist calls in sequence, each with a 90-second AbortSignal. Worst case: 360 seconds. Under real traffic that hit connection timeouts.&lt;/p&gt;

&lt;p&gt;V3 adds &lt;code&gt;PIPELINE_TOTAL_TIMEOUT_MS&lt;/code&gt; — a hard ceiling on total council execution time — and a streaming bypass for simple tasks (~2.4s). Complex tasks run the full sequential chain within a fixed budget. If a specialist runs long, the Captain synthesizes with whatever's complete. Zero timeouts since the fix shipped.&lt;/p&gt;

&lt;p&gt;We only shipped when Bugs confirmed it clean. That's the rule.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it means for the premium tier
&lt;/h2&gt;

&lt;p&gt;Premium (&lt;code&gt;neo-mode/premium&lt;/code&gt;) now routes to council V3, not Opus direct.&lt;/p&gt;

&lt;p&gt;For most developer work, Balanced (Sonnet 4.6, ~$0.08/call) is still the right tier. The benchmark showed Balanced beats Opus on 8 of 10 tasks. Council is for the cases where single-model ceiling matters: architecture decisions, complex multi-step reasoning, tasks where getting it right on the first call is worth more than the cost difference.&lt;/p&gt;

&lt;p&gt;If you were on Premium before today, your next call goes to council automatically. No config change.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design a rate limiting strategy for a multi-tenant API with burst tolerance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# komilion.council in response shows which specialists ran and what each contributed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;komilion.council&lt;/code&gt; field in the response shows the full specialist breakdown — which model handled each role, what it contributed, how the Captain synthesized. Visible by default on premium requests.&lt;/p&gt;

&lt;p&gt;Sign up free at komilion.com — no card required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Council V3 benchmark: Phase 5, 10 developer tasks, 4 tiers. Judge: Hermione (Gemini 2.5 Flash). Full outputs: komilion.com/compare-v2.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>architecture</category>
      <category>api</category>
    </item>
    <item>
      <title>Komilion Balanced Tier Beats Opus 4.6 on 6 of 10 Developer Tasks at Half the Cost</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 28 Feb 2026 08:03:32 +0000</pubDate>
      <link>https://forem.com/robinbanner/komilion-balanced-tier-beats-opus-46-on-6-of-10-developer-tasks-at-half-the-cost-4eip</link>
      <guid>https://forem.com/robinbanner/komilion-balanced-tier-beats-opus-46-on-6-of-10-developer-tasks-at-half-the-cost-4eip</guid>
      <description>&lt;p&gt;The safe default for AI API calls is to route everything to the best model you can afford. I did it for months. Opus for every request -- commit messages, variable lookups, SQL optimization, architecture. All $0.17/call.&lt;/p&gt;

&lt;p&gt;We ran the numbers to see if that assumption holds.&lt;/p&gt;

&lt;p&gt;Ten real developer tasks. Real API calls. Real billing. We sent each task through three setups: Komilion frugal tier, Komilion balanced tier, and Claude Opus 4.6 called directly via the Anthropic API.&lt;/p&gt;

&lt;p&gt;The result: the balanced tier beat Opus on 6 of 10 tasks at half the cost. Frugal delivered 97% of Opus quality at 1.6% of the cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;3 configurations, 10 tasks:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Model selection&lt;/th&gt;
&lt;th&gt;Cost/task&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frugal&lt;/td&gt;
&lt;td&gt;Cheapest capable model, auto-selected&lt;/td&gt;
&lt;td&gt;~$0.003 avg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;Mid-tier, optimized for developer tasks&lt;/td&gt;
&lt;td&gt;~$0.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6 direct&lt;/td&gt;
&lt;td&gt;Claude Opus 4.6 via Anthropic API directly&lt;/td&gt;
&lt;td&gt;~$0.17&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;10 tasks&lt;/strong&gt; from real developer work: code generation, debugging, explanation, SQL optimization, architecture design, commit messages, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Judge:&lt;/strong&gt; Hermione -- a Gemini 2.5 Flash LLM judge scoring each response head-to-head. Three runs per comparison to reduce variance. Scores are head-to-head relative ratings, not absolute quality measures.&lt;/p&gt;
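
&lt;p&gt;The aggregation step is plain arithmetic. A minimal sketch of how per-run judge scores turn into an average and a head-to-head winner -- the function name and the sample run-level numbers are illustrative, not Hermione's actual code:&lt;/p&gt;

```python
from statistics import mean

def summarize_judging(runs_a, runs_b):
    """Average the judge runs for each side and report a head-to-head winner.

    runs_a / runs_b: per-run scores (0-10) for the same task from two tiers.
    Returns (avg_a, avg_b, winner) where winner is 'a', 'b', or 'tie'.
    """
    avg_a, avg_b = round(mean(runs_a), 2), round(mean(runs_b), 2)
    if avg_a > avg_b:
        winner = "a"
    elif avg_b > avg_a:
        winner = "b"
    else:
        winner = "tie"
    return avg_a, avg_b, winner

# Task-9-style comparison: balanced vs Opus (illustrative run-level scores)
print(summarize_judging([8.67, 8.67, 8.67], [8.33, 7.67, 8.0]))
```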




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Avg Score&lt;/th&gt;
&lt;th&gt;Beat Opus&lt;/th&gt;
&lt;th&gt;Cost/task&lt;/th&gt;
&lt;th&gt;vs Opus cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;8.7/10&lt;/td&gt;
&lt;td&gt;6 of 10&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;53% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frugal&lt;/td&gt;
&lt;td&gt;8.3/10&lt;/td&gt;
&lt;td&gt;3 of 10&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;98% cheaper (56x lower)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6 direct&lt;/td&gt;
&lt;td&gt;8.6/10&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;td&gt;$0.17&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
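
&lt;p&gt;The "vs Opus cost" column is simple arithmetic on the per-task costs above; a quick sanity check (the helper name is illustrative):&lt;/p&gt;

```python
def vs_baseline(cost, baseline):
    """Return (percent cheaper, cost multiple) relative to a baseline cost."""
    pct_cheaper = round((1 - cost / baseline) * 100)
    multiple = round(baseline / cost, 1)
    return pct_cheaper, multiple

opus = 0.17
print(vs_baseline(0.08, opus))   # balanced tier vs Opus direct
print(vs_baseline(0.003, opus))  # frugal tier vs Opus direct
```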




&lt;h2&gt;
  
  
  Finding 1: Balanced beats Opus on most developer tasks
&lt;/h2&gt;

&lt;p&gt;This was the finding I did not expect.&lt;/p&gt;

&lt;p&gt;Balanced averaged 8.7/10. It beat Opus on 6 of 10 tasks. At $0.08/task vs $0.17 for Opus, that is better quality at half the cost.&lt;/p&gt;

&lt;p&gt;For well-defined developer tasks -- write this function, debug this code, optimize this query -- the balanced tier routes to Sonnet-class models highly tuned for exactly this type of work. The judge consistently scored them at or above Opus on tasks with clear success criteria.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A specific example:&lt;/strong&gt; Task 9. Frugal scored 8.67, balanced scored 8.67. Opus scored 8.33 and 7.67 across judge runs. A task requiring real technical depth -- and both cheaper tiers outscored the frontier model. This result appeared repeatedly across the 10-task run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Opus still wins:&lt;/strong&gt; Task 10 tells the other story. Opus scored 9.0. Balanced scored 8.0. Frugal scored 7.0. For complex, open-ended problems where output breadth and multi-step reasoning visibly matter, Opus produces noticeably more thorough results. The judge valued that. It is a real gap -- on a narrower set of tasks than most developers assume.&lt;/p&gt;

&lt;p&gt;The tasks where Opus won cluster around a recognizable pattern: SQL optimization, unit test generation, REST API design. Tasks where the output has architectural depth, must satisfy multiple simultaneous constraints, or requires anticipating edge cases across a broad surface. On those, the frontier model earns its price tag. On the other 6 of 10 tasks, the balanced tier matched or outperformed it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finding 2: Frugal delivers 97% of Opus quality at 1.6% of the cost
&lt;/h2&gt;

&lt;p&gt;Frugal averaged 8.3/10. It won 3 of 10 head-to-heads.&lt;/p&gt;

&lt;p&gt;At $0.003/task vs $0.17 for Opus, frugal delivers 97% of Opus quality at 56x lower cost. The tasks frugal handles best make up the majority of most developers' API traffic: commit messages, short explanations, summarization, quick lookups, simple code generation.&lt;/p&gt;

&lt;p&gt;The tasks where frugal struggles -- complex open-ended problems -- are real. For those, route to balanced or accept the Opus cost selectively.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest conclusion
&lt;/h2&gt;

&lt;p&gt;Balanced is the better default for most developer workloads.&lt;/p&gt;

&lt;p&gt;8.7/10 avg, 6 of 10 wins against Opus, $0.08/task. If you are routing everything to Opus, you are paying $0.17/call for results the balanced tier matches or beats on 60% of tasks.&lt;/p&gt;

&lt;p&gt;Frugal is the cost optimizer for simple-task volume. 97% of Opus quality. 1.6% of the cost.&lt;/p&gt;

&lt;p&gt;And on a specific subset of complex open-ended tasks, Opus still wins. That is not a bug -- it is the whole argument for intelligent routing. Know your task distribution. Route accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to try it
&lt;/h2&gt;

&lt;p&gt;OpenAI SDK compatible. One line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# free at komilion.com, no card
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Balanced -- recommended default based on this benchmark
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brainModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Cline, Cursor, Roo Code, Continue, and any other OpenAI-compatible client.&lt;/p&gt;




&lt;p&gt;All 30 responses from this benchmark run are published unedited at &lt;a href="https://www.komilion.com/compare-v2" rel="noopener noreferrer"&gt;komilion.com/compare-v2&lt;/a&gt; -- every response, every judge verdict, JSON download available.&lt;/p&gt;

&lt;p&gt;Komilion is live on Product Hunt today: &lt;a href="https://www.producthunt.com/posts/komilion" rel="noopener noreferrer"&gt;https://www.producthunt.com/posts/komilion&lt;/a&gt; -- if this was useful, an upvote takes 30 seconds.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Data from real API calls, February 2026. Phase 4 run: 10 tasks x 3 configurations = 30 calls. Judge: Hermione (Gemini 2.5 Flash), 3 runs per comparison.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>benchmarks</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Continue.dev + Claude Max Ban: Fix in 60 Seconds</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 25 Feb 2026 18:10:26 +0000</pubDate>
      <link>https://forem.com/robinbanner/continuedev-claude-max-ban-fix-in-60-seconds-1fpo</link>
      <guid>https://forem.com/robinbanner/continuedev-claude-max-ban-fix-in-60-seconds-1fpo</guid>
      <description>&lt;p&gt;Continue.dev's Claude integration stopped working for Claude Max subscription users in January 2026. This is the fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Continue.dev is and why it broke
&lt;/h2&gt;

&lt;p&gt;Continue.dev is an open-source AI coding assistant for VS Code and JetBrains. Unlike Cursor or Cline, Continue is free and fully configurable — you bring your own models.&lt;/p&gt;

&lt;p&gt;If you configured Continue to use Claude through your Claude Max subscription credentials, that path is now blocked. Anthropic's January 2026 enforcement restricted automated tool access through consumer subscription OAuth tokens.&lt;/p&gt;

&lt;p&gt;Continue.dev is actually one of the easiest tools to fix, because it was designed from the start to work with any provider via config.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fix: update your config.json
&lt;/h2&gt;

&lt;p&gt;Continue.dev stores its configuration in &lt;code&gt;~/.continue/config.json&lt;/code&gt;. You're changing one section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (broken):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk-ant-..."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (working — Option A, direct Anthropic API):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude Opus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-anthropic-api-key"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your Anthropic API key at &lt;a href="https://console.anthropic.com" rel="noopener noreferrer"&gt;console.anthropic.com&lt;/a&gt;. Pay per token.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option B: Smart routing (cheaper for mixed workloads)
&lt;/h2&gt;

&lt;p&gt;Continue.dev supports any OpenAI-compatible provider. This routes each request to the cheapest capable model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Komilion Balanced"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neo-mode/balanced"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.komilion.com/api/v1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Komilion Premium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neo-mode/premium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.komilion.com/api/v1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With two models configured, you can switch between them in the Continue sidebar. Use Balanced for most tasks, Premium for architecture or complex debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  Continue.dev-specific settings worth knowing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tab autocomplete&lt;/strong&gt; has its own model config in Continue — separate from the chat model. If you use tab completions, update that too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Komilion Frugal"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neo-mode/frugal"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.komilion.com/api/v1"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tab autocomplete fires constantly as you type. &lt;code&gt;neo-mode/frugal&lt;/code&gt; (~$0.006/call) keeps those completions cheap while saving the better models for explicit chat requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context providers&lt;/strong&gt; (the &lt;code&gt;@&lt;/code&gt; commands — &lt;code&gt;@file&lt;/code&gt;, &lt;code&gt;@codebase&lt;/code&gt;, etc.) use the main chat model. These are fine with Balanced or Premium.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost difference for Continue.dev users
&lt;/h2&gt;

&lt;p&gt;Continue users tend to run high call volumes — continuous tab completions plus explicit chat requests.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;th&gt;Direct Opus&lt;/th&gt;
&lt;th&gt;Smart routing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tab completions (300/day)&lt;/td&gt;
&lt;td&gt;~$165/day&lt;/td&gt;
&lt;td&gt;~$1.80/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat requests (50/day)&lt;/td&gt;
&lt;td&gt;~$27.50/day&lt;/td&gt;
&lt;td&gt;~$5.00/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$192/day&lt;/td&gt;
&lt;td&gt;~$6.80/day&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tab completion difference is particularly dramatic — those short, fast calls are exactly what cheap models handle best.&lt;/p&gt;
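
&lt;p&gt;The table's totals fall out of the per-call prices. A quick reproduction -- the assumed per-call costs ($0.55 Opus-class, $0.006 frugal completions, $0.10 routed chat) are the averages behind the table, not quoted rates:&lt;/p&gt;

```python
# Assumed per-call costs behind the table above
OPUS = 0.55         # Opus-class, any request
FRUGAL = 0.006      # tab completions under smart routing
ROUTED_CHAT = 0.10  # chat requests under smart routing

def daily(completions=300, chats=50):
    """Daily cost for (direct Opus, smart routing) at the given volumes."""
    direct = completions * OPUS + chats * OPUS
    routed = completions * FRUGAL + chats * ROUTED_CHAT
    return round(direct, 2), round(routed, 2)

print(daily())  # (192.5, 6.8)
```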




&lt;h2&gt;
  
  
  Verify it's working
&lt;/h2&gt;

&lt;p&gt;After updating &lt;code&gt;config.json&lt;/code&gt;, reload the VS Code window (&lt;code&gt;Cmd+Shift+P&lt;/code&gt; -&amp;gt; "Developer: Reload Window").&lt;/p&gt;

&lt;p&gt;Test in the Continue sidebar with a simple question. You should see a response. If you are using Komilion, the actual model used appears in the API response headers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting a Komilion API key
&lt;/h2&gt;

&lt;p&gt;$5 free credits, no card: &lt;a href="https://www.komilion.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=continuedev-fix-feb26" rel="noopener noreferrer"&gt;komilion.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your key will be in the dashboard immediately after email verification. It starts with &lt;code&gt;ck_&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Full migration guide covering all tools: &lt;a href="https://www.komilion.com/cline?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=continuedev-fix-feb26" rel="noopener noreferrer"&gt;komilion.com/cline&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vscode</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cline Keeps Routing to the Wrong (Expensive) Model — Here's What's Happening</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Tue, 24 Feb 2026 14:01:31 +0000</pubDate>
      <link>https://forem.com/robinbanner/cline-keeps-routing-to-the-wrong-expensive-model-heres-whats-happening-341g</link>
      <guid>https://forem.com/robinbanner/cline-keeps-routing-to-the-wrong-expensive-model-heres-whats-happening-341g</guid>
      <description>&lt;p&gt;If you use Cline with a non-Anthropic provider and notice it ignoring your model selection — you're not imagining it.&lt;/p&gt;

&lt;p&gt;There's a known issue (currently open in the Cline repo) where the CLI path doesn't fully hydrate &lt;code&gt;modelInfo&lt;/code&gt; for third-party providers, causing it to fall back to &lt;code&gt;anthropic/claude-3-7-sonnet-latest&lt;/code&gt; regardless of what you configured. The UI shows your selected model. The API calls use something else.&lt;/p&gt;

&lt;p&gt;This matters because at $0.55/call for Opus-class models, a misconfigured router silently burns money on tasks that could cost $0.006.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;Cline's model handler has two paths: VS Code extension (full model info hydration) and CLI (partial). For providers like Requesty, the CLI path sets &lt;code&gt;modelId&lt;/code&gt; but skips fetching &lt;code&gt;modelInfo&lt;/code&gt;. The handler then falls back to its default rather than trusting the explicit &lt;code&gt;modelId&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It's a known architectural inconsistency — the fix requires the CLI to either fetch model info or relax the requirement that both &lt;code&gt;modelId&lt;/code&gt; and &lt;code&gt;modelInfo&lt;/code&gt; must be present before respecting a custom model selection.&lt;/p&gt;
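
&lt;p&gt;A simplified sketch of the failure mode -- this is not Cline's actual code, and the names are illustrative; it just shows why an explicit &lt;code&gt;modelId&lt;/code&gt; gets ignored when &lt;code&gt;modelInfo&lt;/code&gt; was never fetched:&lt;/p&gt;

```python
DEFAULT_MODEL = "anthropic/claude-3-7-sonnet-latest"

def resolve_model(model_id, model_info):
    """Mimic the reported behavior: the custom selection is only honored
    when BOTH modelId and modelInfo are present."""
    if model_id and model_info:
        return model_id
    return DEFAULT_MODEL  # silent fallback, invisible in the UI

# VS Code extension path: both fields hydrated -> selection respected
print(resolve_model("my-provider/cheap-model", {"contextWindow": 128000}))
# CLI path for third-party providers: modelInfo never fetched -> fallback
print(resolve_model("my-provider/cheap-model", None))
```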

&lt;h2&gt;
  
  
  The Actual Cost Problem
&lt;/h2&gt;

&lt;p&gt;The deeper issue isn't the bug — it's the architecture that makes it possible. Most coding tools let you pick one model and route everything to it. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What does this variable do?" → Opus 4.6 → $0.55&lt;/li&gt;
&lt;li&gt;"Write a commit message" → Opus 4.6 → $0.55&lt;/li&gt;
&lt;li&gt;"Summarise this function" → Opus 4.6 → $0.55&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 60-70% of typical coding tasks, you're paying Opus prices for work that Gemini Flash handles at the same quality level for $0.006.&lt;/p&gt;

&lt;p&gt;I tracked my own usage for a month. 64% of my API calls were tasks where the cheapest capable model scored within 5% of Opus on output quality. The remaining 36% genuinely benefited from a better model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Smart Routing Looks Like
&lt;/h2&gt;

&lt;p&gt;The fix for the Cline CLI bug will let you use your configured provider correctly. But that still leaves the underlying problem: you're choosing one model for everything.&lt;/p&gt;

&lt;p&gt;The pattern that actually works: classify each request by task complexity, route simple tasks to cheap fast models, reserve expensive models for the work that actually needs them.&lt;/p&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Commit messages, variable lookups, short explanations → &lt;strong&gt;$0.006/call&lt;/strong&gt; (Gemini Flash class)&lt;/li&gt;
&lt;li&gt;Code generation, debugging, refactoring → &lt;strong&gt;$0.08-0.10/call&lt;/strong&gt; (Gemini Pro class)&lt;/li&gt;
&lt;li&gt;Architecture decisions, complex multi-file reasoning → &lt;strong&gt;$0.55/call&lt;/strong&gt; (Opus class)&lt;/li&gt;
&lt;/ul&gt;
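
&lt;p&gt;A toy version of that classification step. A sketch under obvious assumptions -- a real router uses a classifier model, not keyword matching, and the tier names, prices, and hint lists here are illustrative:&lt;/p&gt;

```python
TIERS = {
    "cheap":   {"cost": 0.006, "hints": ("commit message", "what does", "summarize")},
    "mid":     {"cost": 0.09,  "hints": ("write a function", "debug", "refactor", "optimize")},
    "premium": {"cost": 0.55,  "hints": ("architecture", "design a system", "multi-file")},
}

def route(prompt: str) -> str:
    """Pick a tier by keyword, checking the most demanding hints first."""
    p = prompt.lower()
    for tier in ("premium", "mid", "cheap"):
        if any(h in p for h in TIERS[tier]["hints"]):
            return tier
    return "mid"  # unknown complexity: default to the middle tier

print(route("Write a commit message for this diff"))            # cheap
print(route("Debug this stack trace"))                          # mid
print(route("Propose an architecture for the billing service")) # premium
```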

&lt;h2&gt;
  
  
  Setting This Up in Cline Today
&lt;/h2&gt;

&lt;p&gt;If you want smart routing to work in Cline without waiting for the model-selection bug to be fixed — or if you want automatic task-based routing rather than manually managing model configs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set Cline's API provider to &lt;strong&gt;"OpenAI Compatible"&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Set API endpoint to &lt;code&gt;https://www.komilion.com/api/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use one of these as your model:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;neo-mode/frugal&lt;/code&gt; — auto-routes simple tasks to cheapest capable model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;neo-mode/balanced&lt;/code&gt; — good for most coding work&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;neo-mode/premium&lt;/code&gt; — council mode for architecture decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This bypasses the Requesty-specific &lt;code&gt;modelInfo&lt;/code&gt; issue entirely because it uses standard OpenAI model IDs, and it adds automatic task routing on top.&lt;/p&gt;

&lt;p&gt;Every response includes which model was actually used and the exact cost in the &lt;code&gt;komilion&lt;/code&gt; field of the response body — so you can verify the routing is working.&lt;/p&gt;
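
&lt;p&gt;A minimal check of that field, assuming a response body shaped like the one described here -- &lt;code&gt;brainModel&lt;/code&gt; and &lt;code&gt;cost&lt;/code&gt; are the fields this post and the benchmark code reference; the sample payload is illustrative:&lt;/p&gt;

```python
def routing_info(response_body: dict) -> tuple:
    """Pull the actually-used model and exact cost out of the komilion field."""
    k = response_body.get("komilion", {})
    return k.get("brainModel"), k.get("cost")

# Sample response shape, based on the fields mentioned above
sample = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "komilion": {"brainModel": "gemini-2.5-flash", "cost": 0.006},
}
print(routing_info(sample))  # ('gemini-2.5-flash', 0.006)
```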

&lt;h2&gt;
  
  
  The Benchmark
&lt;/h2&gt;

&lt;p&gt;I ran 40 API calls across 10 real developer tasks and published every response unedited: &lt;a href="https://komilion.com/compare-v2" rel="noopener noreferrer"&gt;komilion.com/compare-v2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The finding that surprised me most: for summarisation, code explanation, and simple Q&amp;amp;A, the quality gap between Frugal and Opus 4.6 collapses to near zero. The tasks where Opus genuinely wins are architecture planning, complex multi-file reasoning, and ambiguous requirements — which is exactly when Premium kicks in automatically.&lt;/p&gt;




&lt;p&gt;The Cline CLI bug will get fixed. The one-model-for-everything habit is harder to fix without a router.&lt;/p&gt;

&lt;p&gt;$5 free trial at &lt;a href="https://komilion.com/auth/signup" rel="noopener noreferrer"&gt;komilion.com/auth/signup&lt;/a&gt; — no card required. Change one URL, see which model handled each request and what it cost.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cline</category>
      <category>apidevelopment</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>How to Run an AI Benchmark That Doesn't Lie to You</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 21 Feb 2026 17:01:40 +0000</pubDate>
      <link>https://forem.com/robinbanner/how-to-run-an-ai-benchmark-that-doesnt-lie-to-you-5fhe</link>
      <guid>https://forem.com/robinbanner/how-to-run-an-ai-benchmark-that-doesnt-lie-to-you-5fhe</guid>
      <description>&lt;p&gt;We're about to publish a comparison page that benchmarks 4 AI setups against 10 real developer tasks. Before we do, here's every design decision we made to make sure the results can't be gamed — including by us.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with most AI benchmarks
&lt;/h2&gt;

&lt;p&gt;Most "AI comparison" content has at least one of these problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cherry-picked prompts&lt;/strong&gt; — tasks chosen because one model happens to shine on them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proprietary scoring&lt;/strong&gt; — a company scoring its own outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No raw outputs&lt;/strong&gt; — you see scores but not what the models actually said&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic data&lt;/strong&gt; — results that change over time, making past claims unverifiable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrong comparison baseline&lt;/strong&gt; — comparing a fine-tuned model against a base model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our compare page will have all of these problems if we're not careful. Here's what we're doing about each.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 1: 10 tasks, chosen before we ran anything
&lt;/h2&gt;

&lt;p&gt;The 10 tasks were finalized before a single API call was made:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python function with unit tests&lt;/li&gt;
&lt;li&gt;Debug a real bug (provided)&lt;/li&gt;
&lt;li&gt;Explain async/await in JavaScript&lt;/li&gt;
&lt;li&gt;Write unit tests for a given function&lt;/li&gt;
&lt;li&gt;Refactor a function for readability&lt;/li&gt;
&lt;li&gt;Summarize a 500-word document&lt;/li&gt;
&lt;li&gt;Write a git commit message for a real diff&lt;/li&gt;
&lt;li&gt;Optimize a slow SQL query&lt;/li&gt;
&lt;li&gt;Architecture recommendation for a real problem&lt;/li&gt;
&lt;li&gt;Design a REST API for given requirements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We didn't run the tasks, look at the results, and then swap in friendlier prompts. The prompts are locked. If the output for task 6 is embarrassing for one tier, we show it anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; The temptation to "just swap one prompt that didn't work well" is how benchmarks quietly become marketing. We locked the prompts first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 2: Static human scoring, not AI judging AI
&lt;/h2&gt;

&lt;p&gt;Each output is scored on 2-3 dimensions by us, once, and locked in with a date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We considered dynamic scoring&lt;/strong&gt; — running a separate model (like Gemini Pro) on each page load to score outputs. It's technically impressive. We didn't do it because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI scoring AI is circular.&lt;/strong&gt; The model doing the scoring has its own biases. A Gemini-scored benchmark will favor Gemini. A Claude-scored benchmark favors Claude.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It hides the scoring logic.&lt;/strong&gt; If a model scores itself 4.8/5 and we don't show the scoring prompt, you can't verify it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It adds noise.&lt;/strong&gt; Scores change between page loads. A snapshot benchmark should be a snapshot.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Static human scoring means you can disagree with us. The score is ours, dated, signed. If you think we scored Task 3 wrong, tell us.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 3: Every full output is visible
&lt;/h2&gt;

&lt;p&gt;Most comparison pages show a summary table or a curated excerpt. We're showing every full response, unedited, with a copy button and a JSON download.&lt;/p&gt;

&lt;p&gt;This is the only way a benchmark is honest. If Premium's architecture recommendation is 18,000 words of genuinely useful content, show that. If Frugal's commit message is "Add feature" with no context, show that too.&lt;/p&gt;

&lt;p&gt;The response that looks bad is as important as the one that looks good.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 4: The competitor column is a direct API call, not ours
&lt;/h2&gt;

&lt;p&gt;Our "Competitor: Opus direct" column calls Claude Opus 4.6 directly via the Anthropic SDK — not through our own endpoint.&lt;/p&gt;

&lt;p&gt;This matters because if we route the competitor column through Komilion, any routing overhead, prompt modification, or API quirk affects the competitor result. The baseline needs to be genuinely independent to be meaningful.&lt;/p&gt;

&lt;p&gt;Practically: Niobe runs a separate script for this column with no Komilion code in the path.&lt;/p&gt;
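&lt;p&gt;The baseline script needs only a few lines. This is a hedged sketch, not the production script: the &lt;code&gt;opus_direct&lt;/code&gt; helper and the model id are illustrative, and the SDK usage follows Anthropic's Python client:&lt;/p&gt;

```python
def opus_direct(prompt: str, client=None) -> str:
    """Call Claude Opus directly via the Anthropic SDK, no Komilion
    code in the path. Model id here is an assumption for Opus 4.6."""
    if client is None:
        import anthropic  # official SDK; reads ANTHROPIC_API_KEY from env
        client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-opus-4-6",  # assumed id, check Anthropic's model list
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```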




&lt;h2&gt;
  
  
  Design decision 5: Benchmarks are a snapshot, not a permanent claim
&lt;/h2&gt;

&lt;p&gt;The outputs are dated. They'll get stale as models improve. We'll re-run and update — but we won't quietly update old results. Old results stay visible with their dates.&lt;/p&gt;

&lt;p&gt;This is the "no retroactive edits" principle. A benchmark that silently improves over time is marketing. A benchmark that ages visibly is honest.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we're actually testing
&lt;/h2&gt;

&lt;p&gt;We're running 4 setups against the same 10 prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frugal tier&lt;/strong&gt; — cheapest capable model for each task (~$0.006/call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balanced tier&lt;/strong&gt; — recommended tier, balance of cost and quality (~$0.10/call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium tier (council mode)&lt;/strong&gt; — multi-model orchestration, which we claim beats single-model Opus on complex tasks (~$0.55+/call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.6 direct&lt;/strong&gt; — the gold standard comparison, called via Anthropic's API with no routing layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result we're most curious about ourselves: does council mode actually beat direct Opus on the architecture and API design tasks? We don't know yet. The benchmark will tell us, and we'll publish whatever it says.&lt;/p&gt;
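&lt;p&gt;Mechanically, the whole run reduces to a nested loop: 4 setups against 10 locked prompts, 40 calls total, every full output kept. A minimal sketch, where &lt;code&gt;call(setup, prompt)&lt;/code&gt; is an assumed stand-in for whichever client each setup uses:&lt;/p&gt;

```python
def run_benchmark(setups, prompts, call):
    """Run every locked prompt against every setup; keep full outputs.

    `call(setup, prompt)` is an assumed stand-in for the per-setup
    client (Komilion tiers or the direct Anthropic script).
    """
    results = []
    for setup in setups:
        for task_id, prompt in enumerate(prompts, start=1):
            results.append({
                "setup": setup,
                "task": task_id,
                "prompt": prompt,
                "output": call(setup, prompt),  # stored unedited
            })
    return results

# e.g. json.dump(run_benchmark(setups, prompts, call),
#                open("benchmark.json", "w"))  -> the downloadable JSON
```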




&lt;h2&gt;
  
  
  When it ships
&lt;/h2&gt;

&lt;p&gt;Compare page v2 goes live once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The benchmark data is in (40 API calls, a few hours of compute)&lt;/li&gt;
&lt;li&gt;Scores are written and reviewed&lt;/li&gt;
&lt;li&gt;The page passes QA&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We're targeting this week.&lt;/p&gt;

&lt;p&gt;If you want to see the outputs the moment it's live: &lt;a href="https://www.komilion.com/compare?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=benchmark-methodology-feb26" rel="noopener noreferrer"&gt;komilion.com/compare&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>benchmarks</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Your AI Agent Is Probably Costing 10x More Than It Should</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 21 Feb 2026 15:24:19 +0000</pubDate>
      <link>https://forem.com/robinbanner/your-ai-agent-is-probably-costing-10x-more-than-it-should-2423</link>
      <guid>https://forem.com/robinbanner/your-ai-agent-is-probably-costing-10x-more-than-it-should-2423</guid>
      <description>&lt;p&gt;AI agents make a lot of API calls. Most of them are cheap tasks disguised as expensive ones.&lt;/p&gt;

&lt;p&gt;Here's the breakdown and the fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an agent session actually costs
&lt;/h2&gt;

&lt;p&gt;A typical agent loop for "add error handling to this function":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read system prompt + context → $0.04&lt;/li&gt;
&lt;li&gt;Parse the task → $0.02&lt;/li&gt;
&lt;li&gt;Read the target file → $0.01&lt;/li&gt;
&lt;li&gt;Plan the changes → $0.04&lt;/li&gt;
&lt;li&gt;Write the edit → $0.08&lt;/li&gt;
&lt;li&gt;Verify the output → $0.04&lt;/li&gt;
&lt;li&gt;Report back → $0.01&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Total: ~$0.24 at Opus pricing for one small task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do this 20 times in a day: $4.80. Scale to a team of 5: $24/day, $720/month, just for one agent on small tasks.&lt;/p&gt;

&lt;p&gt;The math gets worse for agents with tool use, multi-step reasoning, and retrieval loops. GPT-5.2 or Opus on every step of a 10-step agent workflow = $2-5 per workflow execution.&lt;/p&gt;
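&lt;p&gt;The arithmetic above is easy to sanity-check in a few lines (the per-step prices are the estimates from the list, not measured values):&lt;/p&gt;

```python
# Per-step cost estimates from the list above (Opus pricing, estimates)
steps = {
    "read_context": 0.04,
    "parse_task":   0.02,
    "read_file":    0.01,
    "plan":         0.04,
    "write_edit":   0.08,
    "verify":       0.04,
    "report":       0.01,
}

per_task = sum(steps.values())        # ~$0.24 for one small task
per_day = per_task * 20               # 20 tasks/day -> ~$4.80
per_month_team = per_day * 5 * 30     # 5 devs, 30 days -> ~$720
print(per_task, per_day, per_month_team)
```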




&lt;h2&gt;
  
  
  The problem: most agent calls aren't complex
&lt;/h2&gt;

&lt;p&gt;Look at what an agent actually does step-by-step:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What it requires&lt;/th&gt;
&lt;th&gt;Cheapest capable model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parse user intent&lt;/td&gt;
&lt;td&gt;Basic NLP&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read a file&lt;/td&gt;
&lt;td&gt;No reasoning needed&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check if task is done&lt;/td&gt;
&lt;td&gt;Simple comparison&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write a function&lt;/td&gt;
&lt;td&gt;Code generation&lt;/td&gt;
&lt;td&gt;Sonnet-class ($0.01/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debug complex logic&lt;/td&gt;
&lt;td&gt;Deep reasoning&lt;/td&gt;
&lt;td&gt;Opus-class ($0.08/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan a multi-step refactor&lt;/td&gt;
&lt;td&gt;Architecture thinking&lt;/td&gt;
&lt;td&gt;Opus-class ($0.08/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confirm with user&lt;/td&gt;
&lt;td&gt;Conversation&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The expensive model (Opus) is right for 2-3 steps. The cheap model handles 4-5 steps fine. But agents typically use one model for everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fix: route by step type
&lt;/h2&gt;

&lt;p&gt;Two approaches depending on how your agent is built.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Manual routing in your agent code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Route agent steps to appropriate models.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Simple steps: fast, cheap
&lt;/span&gt;    &lt;span class="n"&gt;simple_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check_done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Complex steps: need capability
&lt;/span&gt;    &lt;span class="n"&gt;complex_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;simple_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;step_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;complex_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemini-3-pro-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# default
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost reduction:&lt;/strong&gt; ~60-70% depending on your step distribution.&lt;br&gt;
&lt;strong&gt;Downside:&lt;/strong&gt; You maintain the routing logic. Every new step type needs a decision.&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 2: Let the routing layer decide
&lt;/h3&gt;

&lt;p&gt;Point your agent at an auto-routing endpoint. The router classifies each call and picks the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Every step uses the same model string
# The router reads the prompt and picks the right model
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# classifier picks frugal/balanced/premium
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# See what was actually used:
# response["komilion"]["neo"]["brainModel"]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost reduction:&lt;/strong&gt; Similar to manual routing (60-80%), but automatic.&lt;br&gt;
&lt;strong&gt;Downside:&lt;/strong&gt; You don't control which exact model runs. You see it in the response metadata but can't predict it in advance.&lt;/p&gt;


&lt;h2&gt;
  
  
  Override for critical steps
&lt;/h2&gt;

&lt;p&gt;For steps where quality absolutely matters — final output, user-facing decisions — override to premium:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Most steps: auto-route
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Critical final output: pin to Opus
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# always Opus 4.6
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This hybrid approach uses cheap models for scaffolding and reserves Opus for output that users actually see.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real numbers for a 10-step agent
&lt;/h2&gt;

&lt;p&gt;Agent workflow: parse → plan → read files (×3) → implement (×2) → test → review → output&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost/run&lt;/th&gt;
&lt;th&gt;100 runs/month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Opus everything&lt;/td&gt;
&lt;td&gt;$2.40&lt;/td&gt;
&lt;td&gt;$240&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual routing&lt;/td&gt;
&lt;td&gt;$0.72&lt;/td&gt;
&lt;td&gt;$72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-routing (balanced)&lt;/td&gt;
&lt;td&gt;$0.58&lt;/td&gt;
&lt;td&gt;$58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flash everything&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;$3 (quality degrades)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 80x difference between Opus-everything and Flash-everything is real — but quality degrades on Flash for the hard steps. The sweet spot is routing: $58-72/month vs $240, with Opus still handling the complex steps.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to watch out for
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context window bleed.&lt;/strong&gt; Agents often append conversation history to every call. A 10-step agent where each step adds 1K tokens to the context pays for 55K cumulative input tokens over the run (1K + 2K + ... + 10K), with 10K on the final step alone. Your routing decision about step complexity matters less than your context management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool call overhead.&lt;/strong&gt; Every tool call is a round-trip API call. An agent that calls 5 tools per step at Opus pricing = 5× the cost per step. Use cheap models for tool parsing, expensive models for reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retry loops.&lt;/strong&gt; If an agent retries a failed step 3 times, you've paid 3× for one step. Add exponential backoff AND downgrade the model on retry (if it failed once, trying a different model is more useful than the same expensive model again).&lt;/p&gt;
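&lt;p&gt;A retry helper that combines both ideas might look like this sketch. The &lt;code&gt;make_call&lt;/code&gt; function is an assumed stand-in for your chat-completion call, and the model list ordering (expensive first, cheaper on retry) is the policy, not an API feature:&lt;/p&gt;

```python
import time

def call_with_fallback(make_call, models, base_delay=1.0):
    """Retry a failed step with exponential backoff, switching to the
    next model in the list on each retry instead of re-paying for the
    same expensive model. `make_call(model)` is an assumed stand-in
    for your chat-completion call."""
    last_error = None
    for attempt, model in enumerate(models):
        try:
            return make_call(model)
        except Exception as err:  # in practice, catch your client's error types
            last_error = err
            if attempt != len(models) - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise last_error

# e.g. call_with_fallback(step, ["anthropic/claude-opus-4-6",
#                                "google/gemini-3-pro-preview"])
```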




&lt;h2&gt;
  
  
  The two-line change
&lt;/h2&gt;

&lt;p&gt;If you're using any OpenAI-compatible agent framework (LangChain, AutoGen, CrewAI, custom), the change is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before:
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After:
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_komilion_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same code. Different cost profile. The routing happens transparently.&lt;/p&gt;

&lt;p&gt;$5 free at &lt;a href="https://www.komilion.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=agent-costs-feb26" rel="noopener noreferrer"&gt;komilion.com&lt;/a&gt; — no card, start testing immediately.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>devtools</category>
    </item>
    <item>
      <title>How to Run Cline for $10/Month (Instead of $60+)</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 21 Feb 2026 15:23:33 +0000</pubDate>
      <link>https://forem.com/robinbanner/how-to-run-cline-for-10month-instead-of-60-1n2k</link>
      <guid>https://forem.com/robinbanner/how-to-run-cline-for-10month-instead-of-60-1n2k</guid>
      <description>&lt;p&gt;Cline is one of the best AI coding assistants available. It's also easy to accidentally spend $60-200/month on it if you're not paying attention.&lt;/p&gt;

&lt;p&gt;Here's how to get your Cline bill under $10/month without gutting the quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Cline gets expensive
&lt;/h2&gt;

&lt;p&gt;Cline is an agentic tool. For every task you give it, it makes multiple API calls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial understanding (reading files, asking clarifying questions)&lt;/li&gt;
&lt;li&gt;Planning the approach&lt;/li&gt;
&lt;li&gt;Making edits (one API call per file, sometimes more)&lt;/li&gt;
&lt;li&gt;Verifying the changes&lt;/li&gt;
&lt;li&gt;Handling errors and retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A "write me a function" request might trigger 8-15 API calls. At Opus pricing, that's $4-8 for one task.&lt;/p&gt;

&lt;p&gt;Do 20 coding tasks a day? That's $80-160/day at full Opus. Obviously nobody runs it that hard, but even 5-6 complex tasks/day adds up fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  The model math
&lt;/h2&gt;

&lt;p&gt;The default for heavy Cline users is often Opus or Sonnet. Here's what each actually costs per Cline session:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical session (30 API calls, ~3K tokens each = 90K tokens, roughly half input, half output):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input price&lt;/th&gt;
&lt;th&gt;Output price&lt;/th&gt;
&lt;th&gt;Session cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;$15/M&lt;/td&gt;
&lt;td&gt;$75/M&lt;/td&gt;
&lt;td&gt;~$4.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$3/M&lt;/td&gt;
&lt;td&gt;$15/M&lt;/td&gt;
&lt;td&gt;~$0.81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Pro&lt;/td&gt;
&lt;td&gt;$3.5/M&lt;/td&gt;
&lt;td&gt;$10.5/M&lt;/td&gt;
&lt;td&gt;~$0.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;$0.075/M&lt;/td&gt;
&lt;td&gt;$0.30/M&lt;/td&gt;
&lt;td&gt;~$0.014&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The problem: most of those 30 calls don't need Opus. File reads, task confirmations, simple completions — these work on cheap models. Only complex reasoning and hard edits actually need the top model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strategy 1: Manual model switching
&lt;/h2&gt;

&lt;p&gt;Cheapest with no external tool. In Cline settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default to Sonnet 4.6 (&lt;code&gt;claude-sonnet-4-6&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Switch to Opus manually for hard sessions&lt;/li&gt;
&lt;li&gt;Use Gemini Flash for quick Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works if you're disciplined. Most people aren't — they set one model and forget it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realistic monthly cost:&lt;/strong&gt; $15-25 if you actually switch. $50-80 if you forget.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strategy 2: Automatic routing (set-and-forget)
&lt;/h2&gt;

&lt;p&gt;Point Cline at a routing layer that automatically picks the right model per call. You never change your config.&lt;/p&gt;

&lt;p&gt;In Cline's settings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;API Provider → OpenAI Compatible&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Set Base URL to &lt;code&gt;https://www.komilion.com/api/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set Model to &lt;code&gt;neo-mode/balanced&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Get a key at komilion.com ($5 free to start)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What happens next: Cline sends each API call to the router. Simple file reads, quick questions, and confirmations route to Gemini Flash-class models (~$0.006/call). Complex edits and hard problems route to Sonnet or Opus class (~$0.10-0.55/call). You see the actual model used in &lt;code&gt;data["komilion"]["neo"]["brainModel"]&lt;/code&gt; in each response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realistic monthly cost:&lt;/strong&gt; $6-18 depending on how many complex tasks you run.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real numbers from a mixed session
&lt;/h2&gt;

&lt;p&gt;Say you run 100 Cline API calls on a typical day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~60 are simple (file reads, task confirmations, short questions)&lt;/li&gt;
&lt;li&gt;~30 are moderate (standard code edits, debugging)&lt;/li&gt;
&lt;li&gt;~10 are complex (architecture decisions, hard bugs, multi-file refactors)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Simple (60)&lt;/th&gt;
&lt;th&gt;Moderate (30)&lt;/th&gt;
&lt;th&gt;Complex (10)&lt;/th&gt;
&lt;th&gt;Day total&lt;/th&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Opus everywhere&lt;/td&gt;
&lt;td&gt;60 × $0.55 = $33&lt;/td&gt;
&lt;td&gt;30 × $0.55 = $16.50&lt;/td&gt;
&lt;td&gt;10 × $0.55 = $5.50&lt;/td&gt;
&lt;td&gt;$55&lt;/td&gt;
&lt;td&gt;$1,650&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smart routing&lt;/td&gt;
&lt;td&gt;60 × $0.006 = $0.36&lt;/td&gt;
&lt;td&gt;30 × $0.10 = $3.00&lt;/td&gt;
&lt;td&gt;10 × $0.55 = $5.50&lt;/td&gt;
&lt;td&gt;$8.86&lt;/td&gt;
&lt;td&gt;$265&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "Opus everywhere" row is what most people actually run — one model set, never changed. Smart routing gets you the same Opus quality on complex calls, cheap models everywhere else.&lt;/p&gt;

&lt;p&gt;"I'll manually switch" is the classic plan that lasts about two days before you forget to switch back.&lt;br&gt;
&lt;em&gt;(Padme: replaced scratchpad math with clean two-row table aligned with Article 10 numbers. Per-call avg for Opus = $0.55 at typical Cline context depth. Smart routing row confirmed correct: $8.86/day matches Article 10 exactly.)&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Getting the $10/month target
&lt;/h2&gt;

&lt;p&gt;$10/month = ~$0.33/day.&lt;/p&gt;

&lt;p&gt;At the routing rates above, that's roughly 3 moderate calls per day (~$0.10 each), or about 55 frugal calls per day (file reads, quick questions, at ~$0.006 each). Real usage is a mix.&lt;/p&gt;

&lt;p&gt;Realistic: a light Cline user running 2-3 small tasks/day with smart routing hits $5-15/month. A heavy user doing 10+ complex tasks daily will be higher.&lt;/p&gt;

&lt;p&gt;The $10 target is achievable if your Cline usage is mostly moderate-complexity work, with frugal routing absorbing the auxiliary calls.&lt;/p&gt;


&lt;h2&gt;
  
  
  One Cline-specific setting worth knowing
&lt;/h2&gt;

&lt;p&gt;Cline has a "system prompt" override. If you add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keep responses concise. Confirm before making large changes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...you reduce the average token count per response, which directly reduces cost. Verbose AI responses = more output tokens = higher bill.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick setup
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Get a key at &lt;a href="https://www.komilion.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=cline-budget-feb26" rel="noopener noreferrer"&gt;komilion.com&lt;/a&gt; — $5 free, no card&lt;/li&gt;
&lt;li&gt;Cline settings → API Provider → OpenAI Compatible&lt;/li&gt;
&lt;li&gt;Base URL: &lt;code&gt;https://www.komilion.com/api/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Model: &lt;code&gt;neo-mode/balanced&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;neo-mode/premium&lt;/code&gt; in a second profile for when you specifically need Opus&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your bill changes immediately.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cline</category>
      <category>vscode</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
