<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Robin</title>
    <description>The latest articles on Forem by Robin (@robinbanner).</description>
    <link>https://forem.com/robinbanner</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3771849%2Fdcf31c96-600e-4ff2-8136-5518ca50059e.jpg</url>
      <title>Forem: Robin</title>
      <link>https://forem.com/robinbanner</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/robinbanner"/>
    <language>en</language>
    <item>
      <title>Your First Komilion API Call in 60 Seconds</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Thu, 26 Mar 2026 10:05:32 +0000</pubDate>
      <link>https://forem.com/robinbanner/your-first-komilion-api-call-in-60-seconds-5e68</link>
      <guid>https://forem.com/robinbanner/your-first-komilion-api-call-in-60-seconds-5e68</guid>
      <description>&lt;h1&gt;
  
  
  Your First Komilion API Call in 60 Seconds
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;March 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you just signed up for Komilion and are staring at a blank dashboard: here's exactly what to do. This takes 60 seconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your Komilion API key (starts with &lt;code&gt;ck_&lt;/code&gt; — visible in your dashboard)&lt;/li&gt;
&lt;li&gt;Python 3.7+, Node.js 16+, or just curl&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No new SDK. Komilion is OpenAI-compatible — if you've used the OpenAI API before, the interface is identical.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 1: Python (60 seconds)
&lt;/h2&gt;

&lt;p&gt;Install the OpenAI SDK if you haven't already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# paste your actual key here
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the fastest way to find a duplicate in a Python list?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# See what model handled it and what it cost:
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brainModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tier:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cost:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you see output — you're in. The &lt;code&gt;brainModel&lt;/code&gt; field shows which model handled your request. The &lt;code&gt;tier&lt;/code&gt; will say &lt;code&gt;"balanced"&lt;/code&gt;. The &lt;code&gt;cost&lt;/code&gt; is what that call cost in USD.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option 2: curl (30 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://www.komilion.com/api/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer ck_your_key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "neo-mode/balanced",
    "messages": [{"role": "user", "content": "What is the fastest way to find a duplicate in a Python list?"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get a standard OpenAI-format JSON response plus a &lt;code&gt;komilion&lt;/code&gt; object in the response body with routing metadata.&lt;/p&gt;
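&lt;p&gt;If you're scripting against the curl output, pulling the routing metadata out of that JSON is straightforward. A minimal sketch — the &lt;code&gt;komilion&lt;/code&gt; field names (&lt;code&gt;brainModel&lt;/code&gt;, &lt;code&gt;tier&lt;/code&gt;, &lt;code&gt;cost&lt;/code&gt;) are the ones described in this post; the rest of the payload shape is illustrative:&lt;/p&gt;

```python
import json

# Illustrative response body: standard OpenAI chat-completion shape plus
# the "komilion" routing-metadata object described in this post.
raw = """
{
  "choices": [{"message": {"role": "assistant", "content": "Use a set."}}],
  "komilion": {"brainModel": "example/model", "tier": "balanced", "cost": 0.08}
}
"""

response = json.loads(raw)
meta = response.get("komilion", {})

print("Answer:", response["choices"][0]["message"]["content"])
print("Tier:", meta.get("tier"))
print("Cost (USD):", meta.get("cost"))
```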




&lt;h2&gt;
  
  
  Option 3: Existing OpenAI code (20 seconds)
&lt;/h2&gt;

&lt;p&gt;If you already have code using the OpenAI SDK, change two lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the model string to &lt;code&gt;neo-mode/balanced&lt;/code&gt;. Every other parameter — messages, temperature, stream, max_tokens — stays the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the three model strings do
&lt;/h2&gt;

&lt;p&gt;Once you have the first call working, here's how to use all three tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Commit messages, summaries, format conversion — ~$0.006/call
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/frugal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a git commit message for this diff: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Bug fixes, code review, new functions — ~$0.08/call (default)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this function for edge cases: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# System design, architecture, security review — council mode, ~90s response
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design the database schema for a multi-tenant SaaS: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The routing metadata in every response tells you what tier was used and what it cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  If something goes wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;401 Unauthorized&lt;/code&gt;&lt;/strong&gt; — API key is wrong or missing. Make sure you're using your &lt;code&gt;ck_&lt;/code&gt; key, not an OpenAI key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;400 Bad Request&lt;/code&gt; on the model string&lt;/strong&gt; — The model string must be exactly &lt;code&gt;neo-mode/frugal&lt;/code&gt;, &lt;code&gt;neo-mode/balanced&lt;/code&gt;, or &lt;code&gt;neo-mode/premium&lt;/code&gt;. Do not use &lt;code&gt;anthropic/claude-opus-4-6&lt;/code&gt; or any other model string — those will return 400.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;402 Insufficient Balance&lt;/code&gt;&lt;/strong&gt; — Your wallet balance is $0. Top up at komilion.com/dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Empty &lt;code&gt;komilion&lt;/code&gt; metadata&lt;/strong&gt; — Upgrade to &lt;code&gt;openai&amp;gt;=1.0.0&lt;/code&gt;. The &lt;code&gt;model_extra&lt;/code&gt; field requires the newer SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slow response on Premium&lt;/strong&gt; — Expected. The council runs multiple specialists, which can take up to 90 seconds. Use Balanced for interactive requests.&lt;/p&gt;
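&lt;p&gt;The failure modes above are mechanical enough to encode in a script. A sketch of a triage helper — the status codes and fixes are the ones listed in this post; this helper itself is not part of any SDK:&lt;/p&gt;

```python
# Maps the HTTP status codes covered above to the fix described in this
# post. A convenience sketch for scripts, not part of any official SDK.
TRIAGE = {
    401: "Wrong or missing key: use your ck_ key, not an OpenAI sk- key.",
    400: "Bad model string: use neo-mode/frugal, neo-mode/balanced, or neo-mode/premium.",
    402: "Insufficient balance: top up at komilion.com/dashboard.",
}

def triage(status_code: int) -> str:
    return TRIAGE.get(status_code, "Unexpected status: check the response body.")

print(triage(401))
print(triage(500))
```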




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Once your first call works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;neo-mode/balanced&lt;/code&gt; as the default everywhere in your codebase&lt;/li&gt;
&lt;li&gt;Override to &lt;code&gt;neo-mode/frugal&lt;/code&gt; for formatting, summarization, and commit messages&lt;/li&gt;
&lt;li&gt;Override to &lt;code&gt;neo-mode/premium&lt;/code&gt; only when the output is going to production without review&lt;/li&gt;
&lt;/ul&gt;
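&lt;p&gt;That default-plus-overrides policy fits in a single helper, so the tier decision lives in one place in your codebase. A sketch — the task-category names are made up for illustration; the model strings are the three from this post:&lt;/p&gt;

```python
# Encode the tier strategy above in one place: Balanced by default,
# Frugal for cheap mechanical tasks, Premium for unreviewed output.
# Task-category names are illustrative, not an official taxonomy.
FRUGAL_TASKS = {"formatting", "summarization", "commit_message"}
PREMIUM_TASKS = {"unreviewed_production"}

def pick_model(task: str) -> str:
    if task in FRUGAL_TASKS:
        return "neo-mode/frugal"
    if task in PREMIUM_TASKS:
        return "neo-mode/premium"
    return "neo-mode/balanced"  # the default everywhere else

print(pick_model("commit_message"))  # neo-mode/frugal
print(pick_model("code_review"))     # neo-mode/balanced
```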

&lt;p&gt;The Phase 4 benchmark (30 calls, 10 developer tasks, all outputs published) is at komilion.com/compare-v2 — worth reading before you commit to a tier strategy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions: &lt;a href="mailto:support@komilion.com"&gt;support@komilion.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>devtools</category>
      <category>python</category>
    </item>
    <item>
      <title>Three Ways to Handle AI Model Routing in 2026 (And the Trade-offs Nobody Talks About)</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 18 Mar 2026 17:02:16 +0000</pubDate>
      <link>https://forem.com/robinbanner/three-ways-to-handle-ai-model-routing-in-2026-and-the-trade-offs-nobody-talks-about-4opa</link>
      <guid>https://forem.com/robinbanner/three-ways-to-handle-ai-model-routing-in-2026-and-the-trade-offs-nobody-talks-about-4opa</guid>
      <description>&lt;h1&gt;
  
  
  Three Ways to Handle AI Model Routing in 2026 (And the Trade-offs Nobody Talks About)
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;2026-03-18&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you're building on top of AI models, you've probably hit the same wall: you have 400+ models available and no principled way to decide which one handles which request. Defaulting to Opus on everything works, but it's expensive. Defaulting to Gemini Flash on everything is cheap but breaks on complex tasks.&lt;/p&gt;

&lt;p&gt;The routing problem is real. Here are the three patterns I see in production, with honest trade-offs for each.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 1: You route manually (OpenRouter / direct API)
&lt;/h2&gt;

&lt;p&gt;The simplest setup: you pick the model per request, or per endpoint, or per environment. OpenRouter makes this easy — one API, 400+ models, you decide what goes where.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it looks like in code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Explicit model selection per request
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# You decide this
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When this is right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a small, stable prompt library where you know exactly what each prompt needs&lt;/li&gt;
&lt;li&gt;Your team has strong opinions about specific models for specific use cases&lt;/li&gt;
&lt;li&gt;You want to experiment across models and control the comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest cost:&lt;/strong&gt; Every time you add a new prompt type or the model landscape changes, someone has to review the routing rules. "Anthropic changed the pricing for Haiku" is a maintenance event. "GPT-5 is better for code" is another. Manual routing is a configuration that goes stale.&lt;/p&gt;
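&lt;p&gt;Concretely, manual routing usually hardens into a hand-maintained table like the sketch below, and every row is something a human has to revisit when pricing or model quality shifts. The model IDs here are illustrative placeholders, not recommendations:&lt;/p&gt;

```python
# A hand-maintained routing table: the "configuration that goes stale".
# Model IDs are illustrative placeholders.
ROUTES = {
    "summarize": "cheap-provider/fast-model",
    "code_review": "premium-provider/big-model",
}
DEFAULT = "premium-provider/big-model"

def route(task: str) -> str:
    # Unlisted task types silently fall back to whatever default was
    # chosen when this table was written -- the staleness problem.
    return ROUTES.get(task, DEFAULT)

print(route("summarize"))
print(route("translate"))  # not in the table: falls back to the default
```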




&lt;h2&gt;
  
  
  Approach 2: You self-host a router
&lt;/h2&gt;

&lt;p&gt;Open-source routing layers let you deploy automated model selection on your own infrastructure. You define the classification rules, the router applies them, and there's no markup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The value proposition:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No markup on model costs — you pay provider rates directly&lt;/li&gt;
&lt;li&gt;Full control over the routing logic&lt;/li&gt;
&lt;li&gt;Your prompts never leave your infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When this is right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise or regulated environments where data residency matters&lt;/li&gt;
&lt;li&gt;Teams with the ops capacity to maintain a routing layer&lt;/li&gt;
&lt;li&gt;High volume where even a small markup compounds significantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest cost:&lt;/strong&gt; You own the operational overhead. When a model goes down, you handle the fallback. When the classification logic needs updating, that's engineering time. "Free" in dollars is not free in hours.&lt;/p&gt;




&lt;h2&gt;
  
  
  Approach 3: You use a managed router (Komilion / Martian)
&lt;/h2&gt;

&lt;p&gt;A managed routing layer handles classification automatically. You set a quality floor — frugal, balanced, premium — and the service picks the cheapest capable model for each request. One URL change from your current setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it looks like in code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Same SDK, different base_url and model string
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Router decides the actual model
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When this is right:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want cost optimization without maintaining routing logic&lt;/li&gt;
&lt;li&gt;Your prompt mix is diverse and hard to classify manually&lt;/li&gt;
&lt;li&gt;You'd rather pay a markup than own the infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest cost:&lt;/strong&gt; You're paying a markup (~25% on model costs in Komilion's case) for the automation. And you're trusting someone else's classification. We publish our routing decisions and benchmark data at komilion.com/compare-v2 — every output, every judge score, JSON download — because "trust our router" is a weak argument and "here's what it actually picked and why" is a stronger one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The question that matters
&lt;/h2&gt;

&lt;p&gt;None of these approaches is universally right. The question is: &lt;strong&gt;what's the cost of a wrong routing decision in your stack?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're routing a customer-facing chatbot and a wrong tier degrades the response quality noticeably, manual routing with explicit model selection makes sense — the stakes are high enough to justify the maintenance.&lt;/p&gt;

&lt;p&gt;If you're routing developer tooling (Cline sessions, internal code review, CI pipeline summaries), the wrong tier mostly means "slightly less thorough output on that one request." Managed routing's occasional miss is worth the cost savings.&lt;/p&gt;

&lt;p&gt;If you process millions of requests and the markup compounds to real money, self-hosted is worth the ops cost. At 10K calls/month, the math doesn't work out that way — but at 10M calls/month, it does.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually use
&lt;/h2&gt;

&lt;p&gt;For Komilion's own internal tooling (Cline sessions, benchmark scripts, documentation drafts), we use Balanced tier by default. File reads and summaries route to Frugal automatically.&lt;/p&gt;

&lt;p&gt;The benchmark result that drove this split: Balanced beats Opus on 6 of 10 real developer tasks at $0.08/task vs Opus's $0.17. Frugal matches Opus on summarization and code explanation at ~57x lower cost (8.3/10 vs 8.6/10). Full outputs at komilion.com/compare-v2.&lt;/p&gt;
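&lt;p&gt;Those ratios are easy to sanity-check as plain arithmetic. The per-call figures below are the approximate numbers cited in this series ($0.17 for Opus, $0.08 for Balanced; the $0.003 figure for Frugal-class summarization calls comes from the benchmark write-up, so treat it as approximate):&lt;/p&gt;

```python
# Sanity-check the cost ratios quoted above. Per-call figures are the
# approximate numbers cited in this series, not live pricing.
opus = 0.17
balanced = 0.08
frugal = 0.003  # summarization-class calls

print(f"Opus vs Frugal:   ~{opus / frugal:.0f}x")    # ~57x
print(f"Opus vs Balanced: ~{opus / balanced:.1f}x")  # ~2.1x
```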




&lt;p&gt;&lt;em&gt;komilion.com — Sign up free, no card required. Drop a comment if you want test credits.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why I Built on Top of OpenRouter Instead of Building a Model Gateway from Scratch</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 18 Mar 2026 15:29:01 +0000</pubDate>
      <link>https://forem.com/robinbanner/why-i-built-on-top-of-openrouter-instead-of-building-a-model-gateway-from-scratch-36p</link>
      <guid>https://forem.com/robinbanner/why-i-built-on-top-of-openrouter-instead-of-building-a-model-gateway-from-scratch-36p</guid>
      <description>&lt;h1&gt;
  
  
  Why I Built on Top of OpenRouter Instead of Building a Model Gateway from Scratch
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;March 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The most common comment I get on Komilion: "Isn't this just a wrapper around OpenRouter?"&lt;/p&gt;

&lt;p&gt;Yes, partly. And that's a deliberate choice. Here's the reasoning.&lt;/p&gt;




&lt;h2&gt;
  
  
  What OpenRouter actually gives you
&lt;/h2&gt;

&lt;p&gt;OpenRouter is a model marketplace — one API key, 400+ models, provider-level pricing. You call &lt;code&gt;openrouter.ai/api/v1&lt;/code&gt;, pick any model by ID, pay the provider rate directly. No markup on the models.&lt;/p&gt;

&lt;p&gt;What it does not give you: routing logic. You still decide which model handles which request. OpenRouter is the menu. You're still the waiter.&lt;/p&gt;

&lt;p&gt;That's a real gap for most production AI apps, and it's the gap Komilion was built to fill.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why build on top instead of building from scratch
&lt;/h2&gt;

&lt;p&gt;When I started Komilion, I had two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Build a full model gateway.&lt;/strong&gt; Direct integrations with Anthropic, OpenAI, Google, Mistral, Groq. Manage API keys, rate limits, failover, billing, and model availability for each provider separately. Full control, zero dependency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: Build on top of OpenRouter.&lt;/strong&gt; Use their unified API, inherit their model coverage, focus engineering time on the routing classification layer.&lt;/p&gt;

&lt;p&gt;I chose Option B for three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. OpenRouter solves the hard operational problems.&lt;/strong&gt; Provider failover when Anthropic is down. Model availability checks. New models added within hours of release. Billing unified across providers. These are real engineering problems — I've seen teams spend 3-4 months building and maintaining this layer. Building on top means that time goes to the routing logic instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Model coverage compounds.&lt;/strong&gt; OpenRouter has 400+ models from 30+ providers. Building direct integrations means constantly adding new providers when a good model ships on an unfamiliar platform. With OpenRouter as the foundation, when Groq releases a new model that benchmarks well for frugal tasks, it's available immediately. No new integration work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The value I'm adding is in the classification, not the API plumbing.&lt;/strong&gt; Komilion's routing has four layers: a regex fast path for obvious simple requests, an LLM classifier for ambiguous ones, benchmark-scored model selection based on task type, and provider failover. That's the hard part. The model access itself is a solved problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Komilion adds
&lt;/h2&gt;

&lt;p&gt;When you send a request to &lt;code&gt;neo-mode/balanced&lt;/code&gt;, here's what happens before a model ever sees your prompt:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regex fast path&lt;/strong&gt; — if the request matches a known simple pattern (file reads, summaries, commit message boilerplate), it routes immediately without running a classifier. Sub-100ms overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM classifier&lt;/strong&gt; — ambiguous requests go through a lightweight classifier that determines task complexity and category. This is where most routing decisions happen.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benchmark-scored model selection&lt;/strong&gt; — the classifier output maps to a model pool ranked by benchmark performance and current provider pricing. The cheapest capable model wins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provider failover&lt;/strong&gt; — if the selected model's provider returns an error, the request falls through to the next ranked option automatically. Your app doesn't see the failure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
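&lt;p&gt;Step 1 is the easiest to picture in code. A sketch of a regex fast path — the patterns and tier names here are invented for illustration, not Komilion's actual rules:&lt;/p&gt;

```python
import re
from typing import Optional

# Sketch of step 1: route known-simple prompts immediately, without
# spending latency or money on an LLM classifier. Patterns are invented
# for illustration; they are not Komilion's actual rules.
SIMPLE_PATTERNS = [
    re.compile(r"\bsummariz", re.IGNORECASE),
    re.compile(r"\bcommit message\b", re.IGNORECASE),
    re.compile(r"^(read|show)\s+(the\s+)?file", re.IGNORECASE),
]

def fast_path(prompt: str) -> Optional[str]:
    """Return a tier for obviously simple prompts, or None to fall
    through to the LLM classifier (step 2)."""
    if any(p.search(prompt) for p in SIMPLE_PATTERNS):
        return "frugal"
    return None

print(fast_path("Summarize this changelog"))        # frugal
print(fast_path("Design a multi-region failover"))  # None
```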

&lt;p&gt;None of this requires you to think about which model you're using. You set a quality floor — frugal, balanced, or premium — and the router handles the rest.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest trade-off
&lt;/h2&gt;

&lt;p&gt;Komilion charges a markup of roughly 25% on top of OpenRouter's provider-level pricing. You're paying for the routing automation and the ops you're not running.&lt;/p&gt;

&lt;p&gt;Whether that's worth it depends on your call volume and team. At 10,000 calls/month, the markup is a rounding error compared to the cost of routing incorrectly or maintaining your own routing layer. At 10 million calls/month, the math changes and you should probably evaluate self-hosted options.&lt;/p&gt;

&lt;p&gt;The alternative to Komilion isn't free — it's your time maintaining routing rules, updating model selections as the landscape changes, and handling the edge cases when a model you hardcoded gets deprecated. That cost is real, it just doesn't show up on an invoice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one thing you should know if you're evaluating this
&lt;/h2&gt;

&lt;p&gt;Komilion is built on OpenRouter, and that's not a secret. The routing logic and the classification layer are where the value is. The benchmark data at komilion.com/compare-v2 is the proof — 30 calls, 10 real developer tasks, every output published unedited.&lt;/p&gt;

&lt;p&gt;If you want to evaluate the routing, that's where to start. If the routing doesn't hold up for your workload, I'd rather you know that before you integrate than after.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;komilion.com — DM for test credits.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>openrouter</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The $0.003 vs $0.17 Test: When Does the Cheap Model Actually Win?</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 14 Mar 2026 08:00:56 +0000</pubDate>
      <link>https://forem.com/robinbanner/the-0003-vs-017-test-when-does-the-cheap-model-actually-win-g32</link>
      <guid>https://forem.com/robinbanner/the-0003-vs-017-test-when-does-the-cheap-model-actually-win-g32</guid>
      <description>&lt;h1&gt;
  
  
  The $0.003 vs $0.17 Test: When Does the Cheap Model Actually Win?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Julia Paulsen&lt;/strong&gt; | &lt;em&gt;2026-03-14&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I built an AI router that automatically picks the cheapest capable model for each request. The pitch is that you shouldn't pay $0.17 for tasks a $0.003 model handles just as well.&lt;/p&gt;

&lt;p&gt;So we ran a benchmark. Ten real developer tasks. Cheap model (frugal tier, auto-routed) vs Opus 4.6 direct. An LLM judge scored each response three times.&lt;/p&gt;

&lt;p&gt;The honest answer: the cheap model won 3 of 10 times. Tied once. Lost 6 times.&lt;/p&gt;

&lt;p&gt;That sounds bad. But here's what the cost column looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  The data
&lt;/h2&gt;

&lt;p&gt;Ten developer tasks, three judge runs per task (30 judge calls per tier). Frugal tier (auto-routed) vs Opus 4.6 (baseline). Judge: Gemini 2.5 Flash (Hermione).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Frugal Score&lt;/th&gt;
&lt;th&gt;Opus Score&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;th&gt;Frugal Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Code generation (compound interest)&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frugal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0031&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Debug a list comprehension&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;td&gt;$0.0021&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Explain async/await evolution&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frugal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0038&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Write unit tests for parse_config&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0054&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Code generation (compound interest v2)&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0016&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Research: global AI market summary&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Git commit message generation&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0003&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;SQL query optimization (10M rows)&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0034&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Scale real-time chat to 10K users&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frugal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.0036&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;REST API design&lt;/td&gt;
&lt;td&gt;7.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$0.0041&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Frugal avg: 8.3/10. Opus avg: 8.6/10. Frugal avg cost: $0.003/task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opus costs roughly $0.17/task in this benchmark. That's a 56x cost difference for a 0.3-point quality difference across all 10 tasks.&lt;/p&gt;
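&lt;p&gt;The summary line is easy to recheck from the table (scores and per-task costs copied as published):&lt;/p&gt;

```python
# Per-task scores and frugal costs, copied from the table above.
frugal = [8.0, 9.0, 9.0, 8.0, 9.0, 8.0, 8.0, 8.0, 8.7, 7.0]
opus = [5.3, 9.0, 8.0, 9.0, 10.0, 9.0, 9.0, 9.0, 8.3, 9.0]
frugal_costs = [0.0031, 0.0021, 0.0038, 0.0054, 0.0016,
                0.0000, 0.0003, 0.0034, 0.0036, 0.0041]

print(round(sum(frugal) / 10, 1))        # frugal average: 8.3
print(round(sum(opus) / 10, 1))          # opus average: 8.6
print(round(sum(frugal_costs) / 10, 4))  # frugal average cost: ~$0.003/task
```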




&lt;h2&gt;
  
  
  Task 6 cost $0.0000
&lt;/h2&gt;

&lt;p&gt;That's not a rounding artifact. The router picked Gemini 2.5 Flash for the AI market research task. Gemini Flash has a free tier. The task cost zero dollars and scored 8.0 against Opus's 9.0.&lt;/p&gt;

&lt;p&gt;Is 8.0 vs 9.0 worth $0.17? Depends what you're doing. For a background research pass that feeds into something else, probably not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Task 1: frugal beat Opus 8.0 vs 5.3
&lt;/h2&gt;

&lt;p&gt;The judge scored frugal's compound interest implementation 8.0 and Opus's 5.3. Frugal wrote complete, tested code with edge cases. Opus wrote an incomplete implementation with a rate calculation error that the judge flagged across all three runs.&lt;/p&gt;

&lt;p&gt;This was the most surprising result. Opus is supposed to be the gold standard for code quality. On a standard Python implementation task, the routing picked a cheaper model that just... did it better.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Opus clearly won
&lt;/h2&gt;

&lt;p&gt;Tasks 4, 8, and 10 were not close. Unit test generation (edge cases, mock patterns, fixture design), SQL optimization on a 10M-row table, and complex REST API design — Opus outperformed by a full point or more.&lt;/p&gt;

&lt;p&gt;Task 10 gap: frugal 7.0, Opus 9.0. That's the kind of gap that matters. A 7.0 API design might miss security considerations or recommend problematic patterns. That task should cost $0.17.&lt;/p&gt;




&lt;h2&gt;
  
  
  The routing signal
&lt;/h2&gt;

&lt;p&gt;Looking at where frugal wins vs loses, there's a pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frugal tends to win or tie:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard implementation tasks (no novel architecture needed)&lt;/li&gt;
&lt;li&gt;Explanation/education (async/await, concepts that have established answers)&lt;/li&gt;
&lt;li&gt;Debugging obvious bugs (the list comprehension logic flaw)&lt;/li&gt;
&lt;li&gt;Research summarization (reporting existing information)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Opus tends to win:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test generation (edge case discovery benefits from Opus's reasoning depth)&lt;/li&gt;
&lt;li&gt;Complex architecture (API design, SQL optimization require multi-factor tradeoff reasoning)&lt;/li&gt;
&lt;li&gt;Tasks where "good enough" isn't good enough (production security design)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The routing signal isn't task &lt;em&gt;length&lt;/em&gt; or task &lt;em&gt;topic&lt;/em&gt; — it's task &lt;em&gt;complexity&lt;/em&gt;. Low-complexity tasks have established patterns. The cheap model has seen those patterns. High-complexity tasks require novel reasoning chains. Opus is better there.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this looks like at scale — a real budget example
&lt;/h2&gt;

&lt;p&gt;Take a 15-person dev team shipping a SaaS product. As a rough industry estimate, a team like this makes somewhere around 3,000 AI API calls per developer per month — code generation, debugging, commit messages, test writing, documentation, code review. That's &lt;strong&gt;45,000 calls/month&lt;/strong&gt; across the team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All-Opus approach:&lt;/strong&gt;&lt;br&gt;
45,000 calls × $0.17 = &lt;strong&gt;$7,650/month&lt;/strong&gt; | $91,800/year&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart routing (based on our benchmark data):&lt;/strong&gt;&lt;br&gt;
Our benchmark suggests ~60% of developer tasks are low-complexity (commit messages, debugging, explanations, research), where frugal scores within 0.3 points of Opus on average. The remaining ~40% are high-complexity tasks (architecture, security, test generation) where Opus justifies its cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;% of calls&lt;/th&gt;
&lt;th&gt;Calls/mo&lt;/th&gt;
&lt;th&gt;Cost/call&lt;/th&gt;
&lt;th&gt;Monthly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frugal (auto-routed)&lt;/td&gt;
&lt;td&gt;60%&lt;/td&gt;
&lt;td&gt;27,000&lt;/td&gt;
&lt;td&gt;$0.003&lt;/td&gt;
&lt;td&gt;$81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus (complex tasks)&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;td&gt;18,000&lt;/td&gt;
&lt;td&gt;$0.17&lt;/td&gt;
&lt;td&gt;$3,060&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;45,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3,141&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Savings: $4,509/month — 59% reduction.&lt;/strong&gt; That's $54,108/year back in the budget.&lt;/p&gt;

&lt;p&gt;And the quality trade-off? On the 60% routed to frugal, you're getting 8.3/10 instead of 8.6/10. On the 40% that still goes to Opus, you're getting full quality where it matters. Your architecture reviews, security audits, and complex test suites still get the best model. Your commit messages and docstrings don't need to cost $0.17 each.&lt;/p&gt;

&lt;p&gt;For a startup burning $50K/month, reclaiming $4.5K is meaningful. For an enterprise team with 100 developers, multiply those numbers by 7 — that's $378K/year in API costs you didn't need to spend.&lt;/p&gt;
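&lt;p&gt;The budget math above is simple enough to sanity-check directly (the 60/40 split and per-call costs are this article's own estimates):&lt;/p&gt;

```python
# Blended-cost check for the 15-person team example above.
calls = 45_000
all_opus = calls * 0.17                              # $7,650/month
routed = 0.60 * calls * 0.003 + 0.40 * calls * 0.17  # $81 + $3,060
savings = all_opus - routed
print(f"routed: ${routed:,.0f}, savings: ${savings:,.0f}/mo ({savings / all_opus:.0%})")
```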


&lt;h2&gt;
  
  
  What this means in practice
&lt;/h2&gt;

&lt;p&gt;If you're routing all your API calls through a single model by default — Claude Opus, GPT-4.5, whatever — you're paying $0.17 for tasks that a $0.003 model handles at 8.3/10 quality.&lt;/p&gt;

&lt;p&gt;For most day-to-day developer work: commit messages, code explanations, debugging known error patterns, summarizing documentation — the cheap model is close enough. The 0.3-point average quality difference is unlikely to be noticeable in practice.&lt;/p&gt;

&lt;p&gt;For tasks where you'd read the output carefully — security-critical code, API design, complex architecture decisions — pay the $0.17.&lt;/p&gt;

&lt;p&gt;The router does this automatically. Frugal tier routes to the cheapest capable model. Balanced tier routes to Sonnet-class (8.7/10 avg, beats Opus on 8 of 10 tasks at $0.08). You don't have to decide per task.&lt;/p&gt;

&lt;p&gt;Full benchmark outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download. Read the Task 10 outputs specifically if you want to understand the gap.&lt;/p&gt;

&lt;p&gt;Integration is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Frugal tier: auto-routes to cheapest capable model
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/frugal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a git commit message for...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# komilion.routing.selectedModel shows which model was picked and why
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Cline, Cursor, Aider, LangChain, anything speaking the OpenAI format. Sign up free at komilion.com — no card required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Phase 4 benchmark: 10 developer tasks, 4 tiers, 30 judge calls per comparison. Judge: Hermione (Gemini 2.5 Flash). Full outputs: komilion.com/compare-v2.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>benchmarks</category>
      <category>llm</category>
    </item>
    <item>
      <title>We Published That Our Premium Tier Failed on 60% of Tasks. Then We Fixed It.</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 04 Mar 2026 12:31:13 +0000</pubDate>
      <link>https://forem.com/robinbanner/we-published-that-our-premium-tier-failed-on-60-of-tasks-then-we-fixed-it-1a9a</link>
      <guid>https://forem.com/robinbanner/we-published-that-our-premium-tier-failed-on-60-of-tasks-then-we-fixed-it-1a9a</guid>
      <description>&lt;h1&gt;
  
  
  We Published That Our Premium Tier Failed on 60% of Tasks. Then We Fixed It.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;2026-02-26&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When we launched on Product Hunt, our Phase 4 benchmark was live on the site.&lt;/p&gt;

&lt;p&gt;It showed council mode — our multi-model premium tier — timing out on 6 of 10 developer tasks. We didn't hide those numbers. They were in the benchmark table, linked from the maker comment, publicly downloadable as JSON.&lt;/p&gt;

&lt;p&gt;This is the follow-up post. We shipped the fix. Here's what broke, what we changed, and what Phase 5 shows.&lt;/p&gt;




&lt;h2&gt;
  
  
  What council mode is
&lt;/h2&gt;

&lt;p&gt;Council mode runs each request through four specialist models — Code, Research, Creative, and a Captain who synthesizes their outputs — before returning an answer. The verification pass is what makes it more than just asking four models and averaging. The Captain cross-examines the specialists' outputs, catches contradictions, and produces a synthesized response.&lt;/p&gt;

&lt;p&gt;The benchmark hypothesis: four specialists catching each other's errors should outperform a single model, even the best one. Phase 4 was the first time we actually ran it at scale. It told a different story.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Phase 4 showed
&lt;/h2&gt;

&lt;p&gt;Phase 4 (Feb 25, 30 calls, 10 developer tasks): council timed out on 6 of 10 tasks. The benchmark recorded timeouts as failures. The result was a 5-something/10 average that told us nothing about whether the underlying approach worked — just that the implementation had a critical fault.&lt;/p&gt;

&lt;p&gt;We published it anyway. If you're running a transparency-first benchmark, you publish the ugly runs too.&lt;/p&gt;




&lt;h2&gt;
  
  
  The root cause
&lt;/h2&gt;

&lt;p&gt;Each specialist call had a 90-second AbortSignal. Four specialists running sequentially. Worst case: 4 × 90s = 360 seconds of execution time.&lt;/p&gt;

&lt;p&gt;Connection timeout on Vercel: 90 seconds.&lt;/p&gt;

&lt;p&gt;The math was wrong from the start. Under load, every council request that hit a slow specialist exceeded the connection ceiling and died.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;The sequential architecture was correct — that's what drives council quality. Each specialist reads what the previous one said before responding. Running them in parallel would break that.&lt;/p&gt;

&lt;p&gt;What was wrong was the per-specialist timeout with no ceiling on the total pipeline. Sprint 12 added &lt;code&gt;PIPELINE_TOTAL_TIMEOUT_MS&lt;/code&gt; — a hard ceiling on total council execution time — plus a streaming bypass for simple requests (~2.4s). Complex tasks run the full sequential chain within a fixed budget. If a specialist runs long, the Captain synthesizes with whatever's complete. Zero timeouts since the fix shipped.&lt;/p&gt;
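&lt;p&gt;The shape of the fix is a shared deadline instead of independent per-call timeouts. A minimal sketch, assuming a synchronous pipeline and a &lt;code&gt;call&lt;/code&gt; function that honors its budget (all names here are hypothetical, not Komilion's implementation):&lt;/p&gt;

```python
import time

# Hypothetical sketch: one total budget shared by the sequential specialists.
# Each call is capped by whatever remains, so the sum can never exceed it.
PIPELINE_TOTAL_TIMEOUT_S = 80.0  # kept under a 90s connection ceiling

def run_council(specialists, prompt, call):
    deadline = time.monotonic() + PIPELINE_TOTAL_TIMEOUT_S
    outputs = {}
    for name in specialists:
        budget = max(deadline - time.monotonic(), 0.0)
        try:
            # call() is expected to raise TimeoutError when its budget expires.
            outputs[name] = call(name, prompt, outputs, timeout=budget)
        except TimeoutError:
            break  # stop spending budget; synthesize with what completed
    # The Captain synthesizes from whichever specialists finished in time.
    return synthesize(outputs)

def synthesize(outputs):
    done = ", ".join(outputs) or "none"
    return f"captain synthesis over: {done}"
```

With per-call 90s timeouts, four sequential calls could legally run 360s against a 90s connection ceiling; with a shared deadline, the pipeline total is bounded by construction.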




&lt;h2&gt;
  
  
  Phase 5 results
&lt;/h2&gt;

&lt;p&gt;We re-ran the benchmark with council V3 live.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Score (dev tasks)&lt;/th&gt;
&lt;th&gt;Won vs Opus&lt;/th&gt;
&lt;th&gt;vs. Phase 4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Council V3&lt;/td&gt;
&lt;td&gt;8.77/10&lt;/td&gt;
&lt;td&gt;8 of 10&lt;/td&gt;
&lt;td&gt;was timing out 6/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;8.80/10&lt;/td&gt;
&lt;td&gt;8 of 10&lt;/td&gt;
&lt;td&gt;unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6 direct&lt;/td&gt;
&lt;td&gt;8.6/10&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frugal&lt;/td&gt;
&lt;td&gt;8.3/10&lt;/td&gt;
&lt;td&gt;3 of 10&lt;/td&gt;
&lt;td&gt;unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Council wins 8 of 10 developer tasks head-to-head against Opus direct. Avg response time: ~90s. Zero timeouts.&lt;/p&gt;

&lt;p&gt;The council dev average (8.77) now beats Opus (8.6) and effectively ties Balanced (8.80) on developer tasks. The wins are clearest on architecture decisions, complex reasoning, and open-ended design — tasks where specialist cross-examination resolves ambiguity before synthesis. The all-task average (7.27 across 16 tasks including non-dev tasks) shows council is optimized for developer work, not general use. Full outputs are published — read the individual tasks and judge which council wins are meaningful for your use case.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Note: cost column omitted — benchmark doesn't track multi-model cost. See komilion.com/pricing for current premium tier pricing.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Full outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means for the three tiers
&lt;/h2&gt;

&lt;p&gt;Frugal and Balanced haven't changed. Phase 4 confirmed both: Balanced beats Opus on 8 of 10 tasks at half the cost, Frugal at 97% quality for 1.6% of the cost. Those findings stand.&lt;/p&gt;

&lt;p&gt;Premium (&lt;code&gt;neo-mode/premium&lt;/code&gt;) now routes to council V3. If you were on Premium before this post, your next call goes to council automatically.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;komilion.neo.councilTrace&lt;/code&gt; field in the response shows the full specialist breakdown — which model handled each role, what it contributed, how the Captain synthesized.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why we published the failure data
&lt;/h2&gt;

&lt;p&gt;Because you'd find it anyway. Developer benchmarks get read carefully. If we'd published Phase 4 with a footnote like "council results excluded due to technical issues," someone would have asked why.&lt;/p&gt;

&lt;p&gt;Publishing the failure is also how you prove the fix is real. The before and after are both in the data. You can verify it yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;komilion.com&lt;/strong&gt; — Phase 5 benchmark published same day as this post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>architecture</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Council Mode Is Live. Four Specialist Models. One Answer.</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 04 Mar 2026 12:31:12 +0000</pubDate>
      <link>https://forem.com/robinbanner/council-mode-is-live-four-specialist-models-one-answer-2obg</link>
      <guid>https://forem.com/robinbanner/council-mode-is-live-four-specialist-models-one-answer-2obg</guid>
      <description>&lt;h1&gt;
  
  
  Council Mode Is Live. Four Specialist Models. One Answer.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By Hossein Shahrokni&lt;/strong&gt; | &lt;em&gt;2026-03-04&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When Komilion launched on Product Hunt, the premium tier was Opus 4.6 direct.&lt;/p&gt;

&lt;p&gt;That wasn't the plan. The plan was council mode: each request runs through four specialist models — a Code specialist, a Research specialist, a Creative specialist, and a Captain who synthesizes their outputs — before you get an answer.&lt;/p&gt;

&lt;p&gt;The V2 council had a problem. Sequential model calls, no hard ceiling on total execution time. Under load, the whole thing timed out. I wasn't going to launch with known instability, so I bypassed it and shipped Opus direct for Premium instead.&lt;/p&gt;

&lt;p&gt;This is the post that says it's fixed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What council mode actually does
&lt;/h2&gt;

&lt;p&gt;A standard API call goes: you → model → answer.&lt;/p&gt;

&lt;p&gt;Council mode goes: you → Code specialist → Research specialist → Creative specialist → Captain (synthesizes) → answer.&lt;/p&gt;

&lt;p&gt;The Captain doesn't just aggregate responses. It runs a cross-examination pass — each specialist's output gets evaluated against the others before the synthesis. The idea is that errors one model makes, another catches. The verification pass is what makes it more than just "ask four models and average the results."&lt;/p&gt;

&lt;p&gt;V3 adds a complexity gate: simple requests use a streaming bypass at ~2.4s, skipping the specialist pipeline entirely. Only tasks that need multi-specialist cross-examination run the full council chain at ~90s. The classification is automatic.&lt;/p&gt;
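&lt;p&gt;The chain above can be sketched in a few lines. This is a toy shape only — the role names come from this post, everything else is hypothetical:&lt;/p&gt;

```python
# Toy sketch of the sequential council: each specialist sees the prompt plus
# everything said before it, and the Captain synthesizes the full transcript.
ROLES = ["code", "research", "creative"]

def council(prompt, ask):
    transcript = []
    for role in ROLES:
        # Later specialists condition on earlier specialists' notes.
        context = prompt + "\n" + "\n".join(transcript)
        transcript.append(f"{role}: " + ask(role, context))
    # The Captain cross-examines the whole transcript and returns one answer.
    return ask("captain", prompt + "\n" + "\n".join(transcript))
```

Running the specialists in parallel would lose exactly this property: each later specialist conditions on what came before, which is what the sequential design trades latency for.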




&lt;h2&gt;
  
  
  The benchmark (Phase 5, post-fix — live production)
&lt;/h2&gt;

&lt;p&gt;After the fix shipped, we ran Phase 5 immediately against production: 10 developer tasks, Hermione judge (Gemini 2.5 Flash), every response published.&lt;/p&gt;

&lt;p&gt;Council mode scored 8.77/10 on developer tasks vs Opus 4.6 direct at 8.6/10. Won on 8 of 10 developer tasks. Avg response time: ~90s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4 context (pre-fix):&lt;/strong&gt; council timed out on 6 of 10 tasks (60%), scored below threshold. We published that. This is the after.&lt;/p&gt;

&lt;p&gt;Full outputs at komilion.com/compare-v2 — every response, every judge verdict, JSON download.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why it wasn't at launch
&lt;/h2&gt;

&lt;p&gt;V2 had 4 specialist calls in sequence, each with a 90-second AbortSignal. Worst case: 360 seconds. Under real traffic that hit connection timeouts.&lt;/p&gt;

&lt;p&gt;V3 adds &lt;code&gt;PIPELINE_TOTAL_TIMEOUT_MS&lt;/code&gt; — a hard ceiling on total council execution time — and a streaming bypass for simple tasks (~2.4s). Complex tasks run the full sequential chain within a fixed budget. If a specialist runs long, the Captain synthesizes with whatever's complete. Zero timeouts since the fix shipped.&lt;/p&gt;

&lt;p&gt;We only shipped when Bugs confirmed it clean. That's the rule.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it means for the premium tier
&lt;/h2&gt;

&lt;p&gt;Premium (&lt;code&gt;neo-mode/premium&lt;/code&gt;) now routes to council V3, not Opus direct.&lt;/p&gt;

&lt;p&gt;For most developer work, Balanced (Sonnet 4.6, ~$0.08/call) is still the right tier. The benchmark showed Balanced beats Opus on 8 of 10 tasks. Council is for the cases where single-model ceiling matters: architecture decisions, complex multi-step reasoning, tasks where getting it right on the first call is worth more than the cost difference.&lt;/p&gt;

&lt;p&gt;If you were on Premium before today, your next call goes to council automatically. No config change.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design a rate limiting strategy for a multi-tenant API with burst tolerance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# komilion.council in response shows which specialists ran and what each contributed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;komilion.council&lt;/code&gt; field in the response shows the full specialist breakdown — which model handled each role, what it contributed, how the Captain synthesized. Visible by default on premium requests.&lt;/p&gt;

&lt;p&gt;Sign up free at komilion.com — no card required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Council V3 benchmark: Phase 5, 10 developer tasks, 4 tiers. Judge: Hermione (Gemini 2.5 Flash). Full outputs: komilion.com/compare-v2.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>architecture</category>
      <category>api</category>
    </item>
    <item>
      <title>Komilion Balanced Tier Beats Opus 4.6 on 6 of 10 Developer Tasks at Half the Cost</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 28 Feb 2026 08:03:32 +0000</pubDate>
      <link>https://forem.com/robinbanner/komilion-balanced-tier-beats-opus-46-on-6-of-10-developer-tasks-at-half-the-cost-4eip</link>
      <guid>https://forem.com/robinbanner/komilion-balanced-tier-beats-opus-46-on-6-of-10-developer-tasks-at-half-the-cost-4eip</guid>
      <description>&lt;p&gt;The safe default for AI API calls is to route everything to the best model you can afford. I did it for months. Opus for every request -- commit messages, variable lookups, SQL optimization, architecture. All $0.17/call.&lt;/p&gt;

&lt;p&gt;We ran the numbers to see if that assumption holds.&lt;/p&gt;

&lt;p&gt;Ten real developer tasks. Real API calls. Real billing. We sent each task through three setups: Komilion frugal tier, Komilion balanced tier, and Claude Opus 4.6 called directly via the Anthropic API.&lt;/p&gt;

&lt;p&gt;The result: the balanced tier beat Opus on 6 of 10 tasks at half the cost. Frugal delivered 97% of Opus quality at 1.6% of the cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;3 configurations, 10 tasks:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Model selection&lt;/th&gt;
&lt;th&gt;Cost/task&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frugal&lt;/td&gt;
&lt;td&gt;Cheapest capable model, auto-selected&lt;/td&gt;
&lt;td&gt;~$0.003 avg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;Mid-tier, optimized for developer tasks&lt;/td&gt;
&lt;td&gt;~$0.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6 direct&lt;/td&gt;
&lt;td&gt;Claude Opus 4.6 via Anthropic API directly&lt;/td&gt;
&lt;td&gt;~$0.17&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;10 tasks&lt;/strong&gt; from real developer work: code generation, debugging, explanation, SQL optimization, architecture design, commit messages, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Judge:&lt;/strong&gt; Hermione -- a Gemini 2.5 Flash LLM judge scoring each response head-to-head. Three runs per comparison to reduce variance. Scores are head-to-head relative ratings, not absolute quality measures.&lt;/p&gt;
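
&lt;p&gt;The aggregation step is plain arithmetic. A minimal sketch of how per-run judge scores turn into an average and a head-to-head winner -- the function name and the sample run-level numbers are illustrative, not Hermione's actual code:&lt;/p&gt;

```python
from statistics import mean

def summarize_judging(runs_a, runs_b):
    """Average the judge runs for each side and report a head-to-head winner.

    runs_a / runs_b: per-run scores (0-10) for the same task from two tiers.
    Returns (avg_a, avg_b, winner) where winner is 'a', 'b', or 'tie'.
    """
    avg_a, avg_b = round(mean(runs_a), 2), round(mean(runs_b), 2)
    if avg_a > avg_b:
        winner = "a"
    elif avg_b > avg_a:
        winner = "b"
    else:
        winner = "tie"
    return avg_a, avg_b, winner

# Task-9-style comparison: balanced vs Opus (illustrative run-level scores)
print(summarize_judging([8.67, 8.67, 8.67], [8.33, 7.67, 8.0]))
```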




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Avg Score&lt;/th&gt;
&lt;th&gt;Beat Opus&lt;/th&gt;
&lt;th&gt;Cost/task&lt;/th&gt;
&lt;th&gt;vs Opus cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;8.7/10&lt;/td&gt;
&lt;td&gt;6 of 10&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;53% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frugal&lt;/td&gt;
&lt;td&gt;8.3/10&lt;/td&gt;
&lt;td&gt;3 of 10&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;98% cheaper (56x lower)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.6 direct&lt;/td&gt;
&lt;td&gt;8.6/10&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;td&gt;$0.17&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
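
&lt;p&gt;The "vs Opus cost" column is simple arithmetic on the per-task costs above; a quick sanity check (the helper name is illustrative):&lt;/p&gt;

```python
def vs_baseline(cost, baseline):
    """Return (percent cheaper, cost multiple) relative to a baseline cost."""
    pct_cheaper = round((1 - cost / baseline) * 100)
    multiple = round(baseline / cost, 1)
    return pct_cheaper, multiple

opus = 0.17
print(vs_baseline(0.08, opus))   # balanced tier vs Opus direct
print(vs_baseline(0.003, opus))  # frugal tier vs Opus direct
```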




&lt;h2&gt;
  
  
  Finding 1: Balanced beats Opus on most developer tasks
&lt;/h2&gt;

&lt;p&gt;This was the finding I did not expect.&lt;/p&gt;

&lt;p&gt;Balanced averaged 8.7/10. It beat Opus on 6 of 10 tasks. At $0.08/task vs $0.17 for Opus, that is better quality at half the cost.&lt;/p&gt;

&lt;p&gt;For well-defined developer tasks -- write this function, debug this code, optimize this query -- the balanced tier routes to Sonnet-class models highly tuned for exactly this type of work. The judge consistently scored them at or above Opus on tasks with clear success criteria.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A specific example:&lt;/strong&gt; Task 9. Frugal scored 8.67, balanced scored 8.67. Opus scored 8.33 and 7.67 across judge runs. A task requiring real technical depth -- and both cheaper tiers outscored the frontier model. This result appeared repeatedly across the 10-task run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Opus still wins:&lt;/strong&gt; Task 10 tells the other story. Opus scored 9.0. Balanced scored 8.0. Frugal scored 7.0. For complex, open-ended problems where output breadth and multi-step reasoning visibly matter, Opus produces noticeably more thorough results. The judge valued that. It is a real gap -- on a narrower set of tasks than most developers assume.&lt;/p&gt;

&lt;p&gt;The tasks where Opus won cluster around a recognizable pattern: SQL optimization, unit test generation, REST API design. Tasks where the output has architectural depth, must satisfy multiple simultaneous constraints, or requires anticipating edge cases across a broad surface. On those, the frontier model earns its price tag. On the other 6 of 10 tasks, the balanced tier matched or outperformed it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finding 2: Frugal delivers 97% of Opus quality at 1.6% of the cost
&lt;/h2&gt;

&lt;p&gt;Frugal averaged 8.3/10. It won 3 of 10 head-to-heads.&lt;/p&gt;

&lt;p&gt;At $0.003/task vs $0.17 for Opus, frugal delivers 97% of Opus quality at 56x lower cost. The tasks frugal handles best make up the majority of most developers' API traffic: commit messages, short explanations, summarization, quick lookups, simple code generation.&lt;/p&gt;

&lt;p&gt;The tasks where frugal struggles -- complex open-ended problems -- are real. For those, route to balanced or accept the Opus cost selectively.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest conclusion
&lt;/h2&gt;

&lt;p&gt;Balanced is the better default for most developer workloads.&lt;/p&gt;

&lt;p&gt;8.7/10 avg, 6 of 10 wins against Opus, $0.08/task. If you are routing everything to Opus, you are paying $0.17/call for results the balanced tier matches or beats on 60% of tasks.&lt;/p&gt;

&lt;p&gt;Frugal is the cost optimizer for simple-task volume. 97% of Opus quality. 1.6% of the cost.&lt;/p&gt;

&lt;p&gt;And on a specific subset of complex open-ended tasks, Opus still wins. That is not a bug -- it is the whole argument for intelligent routing. Know your task distribution. Route accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to try it
&lt;/h2&gt;

&lt;p&gt;OpenAI SDK compatible. One line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# free at komilion.com, no card
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Balanced -- recommended default based on this benchmark
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brainModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_extra&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;komilion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Cline, Cursor, Roo Code, Continue, and any other OpenAI-compatible client.&lt;/p&gt;




&lt;p&gt;All 30 responses from this benchmark run are published unedited at &lt;a href="https://www.komilion.com/compare-v2" rel="noopener noreferrer"&gt;komilion.com/compare-v2&lt;/a&gt; -- every response, every judge verdict, JSON download available.&lt;/p&gt;

&lt;p&gt;Komilion is live on Product Hunt today: &lt;a href="https://www.producthunt.com/posts/komilion" rel="noopener noreferrer"&gt;https://www.producthunt.com/posts/komilion&lt;/a&gt; -- if this was useful, an upvote takes 30 seconds.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Data from real API calls, February 2026. Phase 4 run: 10 tasks x 3 configurations = 30 calls. Judge: Hermione (Gemini 2.5 Flash), 3 runs per comparison.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>benchmarks</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Continue.dev + Claude Max Ban: Fix in 60 Seconds</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Wed, 25 Feb 2026 18:10:26 +0000</pubDate>
      <link>https://forem.com/robinbanner/continuedev-claude-max-ban-fix-in-60-seconds-1fpo</link>
      <guid>https://forem.com/robinbanner/continuedev-claude-max-ban-fix-in-60-seconds-1fpo</guid>
      <description>&lt;p&gt;Continue.dev's Claude integration stopped working for Claude Max subscription users in January 2026. This is the fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Continue.dev is and why it broke
&lt;/h2&gt;

&lt;p&gt;Continue.dev is an open-source AI coding assistant for VS Code and JetBrains. Unlike Cursor or Cline, Continue is free and fully configurable — you bring your own models.&lt;/p&gt;

&lt;p&gt;If you configured Continue to use Claude through your Claude Max subscription credentials, that path is now blocked. Anthropic's January 2026 enforcement restricted automated tool access through consumer subscription OAuth tokens.&lt;/p&gt;

&lt;p&gt;Continue.dev is actually one of the easiest tools to fix, because it was designed from the start to work with any provider via config.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fix: update your config.json
&lt;/h2&gt;

&lt;p&gt;Continue.dev stores its configuration in &lt;code&gt;~/.continue/config.json&lt;/code&gt;. You're changing one section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (broken):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk-ant-..."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (working — Option A, direct Anthropic API):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude Opus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-anthropic-api-key"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your Anthropic API key at &lt;a href="https://console.anthropic.com" rel="noopener noreferrer"&gt;console.anthropic.com&lt;/a&gt;. Pay per token.&lt;/p&gt;




&lt;h2&gt;
  
  
  Option B: Smart routing (cheaper for mixed workloads)
&lt;/h2&gt;

&lt;p&gt;Continue.dev supports any OpenAI-compatible provider. This routes each request to the cheapest capable model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Komilion Balanced"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neo-mode/balanced"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.komilion.com/api/v1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Komilion Premium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neo-mode/premium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.komilion.com/api/v1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With two models configured, you can switch between them in the Continue sidebar. Use Balanced for most tasks, Premium for architecture or complex debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  Continue.dev-specific settings worth knowing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tab autocomplete&lt;/strong&gt; has its own model config in Continue — separate from the chat model. If you use tab completions, update that too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Komilion Frugal"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neo-mode/frugal"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.komilion.com/api/v1"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tab autocomplete fires constantly as you type. &lt;code&gt;neo-mode/frugal&lt;/code&gt; (~$0.006/call) keeps those completions cheap while saving the better models for explicit chat requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context providers&lt;/strong&gt; (the &lt;code&gt;@&lt;/code&gt; commands — &lt;code&gt;@file&lt;/code&gt;, &lt;code&gt;@codebase&lt;/code&gt;, etc.) use the main chat model. These are fine with Balanced or Premium.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost difference for Continue.dev users
&lt;/h2&gt;

&lt;p&gt;Continue users tend to run high call volumes — continuous tab completions plus explicit chat requests.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;th&gt;Direct Opus&lt;/th&gt;
&lt;th&gt;Smart routing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tab completions (300/day)&lt;/td&gt;
&lt;td&gt;~$165/day&lt;/td&gt;
&lt;td&gt;~$1.80/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat requests (50/day)&lt;/td&gt;
&lt;td&gt;~$27.50/day&lt;/td&gt;
&lt;td&gt;~$5.00/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$192/day&lt;/td&gt;
&lt;td&gt;~$6.80/day&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tab completion difference is particularly dramatic — those short, fast calls are exactly what cheap models handle best.&lt;/p&gt;
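
&lt;p&gt;The table's totals fall out of the per-call prices. A quick reproduction -- the assumed per-call costs ($0.55 Opus-class, $0.006 frugal completions, $0.10 routed chat) are the averages behind the table, not quoted rates:&lt;/p&gt;

```python
# Assumed per-call costs behind the table above
OPUS = 0.55         # Opus-class, any request
FRUGAL = 0.006      # tab completions under smart routing
ROUTED_CHAT = 0.10  # chat requests under smart routing

def daily(completions=300, chats=50):
    """Daily cost for (direct Opus, smart routing) at the given volumes."""
    direct = completions * OPUS + chats * OPUS
    routed = completions * FRUGAL + chats * ROUTED_CHAT
    return round(direct, 2), round(routed, 2)

print(daily())  # (192.5, 6.8)
```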




&lt;h2&gt;
  
  
  Verify it's working
&lt;/h2&gt;

&lt;p&gt;After updating &lt;code&gt;config.json&lt;/code&gt;, reload the VS Code window (&lt;code&gt;Cmd+Shift+P&lt;/code&gt; -&amp;gt; "Developer: Reload Window").&lt;/p&gt;

&lt;p&gt;Test in the Continue sidebar with a simple question. You should see a response. If you are using Komilion, the actual model used appears in the API response headers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting a Komilion API key
&lt;/h2&gt;

&lt;p&gt;$5 free credits, no card: &lt;a href="https://www.komilion.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=continuedev-fix-feb26" rel="noopener noreferrer"&gt;komilion.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your key will be in the dashboard immediately after email verification. It starts with &lt;code&gt;ck_&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Full migration guide covering all tools: &lt;a href="https://www.komilion.com/cline?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=continuedev-fix-feb26" rel="noopener noreferrer"&gt;komilion.com/cline&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vscode</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cline Keeps Routing to the Wrong (Expensive) Model — Here's What's Happening</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Tue, 24 Feb 2026 14:01:31 +0000</pubDate>
      <link>https://forem.com/robinbanner/cline-keeps-routing-to-the-wrong-expensive-model-heres-whats-happening-341g</link>
      <guid>https://forem.com/robinbanner/cline-keeps-routing-to-the-wrong-expensive-model-heres-whats-happening-341g</guid>
      <description>&lt;p&gt;If you use Cline with a non-Anthropic provider and notice it ignoring your model selection — you're not imagining it.&lt;/p&gt;

&lt;p&gt;There's a known issue (currently open in the Cline repo) where the CLI path doesn't fully hydrate &lt;code&gt;modelInfo&lt;/code&gt; for third-party providers, causing it to fall back to &lt;code&gt;anthropic/claude-3-7-sonnet-latest&lt;/code&gt; regardless of what you configured. The UI shows your selected model. The API calls use something else.&lt;/p&gt;

&lt;p&gt;This matters because at $0.55/call for Opus-class models, a misconfigured router silently burns money on tasks that could cost $0.006.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;Cline's model handler has two paths: VS Code extension (full model info hydration) and CLI (partial). For providers like Requesty, the CLI path sets &lt;code&gt;modelId&lt;/code&gt; but skips fetching &lt;code&gt;modelInfo&lt;/code&gt;. The handler then falls back to its default rather than trusting the explicit &lt;code&gt;modelId&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It's a known architectural inconsistency — the fix requires the CLI to either fetch model info or relax the requirement that both &lt;code&gt;modelId&lt;/code&gt; and &lt;code&gt;modelInfo&lt;/code&gt; must be present before respecting a custom model selection.&lt;/p&gt;
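
&lt;p&gt;A simplified sketch of the failure mode -- this is not Cline's actual code, and the names are illustrative; it just shows why an explicit &lt;code&gt;modelId&lt;/code&gt; gets ignored when &lt;code&gt;modelInfo&lt;/code&gt; was never fetched:&lt;/p&gt;

```python
DEFAULT_MODEL = "anthropic/claude-3-7-sonnet-latest"

def resolve_model(model_id, model_info):
    """Mimic the reported behavior: the custom selection is only honored
    when BOTH modelId and modelInfo are present."""
    if model_id and model_info:
        return model_id
    return DEFAULT_MODEL  # silent fallback, invisible in the UI

# VS Code extension path: both fields hydrated -> selection respected
print(resolve_model("my-provider/cheap-model", {"contextWindow": 128000}))
# CLI path for third-party providers: modelInfo never fetched -> fallback
print(resolve_model("my-provider/cheap-model", None))
```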

&lt;h2&gt;
  
  
  The Actual Cost Problem
&lt;/h2&gt;

&lt;p&gt;The deeper issue isn't the bug — it's the architecture that makes it possible. Most coding tools let you pick one model and route everything to it. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What does this variable do?" → Opus 4.6 → $0.55&lt;/li&gt;
&lt;li&gt;"Write a commit message" → Opus 4.6 → $0.55&lt;/li&gt;
&lt;li&gt;"Summarise this function" → Opus 4.6 → $0.55&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 60-70% of typical coding tasks, you're paying Opus prices for work that Gemini Flash handles at the same quality level for $0.006.&lt;/p&gt;

&lt;p&gt;I tracked my own usage for a month. 64% of my API calls were tasks where the cheapest capable model scored within 5% of Opus on output quality. The remaining 36% genuinely benefited from a better model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Smart Routing Looks Like
&lt;/h2&gt;

&lt;p&gt;The fix for the Cline CLI bug will let you use your configured provider correctly. But that still leaves the underlying problem: you're choosing one model for everything.&lt;/p&gt;

&lt;p&gt;The pattern that actually works: classify each request by task complexity, route simple tasks to cheap fast models, reserve expensive models for the work that actually needs them.&lt;/p&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Commit messages, variable lookups, short explanations → &lt;strong&gt;$0.006/call&lt;/strong&gt; (Gemini Flash class)&lt;/li&gt;
&lt;li&gt;Code generation, debugging, refactoring → &lt;strong&gt;$0.08-0.10/call&lt;/strong&gt; (Gemini Pro class)&lt;/li&gt;
&lt;li&gt;Architecture decisions, complex multi-file reasoning → &lt;strong&gt;$0.55/call&lt;/strong&gt; (Opus class)&lt;/li&gt;
&lt;/ul&gt;
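
&lt;p&gt;A toy version of that classification step. A sketch under obvious assumptions -- a real router uses a classifier model, not keyword matching, and the tier names, prices, and hint lists here are illustrative:&lt;/p&gt;

```python
TIERS = {
    "cheap":   {"cost": 0.006, "hints": ("commit message", "what does", "summarize")},
    "mid":     {"cost": 0.09,  "hints": ("write a function", "debug", "refactor", "optimize")},
    "premium": {"cost": 0.55,  "hints": ("architecture", "design a system", "multi-file")},
}

def route(prompt: str) -> str:
    """Pick a tier by keyword, checking the most demanding hints first."""
    p = prompt.lower()
    for tier in ("premium", "mid", "cheap"):
        if any(h in p for h in TIERS[tier]["hints"]):
            return tier
    return "mid"  # unknown complexity: default to the middle tier

print(route("Write a commit message for this diff"))            # cheap
print(route("Debug this stack trace"))                          # mid
print(route("Propose an architecture for the billing service")) # premium
```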

&lt;h2&gt;
  
  
  Setting This Up in Cline Today
&lt;/h2&gt;

&lt;p&gt;If you want smart routing to work in Cline without waiting for the model-selection bug to be fixed — or if you want automatic task-based routing rather than manually managing model configs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set Cline's API provider to &lt;strong&gt;"OpenAI Compatible"&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Set API endpoint to &lt;code&gt;https://www.komilion.com/api/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use one of these as your model:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;neo-mode/frugal&lt;/code&gt; — auto-routes simple tasks to cheapest capable model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;neo-mode/balanced&lt;/code&gt; — good for most coding work&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;neo-mode/premium&lt;/code&gt; — council mode for architecture decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This bypasses the Requesty-specific &lt;code&gt;modelInfo&lt;/code&gt; issue entirely because it uses standard OpenAI model IDs, and it adds automatic task routing on top.&lt;/p&gt;

&lt;p&gt;Every response includes which model was actually used and the exact cost in the &lt;code&gt;komilion&lt;/code&gt; field of the response body — so you can verify the routing is working.&lt;/p&gt;
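
&lt;p&gt;A minimal check of that field, assuming a response body shaped like the one described here -- &lt;code&gt;brainModel&lt;/code&gt; and &lt;code&gt;cost&lt;/code&gt; are the fields this post and the benchmark code reference; the sample payload is illustrative:&lt;/p&gt;

```python
def routing_info(response_body: dict) -> tuple:
    """Pull the actually-used model and exact cost out of the komilion field."""
    k = response_body.get("komilion", {})
    return k.get("brainModel"), k.get("cost")

# Sample response shape, based on the fields mentioned above
sample = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "komilion": {"brainModel": "gemini-2.5-flash", "cost": 0.006},
}
print(routing_info(sample))  # ('gemini-2.5-flash', 0.006)
```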

&lt;h2&gt;
  
  
  The Benchmark
&lt;/h2&gt;

&lt;p&gt;I ran 40 API calls across 10 real developer tasks and published every response unedited: &lt;a href="https://komilion.com/compare-v2" rel="noopener noreferrer"&gt;komilion.com/compare-v2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The finding that surprised me most: for summarisation, code explanation, and simple Q&amp;amp;A, the quality gap between Frugal and Opus 4.6 collapses to near zero. The tasks where Opus genuinely wins are architecture planning, complex multi-file reasoning, and ambiguous requirements — which is exactly when Premium kicks in automatically.&lt;/p&gt;




&lt;p&gt;The Cline CLI bug will get fixed. The one-model-for-everything habit is harder to fix without a router.&lt;/p&gt;

&lt;p&gt;$5 free trial at &lt;a href="https://komilion.com/auth/signup" rel="noopener noreferrer"&gt;komilion.com/auth/signup&lt;/a&gt; — no card required. Change one URL, see which model handled each request and what it cost.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cline</category>
      <category>apidevelopment</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>How to Run an AI Benchmark That Doesn't Lie to You</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 21 Feb 2026 17:01:40 +0000</pubDate>
      <link>https://forem.com/robinbanner/how-to-run-an-ai-benchmark-that-doesnt-lie-to-you-5fhe</link>
      <guid>https://forem.com/robinbanner/how-to-run-an-ai-benchmark-that-doesnt-lie-to-you-5fhe</guid>
      <description>&lt;p&gt;We're about to publish a comparison page that benchmarks 4 AI setups against 10 real developer tasks. Before we do, here's every design decision we made to make sure the results can't be gamed — including by us.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with most AI benchmarks
&lt;/h2&gt;

&lt;p&gt;Most "AI comparison" content has at least one of these problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cherry-picked prompts&lt;/strong&gt; — tasks chosen because one model happens to shine on them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proprietary scoring&lt;/strong&gt; — a company scoring its own outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No raw outputs&lt;/strong&gt; — you see scores but not what the models actually said&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic data&lt;/strong&gt; — results that change over time, making past claims unverifiable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrong comparison baseline&lt;/strong&gt; — comparing a fine-tuned model against a base model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our compare page will have all of these problems if we're not careful. Here's what we're doing about each.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 1: 10 tasks, chosen before we ran anything
&lt;/h2&gt;

&lt;p&gt;The 10 tasks were finalized before a single API call was made:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python function with unit tests&lt;/li&gt;
&lt;li&gt;Debug a real bug (provided)&lt;/li&gt;
&lt;li&gt;Explain async/await in JavaScript&lt;/li&gt;
&lt;li&gt;Write unit tests for a given function&lt;/li&gt;
&lt;li&gt;Refactor a function for readability&lt;/li&gt;
&lt;li&gt;Summarize a 500-word document&lt;/li&gt;
&lt;li&gt;Write a git commit message for a real diff&lt;/li&gt;
&lt;li&gt;Optimize a slow SQL query&lt;/li&gt;
&lt;li&gt;Architecture recommendation for a real problem&lt;/li&gt;
&lt;li&gt;Design a REST API for given requirements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We didn't run the tasks, look at the results, and then swap in friendlier prompts. The prompts are locked. If the output for task 6 is embarrassing for one tier, we show it anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; The temptation to "just swap one prompt that didn't work well" is how benchmarks quietly become marketing. We locked the prompts first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 2: Static human scoring, not AI judging AI
&lt;/h2&gt;

&lt;p&gt;Each output is scored on 2-3 dimensions by us, once, and locked in with a date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We considered dynamic scoring&lt;/strong&gt; — running a separate model (like Gemini Pro) on each page load to score outputs. It's technically impressive. We didn't do it because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI scoring AI is circular.&lt;/strong&gt; The model doing the scoring has its own biases. A Gemini-scored benchmark will favor Gemini. A Claude-scored benchmark favors Claude.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It hides the scoring logic.&lt;/strong&gt; If a model scores itself 4.8/5 and we don't show the scoring prompt, you can't verify it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It adds noise.&lt;/strong&gt; Scores change between page loads. A snapshot benchmark should be a snapshot.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Static human scoring means you can disagree with us. The score is ours, dated, signed. If you think we scored Task 3 wrong, tell us.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 3: Every full output is visible
&lt;/h2&gt;

&lt;p&gt;Most comparison pages show a summary table or a curated excerpt. We're showing every full response, unedited, with a copy button and a JSON download.&lt;/p&gt;

&lt;p&gt;This is the only way a benchmark is honest. If Premium's architecture recommendation is 18,000 words of genuinely useful content, show that. If Frugal's commit message is "Add feature" with no context, show that too.&lt;/p&gt;

&lt;p&gt;The response that looks bad is as important as the one that looks good.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decision 4: The competitor column is a direct API call, not ours
&lt;/h2&gt;

&lt;p&gt;Our "Competitor: Opus direct" column calls Claude Opus 4.6 directly via the Anthropic SDK — not through our own endpoint.&lt;/p&gt;

&lt;p&gt;This matters because if we route the competitor column through Komilion, any routing overhead, prompt modification, or API quirk affects the competitor result. The baseline needs to be genuinely independent to be meaningful.&lt;/p&gt;

&lt;p&gt;Practically: Niobe runs a separate script for this column with no Komilion code in the path.&lt;/p&gt;
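&lt;p&gt;The baseline script needs only a few lines. This is a hedged sketch, not the production script: the &lt;code&gt;opus_direct&lt;/code&gt; helper and the model id are illustrative, and the SDK usage follows Anthropic's Python client:&lt;/p&gt;

```python
def opus_direct(prompt: str, client=None) -> str:
    """Call Claude Opus directly via the Anthropic SDK, no Komilion
    code in the path. Model id here is an assumption for Opus 4.6."""
    if client is None:
        import anthropic  # official SDK; reads ANTHROPIC_API_KEY from env
        client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-opus-4-6",  # assumed id, check Anthropic's model list
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```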




&lt;h2&gt;
  
  
  Design decision 5: Benchmarks are a snapshot, not a permanent claim
&lt;/h2&gt;

&lt;p&gt;The outputs are dated. They'll get stale as models improve. We'll re-run and update — but we won't quietly update old results. Old results stay visible with their dates.&lt;/p&gt;

&lt;p&gt;This is the "no retroactive edits" principle. A benchmark that silently improves over time is marketing. A benchmark that ages visibly is honest.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we're actually testing
&lt;/h2&gt;

&lt;p&gt;We're running 4 setups against the same 10 prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frugal tier&lt;/strong&gt; — cheapest capable model for each task (~$0.006/call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balanced tier&lt;/strong&gt; — recommended tier, balance of cost and quality (~$0.10/call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium tier (council mode)&lt;/strong&gt; — multi-model orchestration, which we claim beats single-model Opus on complex tasks (~$0.55+/call)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.6 direct&lt;/strong&gt; — the gold standard comparison, called via Anthropic's API with no routing layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result we're most curious about ourselves: does council mode actually beat direct Opus on the architecture and API design tasks? We don't know yet. The benchmark will tell us, and we'll publish whatever it says.&lt;/p&gt;
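&lt;p&gt;Mechanically, the whole run reduces to a nested loop: 4 setups against 10 locked prompts, 40 calls total, every full output kept. A minimal sketch, where &lt;code&gt;call(setup, prompt)&lt;/code&gt; is an assumed stand-in for whichever client each setup uses:&lt;/p&gt;

```python
def run_benchmark(setups, prompts, call):
    """Run every locked prompt against every setup; keep full outputs.

    `call(setup, prompt)` is an assumed stand-in for the per-setup
    client (Komilion tiers or the direct Anthropic script).
    """
    results = []
    for setup in setups:
        for task_id, prompt in enumerate(prompts, start=1):
            results.append({
                "setup": setup,
                "task": task_id,
                "prompt": prompt,
                "output": call(setup, prompt),  # stored unedited
            })
    return results

# e.g. json.dump(run_benchmark(setups, prompts, call),
#                open("benchmark.json", "w"))  -> the downloadable JSON
```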




&lt;h2&gt;
  
  
  When it ships
&lt;/h2&gt;

&lt;p&gt;Compare page v2 goes live once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The benchmark data is in (40 API calls, a few hours of compute)&lt;/li&gt;
&lt;li&gt;Scores are written and reviewed&lt;/li&gt;
&lt;li&gt;The page passes QA&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We're targeting this week.&lt;/p&gt;

&lt;p&gt;If you want to see the outputs the moment it's live: &lt;a href="https://www.komilion.com/compare?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=benchmark-methodology-feb26" rel="noopener noreferrer"&gt;komilion.com/compare&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>benchmarks</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Your AI Agent Is Probably Costing 10x More Than It Should</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 21 Feb 2026 15:24:19 +0000</pubDate>
      <link>https://forem.com/robinbanner/your-ai-agent-is-probably-costing-10x-more-than-it-should-2423</link>
      <guid>https://forem.com/robinbanner/your-ai-agent-is-probably-costing-10x-more-than-it-should-2423</guid>
      <description>&lt;p&gt;AI agents make a lot of API calls. Most of them are cheap tasks disguised as expensive ones.&lt;/p&gt;

&lt;p&gt;Here's the breakdown and the fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an agent session actually costs
&lt;/h2&gt;

&lt;p&gt;A typical agent loop for "add error handling to this function":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read system prompt + context → $0.04&lt;/li&gt;
&lt;li&gt;Parse the task → $0.02&lt;/li&gt;
&lt;li&gt;Read the target file → $0.01&lt;/li&gt;
&lt;li&gt;Plan the changes → $0.04&lt;/li&gt;
&lt;li&gt;Write the edit → $0.08&lt;/li&gt;
&lt;li&gt;Verify the output → $0.04&lt;/li&gt;
&lt;li&gt;Report back → $0.01&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Total: ~$0.24 at Opus pricing for one small task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do this 20 times in a day: $4.80. Scale to a team of 5: $24/day, $720/month, just for one agent on small tasks.&lt;/p&gt;

&lt;p&gt;The math gets worse for agents with tool use, multi-step reasoning, and retrieval loops. GPT-5.2 or Opus on every step of a 10-step agent workflow = $2-5 per workflow execution.&lt;/p&gt;
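&lt;p&gt;The arithmetic above is easy to sanity-check in a few lines (the per-step prices are the estimates from the list, not measured values):&lt;/p&gt;

```python
# Per-step cost estimates from the list above (Opus pricing, estimates)
steps = {
    "read_context": 0.04,
    "parse_task":   0.02,
    "read_file":    0.01,
    "plan":         0.04,
    "write_edit":   0.08,
    "verify":       0.04,
    "report":       0.01,
}

per_task = sum(steps.values())        # ~$0.24 for one small task
per_day = per_task * 20               # 20 tasks/day -> ~$4.80
per_month_team = per_day * 5 * 30     # 5 devs, 30 days -> ~$720
print(per_task, per_day, per_month_team)
```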




&lt;h2&gt;
  
  
  The problem: most agent calls aren't complex
&lt;/h2&gt;

&lt;p&gt;Look at what an agent actually does step-by-step:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What it requires&lt;/th&gt;
&lt;th&gt;Cheapest capable model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parse user intent&lt;/td&gt;
&lt;td&gt;Basic NLP&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read a file&lt;/td&gt;
&lt;td&gt;No reasoning needed&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check if task is done&lt;/td&gt;
&lt;td&gt;Simple comparison&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write a function&lt;/td&gt;
&lt;td&gt;Code generation&lt;/td&gt;
&lt;td&gt;Sonnet-class ($0.01/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debug complex logic&lt;/td&gt;
&lt;td&gt;Deep reasoning&lt;/td&gt;
&lt;td&gt;Opus-class ($0.08/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan a multi-step refactor&lt;/td&gt;
&lt;td&gt;Architecture thinking&lt;/td&gt;
&lt;td&gt;Opus-class ($0.08/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confirm with user&lt;/td&gt;
&lt;td&gt;Conversation&lt;/td&gt;
&lt;td&gt;Gemini Flash ($0.0001/call)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The expensive model (Opus) is right for 2-3 steps. The cheap model handles 4-5 steps fine. But agents typically use one model for everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fix: route by step type
&lt;/h2&gt;

&lt;p&gt;Two approaches depending on how your agent is built.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Manual routing in your agent code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Route agent steps to appropriate models.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Simple steps: fast, cheap
&lt;/span&gt;    &lt;span class="n"&gt;simple_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check_done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Complex steps: need capability
&lt;/span&gt;    &lt;span class="n"&gt;complex_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;simple_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;step_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;complex_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemini-3-pro-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# default
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost reduction:&lt;/strong&gt; ~60-70% depending on your step distribution.&lt;br&gt;
&lt;strong&gt;Downside:&lt;/strong&gt; You maintain the routing logic. Every new step type needs a decision.&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 2: Let the routing layer decide
&lt;/h3&gt;

&lt;p&gt;Point your agent at an auto-routing endpoint. The router classifies each call and picks the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Every step uses the same model string
# The router reads the prompt and picks the right model
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# classifier picks frugal/balanced/premium
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# See what was actually used:
# response["komilion"]["neo"]["brainModel"]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost reduction:&lt;/strong&gt; Similar to manual routing (60-80%), but automatic.&lt;br&gt;
&lt;strong&gt;Downside:&lt;/strong&gt; You don't control which exact model runs. You see it in the response metadata but can't predict it in advance.&lt;/p&gt;


&lt;h2&gt;
  
  
  Override for critical steps
&lt;/h2&gt;

&lt;p&gt;For steps where quality absolutely matters — final output, user-facing decisions — override to premium:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Most steps: auto-route
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Critical final output: pin to Opus
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo-mode/premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# always Opus 4.6
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This hybrid approach uses cheap models for scaffolding and reserves Opus for output that users actually see.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real numbers for a 10-step agent
&lt;/h2&gt;

&lt;p&gt;Agent workflow: parse → plan → read files (×3) → implement (×2) → test → review → output&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost/run&lt;/th&gt;
&lt;th&gt;100 runs/month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Opus everything&lt;/td&gt;
&lt;td&gt;$2.40&lt;/td&gt;
&lt;td&gt;$240&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual routing&lt;/td&gt;
&lt;td&gt;$0.72&lt;/td&gt;
&lt;td&gt;$72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-routing (balanced)&lt;/td&gt;
&lt;td&gt;$0.58&lt;/td&gt;
&lt;td&gt;$58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flash everything&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;$3 (quality degrades)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 80x difference between Opus-everything and Flash-everything is real — but quality degrades on Flash for the hard steps. The sweet spot is routing: $58-72/month vs $240, with Opus still handling the complex steps.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to watch out for
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context window bleed.&lt;/strong&gt; Agents often append conversation history to every call. A 10-step agent where each step adds 1K tokens to the context pays for 55K cumulative input tokens over the run (1K + 2K + ... + 10K), with 10K on the final step alone. Your routing decision about step complexity matters less than your context management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool call overhead.&lt;/strong&gt; Every tool call is a round-trip API call. An agent that calls 5 tools per step at Opus pricing = 5× the cost per step. Use cheap models for tool parsing, expensive models for reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retry loops.&lt;/strong&gt; If an agent retries a failed step 3 times, you've paid 3× for one step. Add exponential backoff AND downgrade the model on retry (if it failed once, trying a different model is more useful than the same expensive model again).&lt;/p&gt;
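&lt;p&gt;A retry helper that combines both ideas might look like this sketch. The &lt;code&gt;make_call&lt;/code&gt; function is an assumed stand-in for your chat-completion call, and the model list ordering (expensive first, cheaper on retry) is the policy, not an API feature:&lt;/p&gt;

```python
import time

def call_with_fallback(make_call, models, base_delay=1.0):
    """Retry a failed step with exponential backoff, switching to the
    next model in the list on each retry instead of re-paying for the
    same expensive model. `make_call(model)` is an assumed stand-in
    for your chat-completion call."""
    last_error = None
    for attempt, model in enumerate(models):
        try:
            return make_call(model)
        except Exception as err:  # in practice, catch your client's error types
            last_error = err
            if attempt != len(models) - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise last_error

# e.g. call_with_fallback(step, ["anthropic/claude-opus-4-6",
#                                "google/gemini-3-pro-preview"])
```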




&lt;h2&gt;
  
  
  The two-line change
&lt;/h2&gt;

&lt;p&gt;If you're using any OpenAI-compatible agent framework (LangChain, AutoGen, CrewAI, custom), the change is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before:
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After:
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.komilion.com/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ck_your_komilion_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same code. Different cost profile. The routing happens transparently.&lt;/p&gt;

&lt;p&gt;$5 free at &lt;a href="https://www.komilion.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=agent-costs-feb26" rel="noopener noreferrer"&gt;komilion.com&lt;/a&gt; — no card, start testing immediately.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>devtools</category>
    </item>
    <item>
      <title>How to Run Cline for $10/Month (Instead of $60+)</title>
      <dc:creator>Robin</dc:creator>
      <pubDate>Sat, 21 Feb 2026 15:23:33 +0000</pubDate>
      <link>https://forem.com/robinbanner/how-to-run-cline-for-10month-instead-of-60-1n2k</link>
      <guid>https://forem.com/robinbanner/how-to-run-cline-for-10month-instead-of-60-1n2k</guid>
      <description>&lt;p&gt;Cline is one of the best AI coding assistants available. It's also easy to accidentally spend $60-200/month on it if you're not paying attention.&lt;/p&gt;

&lt;p&gt;Here's how to get your Cline bill under $10/month without gutting the quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Cline gets expensive
&lt;/h2&gt;

&lt;p&gt;Cline is an agentic tool. For every task you give it, it makes multiple API calls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial understanding (reading files, asking clarifying questions)&lt;/li&gt;
&lt;li&gt;Planning the approach&lt;/li&gt;
&lt;li&gt;Making edits (one API call per file, sometimes more)&lt;/li&gt;
&lt;li&gt;Verifying the changes&lt;/li&gt;
&lt;li&gt;Handling errors and retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A "write me a function" request might trigger 8-15 API calls. At Opus pricing, that's $4-8 for one task.&lt;/p&gt;

&lt;p&gt;Do 20 coding tasks a day? That's $80-160/day at full Opus. Obviously nobody runs it that hard, but even 5-6 complex tasks/day adds up fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  The model math
&lt;/h2&gt;

&lt;p&gt;The default for heavy Cline users is often Opus or Sonnet. Here's what each actually costs per Cline session:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical session (30 API calls, ~3K tokens each = 90K tokens, roughly half input, half output):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input price&lt;/th&gt;
&lt;th&gt;Output price&lt;/th&gt;
&lt;th&gt;Session cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;$15/M&lt;/td&gt;
&lt;td&gt;$75/M&lt;/td&gt;
&lt;td&gt;~$4.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$3/M&lt;/td&gt;
&lt;td&gt;$15/M&lt;/td&gt;
&lt;td&gt;~$0.81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Pro&lt;/td&gt;
&lt;td&gt;$3.5/M&lt;/td&gt;
&lt;td&gt;$10.5/M&lt;/td&gt;
&lt;td&gt;~$0.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;$0.075/M&lt;/td&gt;
&lt;td&gt;$0.30/M&lt;/td&gt;
&lt;td&gt;~$0.014&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The problem: most of those 30 calls don't need Opus. File reads, task confirmations, simple completions — these work on cheap models. Only complex reasoning and hard edits actually need the top model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strategy 1: Manual model switching
&lt;/h2&gt;

&lt;p&gt;Cheapest with no external tool. In Cline settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default to Sonnet 4.6 (&lt;code&gt;claude-sonnet-4-6&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Switch to Opus manually for hard sessions&lt;/li&gt;
&lt;li&gt;Use Gemini Flash for quick Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works if you're disciplined. Most people aren't — they set one model and forget it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realistic monthly cost:&lt;/strong&gt; $15-25 if you actually switch. $50-80 if you forget.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strategy 2: Automatic routing (set-and-forget)
&lt;/h2&gt;

&lt;p&gt;Point Cline at a routing layer that automatically picks the right model per call. You never change your config.&lt;/p&gt;

&lt;p&gt;In Cline's settings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;API Provider → OpenAI Compatible&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Set Base URL to &lt;code&gt;https://www.komilion.com/api/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set Model to &lt;code&gt;neo-mode/balanced&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Get a key at komilion.com ($5 free to start)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What happens next: Cline sends each API call to the router. Simple file reads, quick questions, and confirmations route to Gemini Flash-class models (~$0.006/call). Complex edits and hard problems route to Sonnet or Opus class (~$0.10-0.55/call). You see the actual model used in &lt;code&gt;data["komilion"]["neo"]["brainModel"]&lt;/code&gt; in each response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realistic monthly cost:&lt;/strong&gt; $6-18 depending on how many complex tasks you run.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real numbers from a mixed session
&lt;/h2&gt;

&lt;p&gt;Say you run 100 Cline API calls on a typical day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~60 are simple (file reads, task confirmations, short questions)&lt;/li&gt;
&lt;li&gt;~30 are moderate (standard code edits, debugging)&lt;/li&gt;
&lt;li&gt;~10 are complex (architecture decisions, hard bugs, multi-file refactors)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Simple (60)&lt;/th&gt;
&lt;th&gt;Moderate (30)&lt;/th&gt;
&lt;th&gt;Complex (10)&lt;/th&gt;
&lt;th&gt;Day total&lt;/th&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Opus everywhere&lt;/td&gt;
&lt;td&gt;60 × $0.55 = $33&lt;/td&gt;
&lt;td&gt;30 × $0.55 = $16.50&lt;/td&gt;
&lt;td&gt;10 × $0.55 = $5.50&lt;/td&gt;
&lt;td&gt;$55&lt;/td&gt;
&lt;td&gt;$1,650&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smart routing&lt;/td&gt;
&lt;td&gt;60 × $0.006 = $0.36&lt;/td&gt;
&lt;td&gt;30 × $0.10 = $3.00&lt;/td&gt;
&lt;td&gt;10 × $0.55 = $5.50&lt;/td&gt;
&lt;td&gt;$8.86&lt;/td&gt;
&lt;td&gt;$265&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "Opus everywhere" row is what most people actually run — one model set, never changed. Smart routing gets you the same Opus quality on complex calls, cheap models everywhere else.&lt;/p&gt;

&lt;p&gt;"I'll manually switch" is the classic plan that lasts about two days before you forget to switch back.&lt;br&gt;
&lt;em&gt;(Padme: replaced scratchpad math with clean two-row table aligned with Article 10 numbers. Per-call avg for Opus = $0.55 at typical Cline context depth. Smart routing row confirmed correct: $8.86/day matches Article 10 exactly.)&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Getting the $10/month target
&lt;/h2&gt;

&lt;p&gt;$10/month = ~$0.33/day.&lt;/p&gt;

&lt;p&gt;At the routing rates above, that's roughly 3 moderate calls per day (~$0.10 each), or about 55 frugal calls per day (file reads, quick questions, at ~$0.006 each). Real usage is a mix.&lt;/p&gt;

&lt;p&gt;Realistic: a light Cline user running 2-3 small tasks/day with smart routing hits $5-15/month. A heavy user doing 10+ complex tasks daily will be higher.&lt;/p&gt;

&lt;p&gt;The $10 target is achievable if your Cline usage is mostly moderate-complexity work, with frugal routing absorbing the auxiliary calls.&lt;/p&gt;


&lt;h2&gt;
  
  
  One Cline-specific setting worth knowing
&lt;/h2&gt;

&lt;p&gt;Cline has a "system prompt" override. If you add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keep responses concise. Confirm before making large changes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...you reduce the average token count per response, which directly reduces cost. Verbose AI responses = more output tokens = higher bill.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick setup
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Get a key at &lt;a href="https://www.komilion.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=cline-budget-feb26" rel="noopener noreferrer"&gt;komilion.com&lt;/a&gt; — $5 free, no card&lt;/li&gt;
&lt;li&gt;Cline settings → API Provider → OpenAI Compatible&lt;/li&gt;
&lt;li&gt;Base URL: &lt;code&gt;https://www.komilion.com/api/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Model: &lt;code&gt;neo-mode/balanced&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;neo-mode/premium&lt;/code&gt; in a second profile for when you specifically need Opus&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your bill changes immediately.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cline</category>
      <category>vscode</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
