<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: BernardinoGM</title>
    <description>The latest articles on Forem by BernardinoGM (@bernardinogm).</description>
    <link>https://forem.com/bernardinogm</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843852%2F4a7906ee-5da1-4435-84ad-4467b54b6e98.png</url>
      <title>Forem: BernardinoGM</title>
      <link>https://forem.com/bernardinogm</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bernardinogm"/>
    <language>en</language>
    <item>
      <title>How I Cut AI API Costs by 45x With a Simple Routing Proxy</title>
      <dc:creator>BernardinoGM</dc:creator>
      <pubDate>Wed, 25 Mar 2026 23:43:50 +0000</pubDate>
      <link>https://forem.com/bernardinogm/how-i-cut-ai-api-costs-by-45x-with-a-simple-routing-proxy-6co</link>
      <guid>https://forem.com/bernardinogm/how-i-cut-ai-api-costs-by-45x-with-a-simple-routing-proxy-6co</guid>
      <description>&lt;p&gt;Last week I ran a simple test. I sent "Say hi" to two different AI APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude/OpenAI cost:&lt;/strong&gt; $0.000179&lt;br&gt;
&lt;strong&gt;Through my routing proxy:&lt;/strong&gt; $0.000004&lt;/p&gt;

&lt;p&gt;Same response quality. 45x price difference.&lt;/p&gt;
&lt;h2&gt;The insight&lt;/h2&gt;

&lt;p&gt;I analyzed a month of API traffic from a production app. Here's what I found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;72% of requests&lt;/strong&gt; were simple tasks: classification, extraction, summarization, translation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18% were medium:&lt;/strong&gt; multi-step analysis, moderate code generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10% were genuinely hard:&lt;/strong&gt; complex reasoning, system design, novel code&lt;/li&gt;
&lt;/ul&gt;

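&lt;p&gt;The simple tasks produced equivalent results on DeepSeek V4 at $0.14/M input tokens. We were paying Claude up to $15/M (its output rate) for the same work. Even compared like for like, that's roughly a 21x markup on input ($3.00 vs. $0.14) and 54x on output ($15.00 vs. $0.28) for commodity tasks.&lt;/p&gt;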
&lt;h2&gt;The solution&lt;/h2&gt;

&lt;p&gt;I built an OpenAI-compatible proxy that classifies each request and routes to the cheapest capable model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost/M tokens (input / output)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple (~70%)&lt;/td&gt;
&lt;td&gt;DeepSeek V4&lt;/td&gt;
&lt;td&gt;$0.14 / $0.28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium (~20%)&lt;/td&gt;
&lt;td&gt;DeepSeek R1&lt;/td&gt;
&lt;td&gt;$0.55 / $2.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard (~10%)&lt;/td&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;$3.00 / $15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The classifier itself runs on DeepSeek (cost: ~$0.001 per classification). If a cheap model returns an error, the proxy automatically falls back to the next tier.&lt;/p&gt;
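&lt;p&gt;A minimal sketch of that escalation logic, not the router's actual code. The tier order mirrors the table above; &lt;code&gt;route_with_fallback&lt;/code&gt;, &lt;code&gt;flaky_backend&lt;/code&gt; and the simulated failure are hypothetical names made up for illustration:&lt;/p&gt;

```python
# Hypothetical tier order, cheapest first; the names mirror the table above.
TIERS = ["DeepSeek V4", "DeepSeek R1", "Claude Sonnet"]
START_TIER = {"simple": 0, "medium": 1, "hard": 2}


def route_with_fallback(complexity, prompt, call_model):
    """Try the cheapest capable tier first and escalate when a call raises."""
    last_error = None
    for model in TIERS[START_TIER[complexity]:]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real proxy would catch narrower errors
            last_error = exc
    raise RuntimeError("all tiers failed") from last_error


def flaky_backend(model, prompt):
    """Stand-in for a provider call: the cheapest model always errors here."""
    if model == "DeepSeek V4":
        raise TimeoutError("simulated upstream failure")
    return f"{model} answered: {prompt}"


used_model, answer = route_with_fallback("simple", "Say hi", flaky_backend)
print(used_model)
```

&lt;p&gt;Starting the loop at the tier chosen by the classifier means a request never gets routed to a cheaper model than its complexity calls for; it only moves up.&lt;/p&gt;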
&lt;h2&gt;How to use it&lt;/h2&gt;

&lt;p&gt;Point the client at the proxy by swapping the API key and base URL. Everything else stays the same:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Before
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-your-openai-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openai.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-bridge-your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://ai-bridge-router-et30.onrender.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This code doesn't change at all
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify this as positive or negative: I love this product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response includes routing metadata:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"_bridge"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"complexity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"simple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek V4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cost_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.000004&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"benchmark_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.000179&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
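&lt;p&gt;As a rough illustration of how that metadata could be used, the sketch below parses the sample values shown above and works out the savings ratio. The helper name &lt;code&gt;savings_ratio&lt;/code&gt; is hypothetical, and the fragment is wrapped in braces here only so it parses as standalone JSON:&lt;/p&gt;

```python
import json

# Sample payload copied from the metadata fragment above, wrapped in braces
# so that it parses as a standalone JSON object.
raw = """
{
  "_bridge": {
    "complexity": "simple",
    "model": "DeepSeek V4",
    "cost_usd": 0.000004,
    "benchmark_usd": 0.000179
  }
}
"""


def savings_ratio(payload):
    """Return how many times cheaper the routed call was than the benchmark."""
    bridge = payload["_bridge"]
    return bridge["benchmark_usd"] / bridge["cost_usd"]


ratio = savings_ratio(json.loads(raw))
print(f"about {ratio:.0f}x cheaper")
```

&lt;p&gt;With these numbers the ratio comes out at roughly 45, which is where the headline figure comes from.&lt;/p&gt;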



&lt;h2&gt;The math at scale&lt;/h2&gt;

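&lt;p&gt;For a team spending $3,000/month on AI APIs, scale each tier's share of spend by its average rate (the mean of its input and output price) relative to a $5.00/M baseline:&lt;/p&gt;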

&lt;ul&gt;
&lt;li&gt;70% simple tasks: $3K × 0.7 × ($0.21/$5.00) = &lt;strong&gt;$88&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;20% medium: $3K × 0.2 × ($1.37/$5.00) = &lt;strong&gt;$164&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;10% hard: $3K × 0.1 × ($9.00/$5.00) = &lt;strong&gt;$540&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total: ~$792/month&lt;/strong&gt; instead of $3,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a 74% reduction, or &lt;strong&gt;$26,496 saved per year.&lt;/strong&gt;&lt;/p&gt;
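&lt;p&gt;The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same calculation: the shares, rates and the $5.00 baseline are taken from the list above, each tier is rounded to whole dollars as in the post, and the names &lt;code&gt;routed_cost&lt;/code&gt; and &lt;code&gt;TIERS&lt;/code&gt; are made up for illustration:&lt;/p&gt;

```python
MONTHLY_SPEND = 3000.0  # USD per month, from the example above
BASELINE_RATE = 5.00    # USD per 1M tokens, the denominator used above

# (share of spend, routed rate in USD per 1M tokens) for each tier
TIERS = {
    "simple": (0.7, 0.21),
    "medium": (0.2, 1.37),
    "hard": (0.1, 9.00),
}


def routed_cost(spend, share, rate, baseline=BASELINE_RATE):
    """Scale a slice of current spend by the routed-to-baseline rate ratio."""
    return round(spend * share * rate / baseline)  # whole dollars, as in the post


costs = {
    name: routed_cost(MONTHLY_SPEND, share, rate)
    for name, (share, rate) in TIERS.items()
}
total = sum(costs.values())
reduction = 1 - total / MONTHLY_SPEND
annual_savings = (MONTHLY_SPEND - total) * 12

print(f"total per month: ${total:,.0f}")
print(f"reduction: {reduction:.0%}")
print(f"saved per year: ${annual_savings:,.0f}")
```

&lt;p&gt;Swapping in your own monthly spend and traffic mix gives a quick estimate, with the caveat that the result is only as good as the assumed baseline rate.&lt;/p&gt;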

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;Free tier available (50K tokens/day, no credit card):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ai-bridge-router-et30.onrender.com/signup" rel="noopener noreferrer"&gt;Get a free API key →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Or self-host: &lt;strong&gt;&lt;a href="https://github.com/BernardinoGM/ai-bridge-router" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Technical details&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stack:&lt;/strong&gt; Python, FastAPI, httpx async, SQLite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming:&lt;/strong&gt; Full SSE support; converts Anthropic's streaming format to OpenAI's on the fly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification:&lt;/strong&gt; LLM-based (DeepSeek classifies requests) with keyword fallback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback:&lt;/strong&gt; Auto-escalates to the next tier if the cheaper model returns an error&lt;/li&gt;
&lt;/ul&gt;
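&lt;p&gt;To make the classification item concrete, here is one way an LLM classifier with a keyword fallback might be wired together. It is a sketch under assumptions, not the router's actual rules: the keyword lists, &lt;code&gt;classify_by_keywords&lt;/code&gt;, &lt;code&gt;classify&lt;/code&gt; and the simulated outage are all hypothetical:&lt;/p&gt;

```python
# Hypothetical keyword lists; the real router's rules are not shown in this post.
HARD_HINTS = ("architecture", "system design", "prove", "refactor")
MEDIUM_HINTS = ("analyze", "step by step", "write a function", "debug")
VALID_LABELS = {"simple", "medium", "hard"}


def classify_by_keywords(prompt):
    """Cheap fallback used when the LLM classifier is unavailable."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return "hard"
    if any(hint in text for hint in MEDIUM_HINTS):
        return "medium"
    return "simple"


def classify(prompt, llm_classifier):
    """Prefer the LLM label; use keywords on an error or an unknown label."""
    try:
        label = llm_classifier(prompt)
    except Exception:
        return classify_by_keywords(prompt)
    return label if label in VALID_LABELS else classify_by_keywords(prompt)


def broken_llm(prompt):
    """Stand-in for the LLM classifier during an outage."""
    raise ConnectionError("simulated classifier outage")


label = classify(
    "Classify this as positive or negative: I love this product", broken_llm
)
print(label)
```

&lt;p&gt;Validating the LLM's label before trusting it keeps a malformed classifier reply from sending a request to a tier that does not exist.&lt;/p&gt;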

&lt;p&gt;The entire router is a single Python file. MIT licensed.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're spending $500+/month on AI APIs, the routing proxy probably pays for itself in the first week. &lt;a href="https://ai-bridge-router-et30.onrender.com/signup" rel="noopener noreferrer"&gt;Try it free&lt;/a&gt; or &lt;a href="https://github.com/BernardinoGM/ai-bridge-router" rel="noopener noreferrer"&gt;self-host from GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>openai</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
