<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Styx 7</title>
    <description>The latest articles on Forem by Styx 7 (@timmx7).</description>
    <link>https://forem.com/timmx7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3816864%2Ff7e45055-6cfa-43fb-9685-01344e16b3e5.png</url>
      <title>Forem: Styx 7</title>
      <link>https://forem.com/timmx7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/timmx7"/>
    <language>en</language>
    <item>
      <title>I built an AI gateway that picks the right model for every request</title>
      <dc:creator>Styx 7</dc:creator>
      <pubDate>Tue, 10 Mar 2026 12:36:55 +0000</pubDate>
      <link>https://forem.com/timmx7/i-built-an-ai-gateway-that-picks-the-right-model-for-every-request-36h</link>
      <guid>https://forem.com/timmx7/i-built-an-ai-gateway-that-picks-the-right-model-for-every-request-36h</guid>
      <description>&lt;p&gt;Every AI app has the same problem: you hardcode &lt;code&gt;model: "gpt-4o"&lt;/code&gt; and pay frontier prices for "what's the weather?" questions.&lt;br&gt;
I built Styx to fix this. It's an open-source AI gateway: you send &lt;code&gt;"model": "styx:auto"&lt;/code&gt; and it picks the right model automatically.&lt;/p&gt;
&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;When your app sends a request to Styx with &lt;code&gt;"model": "styx:auto"&lt;/code&gt;, a classifier scores the prompt in real time across nine signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token count&lt;/strong&gt; — Short vs long prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code presence&lt;/strong&gt; — Code blocks, function/class/def keywords&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning patterns&lt;/strong&gt; — "step by step", "analyze", "compare"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Math markers&lt;/strong&gt; — "prove", "equation", "calculate"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical depth&lt;/strong&gt; — "refactor", "architecture", "optimize"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative scope&lt;/strong&gt; — "write a story", "design a system"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation depth&lt;/strong&gt; — Multi-turn conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal hints&lt;/strong&gt; — References to images, documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language detection&lt;/strong&gt; — Non-English content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Score 0-29 → cheap model (gpt-4o-mini, $0.15/1M)&lt;br&gt;
Score 30-59 → balanced model (gpt-4o, $2.50/1M)&lt;br&gt;
Score 60+ → frontier model (gpt-5.4, $2.50/1M)&lt;/p&gt;
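
&lt;p&gt;The real classifier lives in the Go gateway; as a rough sketch of the score-then-tier shape, here's a toy version using just two of the nine signals (the keywords and weights are illustrative, not Styx's actual values):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// scorePrompt is a toy two-signal stand-in for the nine-signal
// classifier: it only looks at prompt length and reasoning keywords.
func scorePrompt(prompt string) int {
	score := 0
	lower := strings.ToLower(prompt)
	if len(strings.Fields(lower)) > 100 { // token count (word-count proxy)
		score += 20
	}
	for _, kw := range []string{"step by step", "analyze", "compare"} {
		if strings.Contains(lower, kw) { // reasoning patterns
			score += 15
		}
	}
	return score
}

// tier maps a score onto the routing tiers from the table above.
func tier(score int) string {
	switch {
	case score >= 60:
		return "frontier"
	case score >= 30:
		return "balanced"
	default:
		return "light"
	}
}

func main() {
	fmt.Println(tier(scorePrompt("what's the weather?"))) // light
}
```

&lt;p&gt;The thresholds (0–29 / 30–59 / 60+) match the table above; everything else here is simplified.&lt;/p&gt;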

&lt;p&gt;The whole thing runs in Go, adds &amp;lt;1ms latency, and the response includes headers telling you exactly what happened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Styx&lt;/span&gt;-&lt;span class="n"&gt;Auto&lt;/span&gt;-&lt;span class="n"&gt;Tier&lt;/span&gt;: &lt;span class="n"&gt;light&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Styx&lt;/span&gt;-&lt;span class="n"&gt;Auto&lt;/span&gt;-&lt;span class="n"&gt;Score&lt;/span&gt;: &lt;span class="m"&gt;8&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Styx&lt;/span&gt;-&lt;span class="n"&gt;Auto&lt;/span&gt;-&lt;span class="n"&gt;Model&lt;/span&gt;: &lt;span class="n"&gt;gpt&lt;/span&gt;-&lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;-&lt;span class="n"&gt;mini&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Quick start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/timmx7/styx &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;styx
./setup.sh          &lt;span class="c"&gt;# interactive wizard, no .env editing&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"styx:auto","messages":[{"role":"user","content":"Hello"}]}'&lt;/span&gt;
&lt;span class="c"&gt;# → Routes to gpt-4o-mini (cheap, fast)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"styx:auto","messages":[{"role":"user","content":"Refactor this codebase to use async/await and add comprehensive error handling step by step"}]}'&lt;/span&gt;
&lt;span class="c"&gt;# → Routes to gpt-5.4 (frontier)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
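
&lt;p&gt;Since the gateway speaks the OpenAI chat-completions protocol, any HTTP client works. From Go, the same request as the curl examples might look like this (the &lt;code&gt;newStyxRequest&lt;/code&gt; helper is mine, not part of Styx):&lt;/p&gt;

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newStyxRequest builds a chat-completion request against a Styx
// gateway at base (e.g. "http://localhost:8080"), with the same
// styx:auto payload as the curl examples above.
func newStyxRequest(base, prompt string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"model": "styx:auto",
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", base+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newStyxRequest("http://localhost:8080", "Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /v1/chat/completions
}
```

&lt;p&gt;Send it with &lt;code&gt;http.DefaultClient.Do(req)&lt;/code&gt; and read the &lt;code&gt;X-Styx-Auto-*&lt;/code&gt; headers off the response to see which tier you got.&lt;/p&gt;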



&lt;h2&gt;What else it does&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;65+ models from OpenAI, Anthropic, Google, Mistral&lt;/li&gt;
&lt;li&gt;Auto-failover: OpenAI down? Routes to Anthropic automatically&lt;/li&gt;
&lt;li&gt;Dashboard: track every request, cost, latency&lt;/li&gt;
&lt;li&gt;BYOK: your keys, your data, self-hosted&lt;/li&gt;
&lt;li&gt;MCP-native: connect Claude Code or Cursor in one command&lt;/li&gt;
&lt;li&gt;Prices auto-refresh daily from OpenRouter's public API&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The real savings&lt;/h2&gt;

&lt;p&gt;If 80% of your requests are simple (and they usually are), routing them to cheap models cuts their cost by 90%+; only the complex 20% go to a frontier model. For a SaaS doing 100k requests/month, that can add up to thousands of dollars saved, depending on token volume.&lt;/p&gt;
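
&lt;p&gt;Back-of-the-envelope, using the per-token prices from the tier table (the 80/20 split is the assumption above; plug in your own token volume):&lt;/p&gt;

```go
package main

import "fmt"

// blendedCost returns the average $/1M-token price when cheapShare of
// traffic hits the cheap tier and the rest hits the frontier tier.
func blendedCost(cheapShare, cheap, frontier float64) float64 {
	return cheapShare*cheap + (1-cheapShare)*frontier
}

func main() {
	// $0.15/1M cheap vs $2.50/1M frontier, 80% of traffic routed cheap.
	blended := blendedCost(0.8, 0.15, 2.50)
	fmt.Printf("$%.2f/1M blended vs $2.50/1M all-frontier (%.0f%% cheaper)\n",
		blended, 100*(1-blended/2.50))
}
```

&lt;p&gt;A ~75% blended discount is where the dollar savings come from once monthly token volume gets large.&lt;/p&gt;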

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/timmx7/styx" rel="noopener noreferrer"&gt;GitHub: github.com/timmx7/styx&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Would love feedback on the classifier design — especially edge cases you'd want handled differently.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
