<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: TokenHub</title>
    <description>The latest articles on Forem by TokenHub (@tokenhub_dev).</description>
    <link>https://forem.com/tokenhub_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898502%2F6f13da76-8606-4490-b57e-067e230f3c22.png</url>
      <title>Forem: TokenHub</title>
      <link>https://forem.com/tokenhub_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tokenhub_dev"/>
    <language>en</language>
    <item>
      <title>Swap OpenAI for DeepSeek without rewriting a single line of code</title>
      <dc:creator>TokenHub</dc:creator>
      <pubDate>Mon, 27 Apr 2026 03:55:51 +0000</pubDate>
      <link>https://forem.com/tokenhub_dev/swap-openai-for-deepseek-without-rewriting-a-single-line-of-code-4lm3</link>
      <guid>https://forem.com/tokenhub_dev/swap-openai-for-deepseek-without-rewriting-a-single-line-of-code-4lm3</guid>
      <description>&lt;p&gt;Last month I added Claude to a project that was already using GPT-4o. Two SDKs, two error formats, two retry strategies. By the time I finished I had wrapped both in my own abstraction — a tiny LLM gateway, badly written, that I now had to maintain.&lt;/p&gt;

&lt;p&gt;Then I noticed something I should have noticed earlier: most of the new providers expose an &lt;strong&gt;OpenAI-compatible&lt;/strong&gt; endpoint. DeepSeek, Mistral, Together, Fireworks — they all speak the same wire format. You don't need a new SDK. You need a new &lt;code&gt;base_url&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This post is the 5-minute version of that realization, with the tradeoffs I learned the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "before" code
&lt;/h2&gt;

&lt;p&gt;Standard OpenAI Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this PR diff...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The "after" code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;th-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://jiatoken.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# gateway
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Same call, different model
&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;# DeepSeek-V3
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this PR diff...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Two lines changed. The rest of your code — streaming handlers, tool calls, retry logic — keeps working because the response shape is identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this works
&lt;/h2&gt;

&lt;p&gt;The OpenAI Python SDK is just a typed HTTP client. It POSTs JSON to &lt;code&gt;{base_url}/chat/completions&lt;/code&gt;. Anything that responds with the same JSON shape is, from the SDK's point of view, OpenAI.&lt;/p&gt;
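
&lt;p&gt;To make that concrete, here is a minimal sketch of the wire format using only the standard library. The helper name is mine, and the payload shows only the required fields:&lt;/p&gt;

```python
import json

def build_chat_request(base_url, api_key, model, messages):
    """Build the raw HTTP pieces the OpenAI SDK sends under the hood.

    A sketch for illustration -- the real SDK adds extra headers and
    retry logic, but the wire format is just this JSON POST.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "https://api.deepseek.com/v1", "sk-demo",
    "deepseek-chat", [{"role": "user", "content": "hi"}],
)
print(url)  # https://api.deepseek.com/v1/chat/completions
```

&lt;p&gt;Any server that accepts that POST and returns the matching JSON shape works with the unmodified SDK.&lt;/p&gt;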

&lt;p&gt;Most gateways take advantage of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek&lt;/strong&gt; ships its own OpenAI-compatible endpoint at &lt;code&gt;api.deepseek.com/v1&lt;/code&gt;. You can point the SDK there directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; does &lt;strong&gt;not&lt;/strong&gt; — Claude has its own message format. You need a translator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt; has both: a native API and a Vertex-side OpenAI shim.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A multi-model gateway (LiteLLM, OpenRouter, TokenHub, your own) collapses these into one endpoint. One key, one base_url, every model behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually save
&lt;/h2&gt;

&lt;p&gt;For the workload I just migrated (~3M input tokens / 1M output per day, mostly summarization):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/1M&lt;/th&gt;
&lt;th&gt;Output $/1M&lt;/th&gt;
&lt;th&gt;Daily cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;2.50&lt;/td&gt;
&lt;td&gt;10.00&lt;/td&gt;
&lt;td&gt;$17.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;15.00&lt;/td&gt;
&lt;td&gt;$24.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-V3&lt;/td&gt;
&lt;td&gt;0.07&lt;/td&gt;
&lt;td&gt;0.28&lt;/td&gt;
&lt;td&gt;$0.49&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
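
&lt;p&gt;The daily figures are straight arithmetic over the per-million prices. A quick sanity check in Python:&lt;/p&gt;

```python
def daily_cost(input_mtok, output_mtok, in_price, out_price):
    # Prices are USD per 1M tokens; volumes are millions of tokens per day.
    return input_mtok * in_price + output_mtok * out_price

# 3M input / 1M output per day, as in the table above
print(round(daily_cost(3, 1, 2.50, 10.00), 2))  # GPT-4o: 17.5
print(round(daily_cost(3, 1, 3.00, 15.00), 2))  # Claude 3.5: 24.0
print(round(daily_cost(3, 1, 0.07, 0.28), 2))   # DeepSeek-V3: 0.49
```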

&lt;p&gt;DeepSeek isn't a drop-in &lt;em&gt;quality&lt;/em&gt; replacement for everything — GPT-4o still wins on instruction following in my evals — but for the 80% of calls that are "summarize this", "extract these fields", "rewrite in tone X", it's fine and ~35× cheaper.&lt;/p&gt;

&lt;h2&gt;
  
  
  The annoying parts
&lt;/h2&gt;

&lt;p&gt;A few things don't carry over cleanly through OpenAI compatibility:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling JSON shape.&lt;/strong&gt; Most providers match it now, but older OSS models return tool calls inside the content string. Always test with your actual prompts before flipping production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision.&lt;/strong&gt; OpenAI uses &lt;code&gt;image_url&lt;/code&gt; parts; some providers want base64. A gateway should normalize this for you — verify before you assume.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming with usage stats.&lt;/strong&gt; OpenAI added &lt;code&gt;stream_options={"include_usage": True}&lt;/code&gt; to get token counts on the final SSE chunk. Not every backend forwards this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits.&lt;/strong&gt; You're now subject to the gateway's RPM, which may be lower than direct provider limits.&lt;/li&gt;
&lt;/ol&gt;
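
&lt;p&gt;For point 3, a defensive pattern is to treat usage as optional rather than assuming the final chunk carries it. A sketch over already-parsed chunk dicts (the shapes mirror OpenAI's streaming deltas; the fake data is illustrative):&lt;/p&gt;

```python
def consume_stream(chunks):
    """Accumulate streamed content and capture usage only if present.

    `chunks` is an iterable of parsed chunk dicts shaped like OpenAI's
    streaming deltas. Usage arrives, if at all, on the final chunk.
    """
    text_parts, usage = [], None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
        # Some backends never forward usage -- leave it as None then.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage

fake = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2}},
]
print(consume_stream(fake))
```

&lt;p&gt;If &lt;code&gt;usage&lt;/code&gt; comes back &lt;code&gt;None&lt;/code&gt;, fall back to counting tokens yourself rather than crashing your billing code.&lt;/p&gt;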

&lt;h2&gt;
  
  
  When NOT to use a gateway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You only ever call one provider. Direct SDK is one less moving part.&lt;/li&gt;
&lt;li&gt;You need provider-specific features (Anthropic's prompt caching, OpenAI's Realtime API, Gemini's long context). Gateways usually lag behind native features by weeks.&lt;/li&gt;
&lt;li&gt;You're in a regulated environment that requires data plane control. Most gateways are SaaS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else — especially side projects and prototypes where the model you "want" changes every two weeks — a gateway pays for itself in saved switching cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;  client = OpenAI(
      api_key="...",
&lt;span class="gi"&gt;+     base_url="https://your-gateway/v1",
&lt;/span&gt;  )
  client.chat.completions.create(
&lt;span class="gd"&gt;-     model="gpt-4o",
&lt;/span&gt;&lt;span class="gi"&gt;+     model="deepseek-chat",
&lt;/span&gt;      ...
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to skip running your own LiteLLM, &lt;a href="https://jiatoken.com" rel="noopener noreferrer"&gt;TokenHub&lt;/a&gt; hosts a pre-configured gateway with 40+ models behind one key. Otherwise, &lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt; self-hosted is the standard answer.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Built an OpenAI-Compatible Gateway to 40+ AI Models (DeepSeek, MiniMax, Claude)</title>
      <dc:creator>TokenHub</dc:creator>
      <pubDate>Sun, 26 Apr 2026 08:31:18 +0000</pubDate>
      <link>https://forem.com/tokenhub_dev/i-built-an-openai-compatible-gateway-to-40-ai-models-deepseek-minimax-claude-2ifk</link>
      <guid>https://forem.com/tokenhub_dev/i-built-an-openai-compatible-gateway-to-40-ai-models-deepseek-minimax-claude-2ifk</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I was paying for 5+ different AI subscriptions: OpenAI, Anthropic, Google, etc. Each with separate API keys, billing dashboards, and SDK quirks.&lt;/p&gt;

&lt;p&gt;When DeepSeek-V3 dropped at ~$0.28 per million output tokens (vs GPT-4o at $10), I wanted to switch — but rewriting SDK integrations across multiple projects was more friction than I could justify.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;TokenHub&lt;/strong&gt; — an OpenAI-compatible gateway that routes to 40+ AI models with a single API key.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;It works as a drop-in replacement endpoint for the OpenAI API. You keep the OpenAI SDK and change only &lt;code&gt;base_url&lt;/code&gt; and &lt;code&gt;api_key&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-tokenhub-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://jiatoken.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Use any of 40+ models — DeepSeek, MiniMax, Claude, GPT, Gemini, Llama, etc.
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-v3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain async/await in Python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The same code works with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gpt-4o&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;claude-sonnet-4-6&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemini-2.5-pro&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deepseek-v3&lt;/code&gt; / &lt;code&gt;deepseek-r1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;minimax-text-01&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llama-3.3-70b&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;...and more&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real Pricing Comparison
&lt;/h2&gt;

&lt;p&gt;Per million tokens (input / output):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o mini&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek-V3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TokenHub&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek-R1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TokenHub&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.55&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MiniMax-Text-01&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TokenHub&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For high-volume workloads (RAG, agents, batch summarization), DeepSeek-V3 is &lt;strong&gt;~35x cheaper&lt;/strong&gt; than GPT-4o for output tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Which Model
&lt;/h2&gt;

&lt;p&gt;A quick mental model from my own usage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cheap &amp;amp; good enough&lt;/strong&gt; → DeepSeek-V3 (most general tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning&lt;/strong&gt; → DeepSeek-R1 (CoT-style tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long context&lt;/strong&gt; → MiniMax-Text-01 (200K+ tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontier capability&lt;/strong&gt; → GPT-4o or Claude (still worth it for hard problems)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt; → Claude Sonnet 4.6 or DeepSeek-V3&lt;/li&gt;
&lt;/ul&gt;
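
&lt;p&gt;Since every model sits behind one endpoint, that mental model can live in a tiny router. A toy sketch, assuming the model names above are what your gateway exposes:&lt;/p&gt;

```python
def pick_model(task, context_tokens=0):
    """Toy task-to-model router mirroring the list above.

    Model names are whatever your gateway exposes; adjust to taste.
    """
    if context_tokens > 150_000:
        return "minimax-text-01"       # long context
    routes = {
        "reasoning": "deepseek-r1",    # CoT-style tasks
        "code": "claude-sonnet-4-6",
        "hard": "gpt-4o",              # frontier capability
    }
    return routes.get(task, "deepseek-v3")  # cheap default

print(pick_model("summarize"))      # deepseek-v3
print(pick_model("reasoning"))      # deepseek-r1
print(pick_model("chat", 400_000))  # minimax-text-01
```

&lt;p&gt;Because only the &lt;code&gt;model&lt;/code&gt; string changes, A/B testing is just swapping the return value.&lt;/p&gt;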

&lt;p&gt;The win is being able to A/B test across models without rewriting code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Open-Sourced the Routing Logic
&lt;/h2&gt;

&lt;p&gt;(Note: TokenHub itself is hosted, but the routing pattern below is simple enough to reimplement yourself.)&lt;/p&gt;

&lt;p&gt;The hardest part wasn't the proxy — it was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Normalizing function-calling formats&lt;/strong&gt; across providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling streaming differences&lt;/strong&gt; (SSE format quirks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token counting&lt;/strong&gt; for accurate pre-request billing estimates&lt;/li&gt;
&lt;/ol&gt;
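
&lt;p&gt;For point 3, if you only need to reject obviously over-budget requests before sending, a crude characters-per-token heuristic can stand in for a real tokenizer. The divisor of 4 and the per-message overhead are rough assumptions, not provider numbers:&lt;/p&gt;

```python
def estimate_tokens(messages, chars_per_token=4):
    """Very rough token estimate for budget checks before sending.

    Accurate billing should use the provider's tokenizer (e.g. tiktoken
    for OpenAI models); this heuristic only catches gross overruns.
    """
    chars = sum(len(m.get("content", "")) for m in messages)
    # Add a small per-message allowance for role/formatting tokens.
    return chars // chars_per_token + 4 * len(messages)

msgs = [{"role": "user", "content": "x" * 400}]
print(estimate_tokens(msgs))  # 104
```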

&lt;p&gt;If you're building something similar, the OpenAI spec is the de facto standard. Most providers either match it or have OpenAI-compatible endpoints already.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you're tired of juggling AI subscriptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;👉 &lt;a href="https://jiatoken.com" rel="noopener noreferrer"&gt;https://jiatoken.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Free credits to start&lt;/li&gt;
&lt;li&gt;Pay-as-you-go, no monthly commitment&lt;/li&gt;
&lt;li&gt;Compatible with OpenAI SDK out of the box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love feedback — especially on which models you'd want added, or pricing pain points.&lt;/p&gt;

&lt;p&gt;What's your current setup? Are you using a single provider or juggling multiple?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>llm</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
