<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex Mercer</title>
    <description>The latest articles on Forem by Alex Mercer (@alexmercerdev).</description>
    <link>https://forem.com/alexmercerdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3852189%2Feb6a19f8-4722-4df6-ae25-8b9197a5e1c2.png</url>
      <title>Forem: Alex Mercer</title>
      <link>https://forem.com/alexmercerdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alexmercerdev"/>
    <language>en</language>
    <item>
      <title>AI API cost math: 5 numbers to check before choosing a model</title>
      <dc:creator>Alex Mercer</dc:creator>
      <pubDate>Wed, 13 May 2026 21:09:33 +0000</pubDate>
      <link>https://forem.com/alexmercerdev/ai-api-cost-math-5-numbers-to-check-before-choosing-a-model-4i3j</link>
      <guid>https://forem.com/alexmercerdev/ai-api-cost-math-5-numbers-to-check-before-choosing-a-model-4i3j</guid>
      <description>&lt;p&gt;Most teams compare AI APIs by model quality first and price second.&lt;/p&gt;

&lt;p&gt;That is backwards once you have real usage.&lt;/p&gt;

&lt;p&gt;The line item that matters is usually not "price per token" by itself. It is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;monthly cost = requests
  × (avg input tokens × input price per token
     + avg output tokens × output price per token)
  + retry cost
  - cache savings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
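
&lt;p&gt;As a rough sketch, the formula above can be written as a small function, reading it as requests times the per-request input and output cost. The function name, the parameters, and the workload numbers are hypothetical. It assumes each retry costs as much as a normal request and that cached input tokens get a flat discount.&lt;/p&gt;

```python
def monthly_cost(requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m,
                 retry_rate=0.0, cache_hit_rate=0.0, cached_discount=0.0):
    """Estimate monthly spend in dollars. Prices are dollars per 1M tokens."""
    # Cached input tokens are billed at a discount, the rest at full price.
    effective_input_price = input_price_per_m * (1 - cache_hit_rate * cached_discount)
    per_request = (avg_input_tokens * effective_input_price
                   + avg_output_tokens * output_price_per_m) / 1_000_000
    # Assumption: a retry is billed like a full extra request.
    return requests * (1 + retry_rate) * per_request


# Hypothetical workload: 1M requests, 2,000 tokens in, 500 tokens out,
# at illustrative prices of $0.10 input and $0.40 output per 1M tokens.
baseline = monthly_cost(1_000_000, 2_000, 500, 0.10, 0.40)
with_retries = monthly_cost(1_000_000, 2_000, 500, 0.10, 0.40, retry_rate=0.10)
print(round(baseline, 2), round(with_retries, 2))
```

&lt;p&gt;Swap in your own traffic numbers; the structure matters more than the sample values.&lt;/p&gt;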



&lt;p&gt;Here are the five numbers I check before choosing a model.&lt;/p&gt;

&lt;h2&gt;1. Input/output token ratio&lt;/h2&gt;

&lt;p&gt;Input and output are priced differently on most APIs.&lt;/p&gt;

&lt;p&gt;For chatbots, support agents, code review tools, and report generators, output can dominate the bill because the model writes much more than the user sends.&lt;/p&gt;

&lt;p&gt;A cheap-input model can still be expensive if its output price is high and your responses are long.&lt;/p&gt;
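
&lt;p&gt;A quick way to see how much output dominates is to compute its share of per-request spend. This is a minimal sketch; the function name, token counts, and prices are illustrative assumptions, not figures from any provider.&lt;/p&gt;

```python
def output_share(avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m):
    """Fraction of per-request spend that comes from output tokens."""
    input_cost = avg_input_tokens * input_price_per_m
    output_cost = avg_output_tokens * output_price_per_m
    return output_cost / (input_cost + output_cost)


# Hypothetical report generator: 300 tokens in, 1,500 tokens out,
# priced at an illustrative $0.25 input and $2.00 output per 1M tokens.
share = output_share(300, 1_500, 0.25, 2.00)
print(f"{share:.0%} of the bill is output")
```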

&lt;h2&gt;2. Cache hit rate&lt;/h2&gt;

&lt;p&gt;If your app repeatedly sends the same system prompt, tool schema, policies, or long context, cached input pricing can change the economics.&lt;/p&gt;

&lt;p&gt;This matters most for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding assistants&lt;/li&gt;
&lt;li&gt;support bots with large policy context&lt;/li&gt;
&lt;li&gt;RAG apps with repeated instructions&lt;/li&gt;
&lt;li&gt;internal agents with long tool definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ignore caching, you may overestimate the monthly cost of larger-context models.&lt;/p&gt;
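
&lt;p&gt;To put a number on that, you can blend the full and cached input prices by how much of the prompt is actually served from cache. The sketch below assumes a simple model where a fixed fraction of each prompt is cacheable and a fixed fraction of requests hit the cache; the names and sample values are hypothetical.&lt;/p&gt;

```python
def blended_input_price(input_price_per_m, cached_price_per_m,
                        cacheable_fraction, hit_rate):
    """Average dollars per 1M input tokens once cache hits are counted."""
    cached_fraction = cacheable_fraction * hit_rate
    return (cached_fraction * cached_price_per_m
            + (1 - cached_fraction) * input_price_per_m)


# Hypothetical: 80% of each prompt is a stable prefix and 90% of requests
# hit the cache, at illustrative prices of $2.50 full and $0.25 cached.
price = blended_input_price(2.50, 0.25, cacheable_fraction=0.80, hit_rate=0.90)
print(round(price, 2))
```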

&lt;h2&gt;3. Retry rate&lt;/h2&gt;

&lt;p&gt;The cheapest API is not always the cheapest workflow.&lt;/p&gt;

&lt;p&gt;If a low-cost model needs retries, validation cleanup, or a second "fix this JSON" pass, the effective cost goes up fast.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model A: $0.20 per task, 1 pass
model B: $0.08 per task, but 3 passes often needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model B looks cheaper on paper, but three passes cost $0.24 per task, so it loses to Model A's $0.20 in production.&lt;/p&gt;
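
&lt;p&gt;The arithmetic behind that example, as a minimal sketch. It assumes every pass is billed at the same per-task price; the function name is hypothetical.&lt;/p&gt;

```python
def effective_cost_per_task(price_per_pass, expected_passes):
    """Cost of one finished task when each pass is billed separately."""
    return price_per_pass * expected_passes


model_a = effective_cost_per_task(0.20, 1)
model_b = effective_cost_per_task(0.08, 3)
# Pass count at which model B stops being cheaper than model A.
break_even_passes = 0.20 / 0.08
print(round(model_a, 2), round(model_b, 2), round(break_even_passes, 2))
```

&lt;p&gt;With these prices, model B only wins if it averages fewer than 2.5 passes per task.&lt;/p&gt;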

&lt;h2&gt;4. Latency cost&lt;/h2&gt;

&lt;p&gt;Latency has a money cost even if the API invoice does not show it.&lt;/p&gt;

&lt;p&gt;Slow models can reduce conversion, increase queue time, or force you to run more parallel workers.&lt;/p&gt;

&lt;p&gt;For user-facing flows, I usually separate models into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;realtime/chat UX&lt;/li&gt;
&lt;li&gt;background jobs&lt;/li&gt;
&lt;li&gt;batch/offline processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those should not always use the same model.&lt;/p&gt;
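
&lt;p&gt;The point about parallel workers can be estimated with Little's law: average requests in flight equal arrival rate times latency. The sketch below assumes each worker handles one request at a time; the function name and traffic numbers are hypothetical.&lt;/p&gt;

```python
import math


def workers_needed(requests_per_second, avg_latency_seconds):
    """Little's law: requests in flight = arrival rate times time in system."""
    return math.ceil(requests_per_second * avg_latency_seconds)


# Same hypothetical traffic, two different model latencies.
fast = workers_needed(20, 1.5)
slow = workers_needed(20, 6.0)
print(fast, slow)
```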

&lt;h2&gt;5. Monthly volume bands&lt;/h2&gt;

&lt;p&gt;At low volume, a more expensive model might be fine if it saves engineering time.&lt;/p&gt;

&lt;p&gt;At high volume, tiny per-token differences matter.&lt;/p&gt;

&lt;p&gt;A difference of $0.50 per million tokens is about $5 a month at 10M tokens/month, which is irrelevant. At 2B tokens/month it is about $1,000 a month, which is not.&lt;/p&gt;
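
&lt;p&gt;The same comparison as a tiny helper, in case you want to plug in your own volume. The function name is hypothetical.&lt;/p&gt;

```python
def monthly_price_gap(tokens_per_month, price_gap_per_m):
    """Dollars per month that a per-1M-token price difference adds up to."""
    return tokens_per_month / 1_000_000 * price_gap_per_m


for volume in (10_000_000, 2_000_000_000):
    print(volume, monthly_price_gap(volume, 0.50))
```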

&lt;h2&gt;Quick checklist&lt;/h2&gt;

&lt;p&gt;Before switching models, estimate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;requests/month
avg input tokens/request
avg output tokens/request
cacheable input %
retry/failure rate
latency requirement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then compare models by workload, not by headline benchmark score.&lt;/p&gt;

&lt;p&gt;I keep a daily-updated pricing table and calculator here if you want current $/1M token numbers across providers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.aipricing.guru/pricing/" rel="noopener noreferrer"&gt;https://www.aipricing.guru/pricing/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the moment I’m tracking 89 models across 11 providers, with separate input, cached input, and output pricing where available.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cheapest AI APIs in 2026: Every Model Ranked by Cost</title>
      <dc:creator>Alex Mercer</dc:creator>
      <pubDate>Mon, 30 Mar 2026 19:09:54 +0000</pubDate>
      <link>https://forem.com/alexmercerdev/cheapest-ai-apis-in-2026-every-model-ranked-by-cost-5cd1</link>
      <guid>https://forem.com/alexmercerdev/cheapest-ai-apis-in-2026-every-model-ranked-by-cost-5cd1</guid>
      <description>&lt;p&gt;Looking for the cheapest AI API? I got tired of checking 7 different pricing pages every time I needed to pick a model, so I built &lt;a href="https://www.aipricing.guru" rel="noopener noreferrer"&gt;AI Pricing Guru&lt;/a&gt; — a free comparison tool that tracks token costs across all major providers, updated daily.&lt;/p&gt;

&lt;p&gt;Here's the current ranking as of March 2026.&lt;/p&gt;

&lt;h2&gt;Cheapest AI Models: Input Price Ranking&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Input / 1M&lt;/th&gt;
&lt;th&gt;Output / 1M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;GPT-4.1 nano&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Mistral Small&lt;/td&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Llama 4 Scout&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;GPT-4o mini&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Llama 4 Maverick&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;GPT-5.4 nano&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Grok 4.1 Fast&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;GPT-5.4 mini&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash-Lite&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.42&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Best Value by Use Case&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Monthly Cost (10M input + 10M output tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Classification/routing&lt;/td&gt;
&lt;td&gt;GPT-4.1 nano&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chatbots&lt;/td&gt;
&lt;td&gt;Mistral Small&lt;/td&gt;
&lt;td&gt;$4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code generation&lt;/td&gt;
&lt;td&gt;Grok 4.1 Fast&lt;/td&gt;
&lt;td&gt;$7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document analysis&lt;/td&gt;
&lt;td&gt;Llama 4 Scout&lt;/td&gt;
&lt;td&gt;$3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex reasoning&lt;/td&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;$7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;$28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
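
&lt;p&gt;For anyone who wants to check the monthly figures, here is a minimal sketch that recomputes them from the ranking table. It assumes the monthly column means 10M input plus 10M output tokens, which is an assumption that happens to reproduce the listed numbers. Gemini 2.5 Flash is left out because its prices are not in the ranking table.&lt;/p&gt;

```python
# (input, output) dollars per 1M tokens, copied from the ranking table above.
PRICES = {
    "GPT-4.1 nano": (0.10, 0.40),
    "Mistral Small": (0.10, 0.30),
    "Grok 4.1 Fast": (0.20, 0.50),
    "Llama 4 Scout": (0.15, 0.15),
    "DeepSeek V3.2": (0.28, 0.42),
}


def monthly_cost(model, input_tokens_m=10, output_tokens_m=10):
    """Monthly dollars for a model, with token volumes given in millions."""
    input_price, output_price = PRICES[model]
    return input_tokens_m * input_price + output_tokens_m * output_price


for model in PRICES:
    print(f"{model}: ${monthly_cost(model):.2f}")
```

&lt;p&gt;Change the two volume arguments to match your own input/output mix, since a 1:1 split is rarely what real workloads look like.&lt;/p&gt;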

&lt;h2&gt;The Hidden Savings: Cached Input Pricing&lt;/h2&gt;

&lt;p&gt;Most providers offer 80-90% discounts on repeated prompts (system prompts, shared context). If your app reuses the same context, typical cached-input discounts are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt;: 90% off (e.g., $2.50 → $0.25)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt;: 90% off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek&lt;/strong&gt;: 90% off ($0.28 → $0.028)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep your system prompt stable across requests, and at a 90% discount the repeated portion of your input is billed at a tenth of the list price.&lt;/p&gt;

&lt;h2&gt;How to Save Even More&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Batch API&lt;/strong&gt; — OpenAI offers 50% off for async processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Right-size your model&lt;/strong&gt; — don't use GPT-5.4 for tasks GPT-4.1 nano handles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor usage&lt;/strong&gt; — use a &lt;a href="https://www.aipricing.guru/calculator" rel="noopener noreferrer"&gt;token calculator&lt;/a&gt; to estimate before committing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache aggressively&lt;/strong&gt; — same system prompt = cached pricing&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Full Comparison&lt;/h2&gt;

&lt;p&gt;I track 33 models across 7 providers with daily updates. Check the full comparison:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;a href="https://www.aipricing.guru/pricing" rel="noopener noreferrer"&gt;Full pricing table&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🧮 &lt;a href="https://www.aipricing.guru/calculator" rel="noopener noreferrer"&gt;Token cost calculator&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All data is free, with no signup required. I update prices daily by checking each provider's official docs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built this because I was wasting time comparing pricing pages manually. Hope it helps someone else too.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>pricing</category>
    </item>
  </channel>
</rss>
