AI API cost math: 5 numbers to check before choosing a model

Alex Mercer — Wed, 13 May 2026 21:09:33 +0000

Most teams compare AI APIs by model quality first and price second.

That is backwards once you have real usage.

The line item that matters is usually not "price per token" by itself. It is:

monthly cost = requests
  × ((avg input tokens × input price per token)
     + (avg output tokens × output price per token))
  + retry cost
  - cache savings
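Here is a minimal sketch of that formula in Python. The function name and parameters are my own, not from any provider SDK. It makes two simplifying assumptions: retries are modeled as a fraction of extra requests, and cache savings as a discount on a share of input tokens.

```python
def monthly_cost(requests, avg_in, avg_out, in_price_per_m, out_price_per_m,
                 retry_rate=0.0, cached_fraction=0.0, cache_discount=0.0):
    """Estimate monthly spend in dollars.

    Prices are in $ per 1M tokens.
    retry_rate: extra calls as a fraction of requests (0.1 = 10% more calls).
    cached_fraction: share of input tokens billed at the cached rate.
    cache_discount: fraction taken off the input price for cached tokens.
    """
    in_price = in_price_per_m / 1_000_000
    out_price = out_price_per_m / 1_000_000
    # Blend full-price and cached input tokens into one effective price.
    effective_in_price = in_price * (1 - cached_fraction * cache_discount)
    per_request = avg_in * effective_in_price + avg_out * out_price
    # Retries are assumed to cost the same as a normal request.
    return requests * (1 + retry_rate) * per_request


# Hypothetical workload: 100k requests, 2,000 input and 500 output tokens each.
print(f"{monthly_cost(100_000, 2000, 500, 0.10, 0.40):.2f}")  # 40.00
```

Passing retry_rate=0.1 to the same call adds 10% to the total. Real retries often resend a longer prompt, so treat this as a lower bound.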

Here are the five numbers I check before choosing a model.

1. Input/output token ratio

Input and output are priced differently on most APIs.

For chatbots, support agents, code review tools, and report generators, output can dominate the bill because the model writes much more than the user sends.

A cheap-input model can still be expensive if its output price is high and your responses are long.
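A quick way to see this is to compute what share of each request's cost comes from output. The sketch below uses a hypothetical helper and illustrative numbers: 500 input tokens, 1,500 output tokens, $0.25 input and $2.00 output per 1M tokens.

```python
def output_share(avg_in, avg_out, in_price_per_m, out_price_per_m):
    """Fraction of per-request cost that comes from output tokens."""
    in_cost = avg_in * in_price_per_m
    out_cost = avg_out * out_price_per_m
    return out_cost / (in_cost + out_cost)


# Illustrative chatbot turn: short question, long answer.
share = output_share(avg_in=500, avg_out=1500,
                     in_price_per_m=0.25, out_price_per_m=2.00)
print(f"{share:.0%}")  # 96%
```

With that shape of traffic, the input price barely moves the bill.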

2. Cache hit rate

If your app repeatedly sends the same system prompt, tool schema, policies, or long context, cached input pricing can change the economics.

This matters most for:

coding assistants
support bots with large policy context
RAG apps with repeated instructions
internal agents with long tool definitions

If you ignore caching, you may overestimate the monthly cost of larger-context models.
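To put a number on it, blend the list price and the cached price by your cache hit rate. This is a simplified sketch with a hypothetical function name. It ignores any cache-write surcharge, minimum prefix length, or cache expiry a provider may apply.

```python
def effective_input_price(list_price_per_m, cached_price_per_m, cache_hit_rate):
    """Blended input price in $ per 1M tokens.

    cache_hit_rate is the share of input tokens billed at the cached rate.
    """
    uncached = (1 - cache_hit_rate) * list_price_per_m
    cached = cache_hit_rate * cached_price_per_m
    return uncached + cached


# Illustrative: $2.50 list price, $0.25 cached price, 80% of input tokens cached.
print(f"{effective_input_price(2.50, 0.25, 0.80):.2f}")  # 0.70
```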

3. Retry rate

The cheapest API is not always the cheapest workflow.

If a low-cost model needs retries, validation cleanup, or a second "fix this JSON" pass, the effective cost goes up fast.

Example:

model A: $0.20 per task, 1 pass
model B: $0.08 per task, but 3 passes often needed

Model B looks cheaper on paper and loses in production: three passes at $0.08 is $0.24 per task, more than model A's $0.20.
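The same comparison as a small sketch, using the numbers from the example. The helper names are hypothetical, and every pass is assumed to cost the same.

```python
def cost_per_task(cost_per_pass, avg_passes):
    """Effective cost of one finished task, assuming each pass costs the same."""
    return cost_per_pass * avg_passes


def break_even_passes(cheap_cost_per_pass, expensive_cost_per_task):
    """Average passes at which the cheap model stops being cheaper."""
    return expensive_cost_per_task / cheap_cost_per_pass


model_a = cost_per_task(0.20, avg_passes=1)
model_b = cost_per_task(0.08, avg_passes=3)
print(f"A: ${model_a:.2f}  B: ${model_b:.2f}")  # A: $0.20  B: $0.24
print(f"{break_even_passes(0.08, 0.20):.1f}")   # 2.5
```

So model B only wins if it averages fewer than 2.5 passes per task.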

4. Latency cost

Latency has a money cost even if the API invoice does not show it.

Slow models can reduce conversion, increase queue time, or force you to run more parallel workers.
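The worker count follows from Little's Law: average requests in flight equals arrival rate times average latency. The sketch below uses a hypothetical helper and assumes each worker handles one request at a time.

```python
import math


def workers_needed(requests_per_sec, avg_latency_sec):
    """Minimum concurrent workers to keep up, by Little's Law (L = rate x latency)."""
    return math.ceil(requests_per_sec * avg_latency_sec)


# Illustrative: the same 20 requests/sec on a fast and a slow model.
print(workers_needed(20, 2.0))  # 40
print(workers_needed(20, 6.0))  # 120
```

This is a steady-state average. Bursty traffic needs headroom on top.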

I usually separate workloads into three tiers by how long a user is waiting:

realtime/chat UX
background jobs
batch/offline processing

Those should not always use the same model.

5. Monthly volume bands

At low volume, a more expensive model might be fine if it saves engineering time.

At high volume, tiny per-token differences matter.

A difference of $0.50 per million tokens is irrelevant at 10M tokens/month (about $5). It is very relevant at 2B tokens/month (about $1,000).
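The arithmetic is one line, shown here as a small sketch with a hypothetical helper name:

```python
def monthly_delta(price_diff_per_m, tokens_per_month):
    """Extra monthly spend in dollars for a given $/1M-token price difference."""
    return price_diff_per_m * tokens_per_month / 1_000_000


print(monthly_delta(0.50, 10_000_000))     # 5.0
print(monthly_delta(0.50, 2_000_000_000))  # 1000.0
```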

Quick checklist

Before switching models, estimate:

requests/month
avg input tokens/request
avg output tokens/request
cacheable input %
retry/failure rate
latency requirement

Then compare models by workload, not by headline benchmark score.
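Here is a rough sketch of what comparing by workload can look like. The Workload class, the rank_models function, and the two workloads are hypothetical, and the prices are illustrative $/1M-token figures to replace with current ones. Retries, caching, and latency are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_month: int
    avg_input_tokens: int
    avg_output_tokens: int


# Illustrative (input, output) prices in $ per 1M tokens. Check current pricing.
PRICES = {
    "GPT-4.1 nano": (0.10, 0.40),
    "Mistral Small": (0.10, 0.30),
    "Llama 4 Scout": (0.15, 0.15),
}


def rank_models(workload, prices):
    """Return (model, monthly cost in $) pairs, cheapest first."""
    costs = {}
    for name, (in_price, out_price) in prices.items():
        per_request = (workload.avg_input_tokens * in_price
                       + workload.avg_output_tokens * out_price) / 1_000_000
        costs[name] = workload.requests_per_month * per_request
    return sorted(costs.items(), key=lambda item: item[1])


input_heavy = Workload(1_000_000, avg_input_tokens=2000, avg_output_tokens=100)
output_heavy = Workload(1_000_000, avg_input_tokens=200, avg_output_tokens=2000)

for label, workload in [("input-heavy", input_heavy), ("output-heavy", output_heavy)]:
    print(label, [(name, round(cost, 2)) for name, cost in rank_models(workload, PRICES)])
```

With these numbers the cheapest model changes between the two workloads, which is the point: the ranking depends on your token mix.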

I keep a daily-updated pricing table and calculator here if you want current $/1M token numbers across providers:

https://www.aipricing.guru/pricing/

At the moment I’m tracking 89 models across 11 providers, with separate input, cached input, and output pricing where available.

Cheapest AI APIs in 2026: Every Model Ranked by Cost

Alex Mercer — Mon, 30 Mar 2026 19:09:54 +0000

Looking for the cheapest AI API? I got tired of checking 7 different pricing pages every time I needed to pick a model, so I built AI Pricing Guru — a free comparison tool that tracks token costs across all major providers, updated daily.

Here's the current ranking as of March 2026.

Cheapest AI Models: Input Price Ranking

Rank	Model	Provider	Input / 1M	Output / 1M
1	GPT-4.1 nano	OpenAI	$0.10	$0.40
2	Mistral Small	Mistral	$0.10	$0.30
3	Llama 4 Scout	Meta	$0.15	$0.15
4	GPT-4o mini	OpenAI	$0.15	$0.60
5	Llama 4 Maverick	Meta	$0.20	$0.20
6	GPT-5.4 nano	OpenAI	$0.20	$1.25
7	Grok 4.1 Fast	xAI	$0.20	$0.50
8	GPT-5.4 mini	OpenAI	$0.25	$2.00
9	Gemini 2.5 Flash-Lite	Google	$0.25	$1.50
10	DeepSeek V3.2	DeepSeek	$0.28	$0.42

Best Value by Use Case

Use Case	Best Model	Monthly Cost (10M input + 10M output tokens)
Classification/routing	GPT-4.1 nano	$5
Chatbots	Mistral Small	$4
Code generation	Grok 4.1 Fast	$7
Document analysis	Llama 4 Scout	$3
Complex reasoning	DeepSeek V3.2	$7
Multimodal	Gemini 2.5 Flash	$28
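The monthly figures for the five models that also appear in the ranking are consistent with 10M input plus 10M output tokens at the listed prices. That reading is an inference from the numbers, not something the table states. A small sketch under that assumption, with a hypothetical helper name:

```python
def cost_for_equal_volume(in_price_per_m, out_price_per_m, millions_each=10):
    """Cost in $ for the same number of input and output tokens (in millions)."""
    return (in_price_per_m + out_price_per_m) * millions_each


# Prices taken from the input price ranking, in $ per 1M tokens.
print(f"{cost_for_equal_volume(0.10, 0.40):.2f}")  # 5.00 (GPT-4.1 nano)
print(f"{cost_for_equal_volume(0.10, 0.30):.2f}")  # 4.00 (Mistral Small)
print(f"{cost_for_equal_volume(0.28, 0.42):.2f}")  # 7.00 (DeepSeek V3.2)
```

Real workloads rarely split 50/50 between input and output, so rerun this with your own ratio before trusting the ranking.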

The Hidden Savings: Cached Input Pricing

Most providers offer 80-90% discounts on repeated prompts (system prompts, shared context). If your app reuses the same context:

OpenAI: 90% off (e.g., $2.50 → $0.25)
Anthropic: 90% off
DeepSeek: 90% off ($0.28 → $0.028)

Design stable system prompts and you'll cut costs dramatically.

How to Save Even More

Batch API — OpenAI offers 50% off for async processing
Right-size your model — don't use GPT-5.4 for tasks GPT-4.1 nano handles
Monitor usage — use a token calculator to estimate before committing
Cache aggressively — same system prompt = cached pricing

Full Comparison

I track 33 models across 7 providers with daily updates. Check the full comparison:

🔗 Full pricing table
🧮 Token cost calculator

All data is free, no signup required. I update prices daily by checking each provider's official docs.

Built this because I was wasting time comparing pricing pages manually. Hope it helps someone else too.

Forem: Alex Mercer