Forem: 4663437Mehdi

Token Ledger Digest – 2026-05-20

4663437Mehdi — Wed, 20 May 2026 10:30:01 +0000

Lead change – biggest cost impact

Google Gemini Flash Latest (~google/gemini-flash-latest)
- Prompt price rose from $0.50/1M to $1.50/1M (+$1.00/1M).
- Completion price rose from $3.00/1M to $9.00/1M (+$6.00/1M).
- Who should care: Teams running high‑volume inference on this model. Both prices tripled, so a workload of 1M prompt tokens plus 1M completion tokens now costs $10.50 instead of $3.50 (+$7.00); consider alternatives, or trim prompt and completion lengths.
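
To gauge the effect on a specific workload, the change can be worked out from monthly token volumes. The following is a minimal Python sketch using the old and new prices listed above; the function name `monthly_cost` and the example volumes (200M prompt tokens, 40M completion tokens) are illustrative assumptions, not figures from the ledger.

```python
from decimal import Decimal

def monthly_cost(prompt_tokens, completion_tokens, prompt_price, completion_price):
    """Cost in USD, given token counts and prices in USD per 1M tokens."""
    per_million = Decimal(1_000_000)
    return (Decimal(prompt_tokens) / per_million * Decimal(prompt_price)
            + Decimal(completion_tokens) / per_million * Decimal(completion_price))

# Hypothetical workload: 200M prompt tokens and 40M completion tokens per month.
old = monthly_cost(200_000_000, 40_000_000, "0.50", "3.00")
new = monthly_cost(200_000_000, 40_000_000, "1.50", "9.00")
print(f"old ${old:.2f}, new ${new:.2f}, increase ${new - old:.2f}")
```

Because both rates tripled, the total triples for any prompt-to-completion mix; only the absolute increase depends on the volumes.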

Other price changes

Z.ai GLM 5.1 (z-ai/glm-5.1)
- Prompt price dropped from $0.98/1M to $0.00/1M.
- Completion price dropped from $3.08/1M to $0.00/1M.
- Who should care: Users can now run this model at zero token cost; ideal for cost‑sensitive prototypes or batch workloads.
Qwen: Qwen3.6 35B A3B (qwen/qwen3.6-35b-a3b)
- Prompt price fell slightly from $0.15/1M to $0.149/1M (‑$0.001/1M).
- Completion price unchanged at $1.00/1M.
- Who should care: Negligible impact; monitor for further drift.
Qwen: Qwen3.5‑35B‑A3B (qwen/qwen3.5-35b-a3b)
- Prompt price fell from $0.14/1M to $0.139/1M (‑$0.001/1M).
- Completion price unchanged at $1.00/1M.
- Who should care: Minimal effect; no action needed.

New model added

Google Gemini 3.5 Flash (google/gemini-3.5-flash)
- Prompt price: $1.50/1M.
- Completion price: $9.00/1M.
- Context window: 1,048,576 tokens.
- Who should care: Developers needing very long contexts; compare pricing against other long‑context options.

Summary

Total models tracked: 357. No other meaningful changes today.

Originally published at The Token Ledger. Subscribe for the daily digest.

The Token Ledger – 2026-05-19

4663437Mehdi — Tue, 19 May 2026 10:44:31 +0000

Most cost‑impacting change: NVIDIA’s Nemotron 3 Super completion price fell from $0.50 to $0.45 per 1M tokens (‑$0.05/1M), a 10% reduction.

Price changes

Model	Prompt ($/1M)	Completion ($/1M)
NVIDIA: Nemotron 3 Super	0.10 → 0.09 (↓)	0.50 → 0.45 (↓)
Google: Gemma 4 26B A4B	0.07 → 0.06 (↓)	0.34 → 0.33 (↓)
OpenAI: gpt-oss-120b	0.039 (unchanged)	0.19 → 0.18 (↓)
Mistral: Mistral Nemo	0.02 (unchanged)	0.04 → 0.03 (↓)

No models were added or removed today. The cheapest models remain inclusionAI: Ling‑2.6‑flash, IBM: Granite 4.0 Micro, and Meta: Llama 3.1 8B Instruct (see source data for full list).

The Token Ledger — May 17, 2026

4663437Mehdi — Sun, 17 May 2026 09:16:05 +0000

Four providers raised completion prices today; NVIDIA’s Nemotron 3 Super saw the largest absolute increase. No models were added or removed.

NVIDIA: Nemotron 3 Super (120B A12B)

Prompt: $0.09/1M → $0.10/1M (+11.1%)

Completion: $0.45/1M → $0.50/1M (+$0.05, +11.1%)

Impact: Heaviest token-cost increase today. Relevant for agents and reasoning workflows.

Mistral: Mistral Nemo

Prompt: unchanged at $0.02/1M

Completion: $0.03/1M → $0.04/1M (+33.3%)

Relative jump is steep, but absolute cost remains low. Relevant for lightweight local-style deployments.

Google: Gemma 4 26B A4B

Prompt: $0.06/1M → $0.07/1M (+16.7%)

Completion: $0.33/1M → $0.34/1M (+3.0%)

Smaller absolute impact vs. Nemotron; still a 17% prompt hike.

OpenAI: gpt-oss-120b

Prompt: unchanged at $0.039/1M

Completion: $0.18/1M → $0.19/1M (+5.6%)

Marginal; easy to overlook unless volume is high.
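
The completion-price percentages above can be reproduced from the listed old and new prices. This is a minimal sketch, assuming rounding to one decimal place; the helper name `pct_change` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def pct_change(old, new):
    """Relative change from old to new, in percent, rounded to one decimal."""
    old, new = Decimal(old), Decimal(new)
    change = (new - old) / old * 100
    return change.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Completion prices in USD per 1M tokens, as listed above.
completion_changes = {
    "NVIDIA: Nemotron 3 Super": ("0.45", "0.50"),
    "Mistral: Mistral Nemo": ("0.03", "0.04"),
    "Google: Gemma 4 26B A4B": ("0.33", "0.34"),
    "OpenAI: gpt-oss-120b": ("0.18", "0.19"),
}
for model, (old, new) in completion_changes.items():
    print(f"{model}: {pct_change(old, new):+}%")
```

Prices are passed as strings so that `Decimal` keeps them exact; binary floats could shift a result that sits near a rounding boundary.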

Cheapest models today (by prompt price):

inclusionAI: Ling-2.6-flash — $0.01/1M prompt, $0.03/1M completion
IBM: Granite 4.0 Micro — $0.017/1M prompt, $0.112/1M completion
Meta: Llama 3.1 8B Instruct — $0.02/1M prompt, $0.05/1M completion
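
Ranking by prompt price alone can differ from a ranking that also weighs completion price. The sketch below compares the two orderings for the three models listed; the 75% prompt share and the helper name `blended_price` are illustrative assumptions, not part of the ledger's method.

```python
from decimal import Decimal

# (prompt, completion) prices in USD per 1M tokens, from the list above.
prices = {
    "inclusionAI: Ling-2.6-flash": (Decimal("0.01"), Decimal("0.03")),
    "IBM: Granite 4.0 Micro": (Decimal("0.017"), Decimal("0.112")),
    "Meta: Llama 3.1 8B Instruct": (Decimal("0.02"), Decimal("0.05")),
}

def blended_price(prompt, completion, prompt_share=Decimal("0.75")):
    """Weighted price per 1M tokens; prompt_share is an assumed traffic mix."""
    return prompt * prompt_share + completion * (1 - prompt_share)

by_prompt = sorted(prices, key=lambda m: prices[m][0])
by_blend = sorted(prices, key=lambda m: blended_price(*prices[m]))
print(by_prompt)
print(by_blend)
```

Under this assumed mix, Llama 3.1 8B Instruct moves ahead of Granite 4.0 Micro because its completion price is less than half of Granite's.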

Total tracked models: 356.

The Token Ledger Digest – 2026-05-16

4663437Mehdi — Sat, 16 May 2026 09:01:03 +0000

No meaningful changes today.

Cheapest models (per 1M tokens)

inclusionAI: Ling-2.6-flash
- What changed: —
- Prompt: $0.01 / 1M; Completion: $0.03 / 1M
- Who should care: Developers seeking the lowest‑cost inference for short‑to‑medium prompts.
Mistral: Mistral Nemo
- What changed: —
- Prompt: $0.02 / 1M; Completion: $0.03 / 1M
- Who should care: Teams needing a balanced low‑cost model with strong multilingual ability.
Meta: Llama 3.1 8B Instruct
- What changed: —
- Prompt: $0.02 / 1M; Completion: $0.05 / 1M
- Who should care: Users who want a widely‑available 8B model at minimal expense.

Token Ledger – 2026-05-15

4663437Mehdi — Fri, 15 May 2026 23:37:01 +0000

356 models added, 0 removed, 0 price changes. The largest influx on record reframes the cost landscape. Leading the batch is a 1-trillion-parameter model at sub-dollar rates.

Most cost-impacting addition

inclusionAI: Ring-2.6-1T – $0.075 / 1M input, $0.625 / 1M output, 262k context.

A 1T-parameter Mixture-of-Experts model at this price point is unprecedented. For reference, comparable-scale models typically cost 5-10× more. Developers processing high-volume reasoning tasks should test it immediately.

Other notable low-cost entries

IBM: Granite 4.1 8B – $0.05 / 1M input, $0.10 / 1M output, 131k context. Cheapest 8B in the fleet.
Google: Gemini 3.1 Flash Lite – $0.25 / 1M input, $1.50 / 1M output, 1M context. Largest context-to-cost ratio on a production model.
Perceptron: Perceptron Mk1 – $0.15 / 1M input, $1.50 / 1M output, 32k context. New entrant at the ultra-budget tier.
xAI: Grok 4.3 – $1.25 / 1M input, $2.50 / 1M output, 1M context. Lower than Grok 4.2 pricing.
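
The digest does not define "context-to-cost ratio". One plausible reading is context tokens divided by the input price per 1M tokens. The sketch below applies that assumed definition to the four entries above, with context windows taken as the rounded figures shown (131k, 1M, 32k) rather than exact token counts.

```python
# (input price in USD per 1M tokens, approximate context window in tokens)
entries = {
    "IBM: Granite 4.1 8B": (0.05, 131_000),
    "Google: Gemini 3.1 Flash Lite": (0.25, 1_000_000),
    "Perceptron: Perceptron Mk1": (0.15, 32_000),
    "xAI: Grok 4.3": (1.25, 1_000_000),
}

def context_per_dollar(input_price, context_tokens):
    """Assumed metric: context tokens per USD of input price per 1M tokens."""
    return context_tokens / input_price

ranked = sorted(entries, key=lambda m: context_per_dollar(*entries[m]), reverse=True)
for model in ranked:
    print(f"{model}: {context_per_dollar(*entries[model]):,.0f}")
```

Under this reading, Gemini 3.1 Flash Lite ranks first among the four entries, which matches the claim in its line.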

Premium tier

Anthropic: Claude Opus 4.7 (Fast) – $30 / 1M input, $150 / 1M output, 1M context. Fast variant of Opus.
OpenAI: GPT Chat Latest – $5 / 1M input, $30 / 1M output, 400k context. New default chat model.

Free models added

Baidu Qianfan CoBuddy, NVIDIA Nemotron 3 Nano Omni, Poolside Laguna XS.2 & M.1, and OpenRouter Owl Alpha are available at zero cost.

All additions bring the platform to 356 total models. No existing model prices changed.
