<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Devon Torres</title>
    <description>The latest articles on Forem by Devon Torres (@devtorres_).</description>
    <link>https://forem.com/devtorres_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885324%2F48f8df6a-43fb-422b-b8ac-b1837de32b6d.png</url>
      <title>Forem: Devon Torres</title>
      <link>https://forem.com/devtorres_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/devtorres_"/>
    <language>en</language>
    <item>
      <title>Anthropic Just Changed Enterprise Pricing. Here's What It Actually Means.</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:28:18 +0000</pubDate>
      <link>https://forem.com/devtorres_/anthropic-just-changed-enterprise-pricing-heres-what-it-actually-means-4khf</link>
      <guid>https://forem.com/devtorres_/anthropic-just-changed-enterprise-pricing-heres-what-it-actually-means-4khf</guid>
      <description>&lt;p&gt;If you're on Anthropic's enterprise plan, you might have noticed the pricing page changed this week.&lt;/p&gt;

&lt;p&gt;The old deal was straightforward: $200 per user per month, bundled with a block of API tokens. Predictable. Easy to budget for.&lt;/p&gt;

&lt;p&gt;The new deal: $20 per seat plus full usage-based billing on every API call. A compliance expert quoted by The Register warned this "could potentially triple costs for some enterprise customers."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more than you think
&lt;/h2&gt;

&lt;p&gt;The bundled model was great if your team's usage was consistent. You paid $200, you got your tokens, you planned around it. The usage-based model punishes two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unpredictable workloads.&lt;/strong&gt; If your team has spiky usage (big refactors, new feature sprints, batch processing), your monthly bill becomes a surprise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model selection laziness.&lt;/strong&gt; When tokens were bundled, nobody cared if a simple rename operation went through Opus. Now every call has a price tag and the expensive ones hurt.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What I'm doing about it
&lt;/h2&gt;

&lt;p&gt;Three things have actually moved the needle for me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track before you optimize.&lt;/strong&gt; You can't cut costs you can't see. I log every API call with the model, token count, and cost. Even a simple SQLite file with a rollup script gives you the visibility to know where money is going. Most teams discover that 60-70% of their calls are simple tasks hitting the most expensive model.&lt;/p&gt;
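&lt;p&gt;A minimal sketch of that SQLite logger. The schema, prices, and token counts here are illustrative, not anything Anthropic ships; swap &lt;code&gt;:memory:&lt;/code&gt; for a real file path to persist across sessions.&lt;/p&gt;

```python
import sqlite3

# minimal call log: one row per API call, rolled up by model
# (use a real file path instead of :memory: to persist across sessions)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS calls (model TEXT, tokens_in INT, tokens_out INT, cost REAL)")

def log_call(model, tokens_in, tokens_out, price_in, price_out):
    # price_in / price_out are dollars per million tokens
    cost = tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
    db.execute("INSERT INTO calls VALUES (?, ?, ?, ?)", (model, tokens_in, tokens_out, cost))
    db.commit()

log_call("claude-opus-4-6", 1200, 400, 5.00, 25.00)

# the rollup: where is the money going, per model?
for model, spend in db.execute("SELECT model, SUM(cost) FROM calls GROUP BY model"):
    print(model, round(spend, 6))
```

&lt;p&gt;A dozen lines like this is usually enough to spot the "simple tasks hitting the most expensive model" pattern.&lt;/p&gt;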

&lt;p&gt;&lt;strong&gt;Match the model to the task.&lt;/strong&gt; This is the big one. A docstring edit doesn't need Opus. A variable rename doesn't need Sonnet. The gap between Haiku's cost ($0.25/M input) and Opus ($5/M input) is 20x. If you can shift even 50% of your simple calls to a cheaper model, the math gets very different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set hard budget caps.&lt;/strong&gt; Anthropic's API console lets you set monthly spending limits. Use them. Better to hit a cap and adjust than to get a surprise bill at the end of the month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;This pricing shift isn't just Anthropic. OpenAI moved to usage-based pricing years ago. Google's Vertex AI is usage-based. The industry is converging on "pay for what you use" and that means cost management is now a core engineering skill, not an afterthought.&lt;/p&gt;

&lt;p&gt;The teams that figure out intelligent model selection early will have a meaningful cost advantage. The ones that keep sending everything through the most expensive model will feel it in their budget.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Opus 4.7 Tokenizer Ate Your Budget (30-Second Fix)</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:14:13 +0000</pubDate>
      <link>https://forem.com/devtorres_/the-opus-47-tokenizer-ate-your-budget-30-second-fix-1l13</link>
      <guid>https://forem.com/devtorres_/the-opus-47-tokenizer-ate-your-budget-30-second-fix-1l13</guid>
      <description>&lt;p&gt;Opus 4.7 uses a new tokenizer. Same code, same prompt, 25-35% more tokens. If your Claude bill jumped this week, that's why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: everything goes through Opus
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;

&lt;span class="c1"&gt;# After: match the model to the task
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pick_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docstring&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# 60x cheaper
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;implement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# only for complex stuff
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. About 60% of my API calls are simple tasks. Routing them to Haiku instead of Opus cuts my bill in half with zero quality difference on those tasks.&lt;/p&gt;
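&lt;p&gt;Quick sanity check on the "half" claim, using list prices and assuming calls are roughly similar in size:&lt;/p&gt;

```python
# blended cost per million input tokens if 60% of calls route to Haiku
haiku_in, opus_in = 0.25, 5.00
blended = 0.6 * haiku_in + 0.4 * opus_in
print(round(blended, 2))                 # blended $/M vs 5.00 all-Opus
print(round(1 - blended / opus_in, 2))   # fraction saved
```

&lt;p&gt;That lands at 57% saved on input, and the output side ($1.25 vs $25 per M) works out to the same 57% under that split, which is where "cuts my bill in half" comes from.&lt;/p&gt;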

&lt;p&gt;The expensive model is for when you need it. Not for renaming a variable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>discuss</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The 80/20 Rule of AI Model Selection (Why You're Overpaying)</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 17:07:45 +0000</pubDate>
      <link>https://forem.com/devtorres_/the-8020-rule-of-ai-model-selection-why-youre-overpaying-kn4</link>
      <guid>https://forem.com/devtorres_/the-8020-rule-of-ai-model-selection-why-youre-overpaying-kn4</guid>
      <description>&lt;p&gt;80% of your AI API calls probably don't need a frontier model.&lt;/p&gt;

&lt;p&gt;I spent a month tracking every API call in my workflow. The breakdown was almost comically predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;40% simple tasks&lt;/strong&gt;: formatting, imports, typo fixes, boilerplate generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;35% standard tasks&lt;/strong&gt;: refactoring, code review, test writing, documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20% complex tasks&lt;/strong&gt;: architecture decisions, multi-file debugging, system design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5% genuinely hard tasks&lt;/strong&gt;: novel algorithms, cross-system integration, performance optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only that bottom 25% actually benefits from Opus-class reasoning. The top 75% runs identically on cheaper models.&lt;/p&gt;

&lt;h2&gt;
  
  
  the cost math
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Cost per M tokens&lt;/th&gt;
&lt;th&gt;% of calls&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;$0.25 / $1.25&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;$3 / $15&lt;/td&gt;
&lt;td&gt;35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$5 / $25&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard&lt;/td&gt;
&lt;td&gt;Opus (max effort)&lt;/td&gt;
&lt;td&gt;$5 / $25&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Weighted average cost: ~$2.40 / $12.00 per M tokens&lt;br&gt;
All-Opus cost: $5 / $25 per M tokens&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Savings: ~52% on both input and output.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And that's conservative. If you're aggressive about routing simple stuff to Haiku, savings can hit 60-70%.&lt;/p&gt;
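&lt;p&gt;The blended figure is mechanical to recompute from the table (a quick sketch; shares and prices copied from the rows above):&lt;/p&gt;

```python
# (share of calls, input $/M, output $/M) straight from the table
mix = [(0.40, 0.25, 1.25),   # simple -> Haiku
       (0.35, 3.00, 15.00),  # standard -> Sonnet
       (0.20, 5.00, 25.00),  # complex -> Opus
       (0.05, 5.00, 25.00)]  # hard -> Opus (max effort)
avg_in = sum(share * p for share, p, _ in mix)
avg_out = sum(share * p for share, _, p in mix)
print(round(avg_in, 2), round(avg_out, 2))  # blended $/M, input and output
```

&lt;p&gt;Change the shares to your own call mix and the same two lines tell you what routing is worth to you.&lt;/p&gt;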
&lt;h2&gt;
  
  
  how to actually do this
&lt;/h2&gt;

&lt;p&gt;Three approaches, from simplest to most automated:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Manual routing (free, 5 minutes)
&lt;/h3&gt;

&lt;p&gt;Just be intentional about which model you call. Before every API request, ask: does this need Opus?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pick_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;simple_keywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;import&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;typo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;simple_keywords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# default to Sonnet, not Opus
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Heuristic routing (free, 30 minutes)
&lt;/h3&gt;

&lt;p&gt;Build a classifier based on prompt characteristics. Long prompts with multiple files = Opus. Short prompts with clear intent = Haiku.&lt;/p&gt;
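&lt;p&gt;A minimal sketch of that heuristic. The thresholds here are made up; tune them against your own traffic:&lt;/p&gt;

```python
def route_by_shape(prompt: str, n_files: int) -> str:
    # hypothetical thresholds: long, multi-file prompts get the big model
    if n_files > 1 or len(prompt) > 4000:
        return "claude-opus-4-6"
    if len(prompt) > 500:
        return "claude-sonnet-4-6"
    return "claude-haiku-4-5"
```

&lt;p&gt;The important property is that it defaults downward: a prompt has to look complex to earn Opus.&lt;/p&gt;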

&lt;h3&gt;
  
  
  3. Automated routing (various tools)
&lt;/h3&gt;

&lt;p&gt;Several tools now handle this automatically. They classify prompt complexity and route to the cheapest model that can handle it. The category is called "LLM routing" or "model routing" if you want to explore options.&lt;/p&gt;

&lt;h2&gt;
  
  
  the uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;Most developers default to the most expensive model because switching feels risky. But the risk is actually backwards: you're guaranteed to overpay with a single-model strategy. With routing, the worst case is Opus handles everything (same as before). The best case is 50-70% savings.&lt;/p&gt;

&lt;p&gt;The 80/20 rule applies perfectly here: the bulk of your AI spend is going to calls that don't need a frontier model at all.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of my series on AI cost optimization. See also: &lt;a href="https://dev.to/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca"&gt;Cut Your Claude Bill 60%&lt;/a&gt;, &lt;a href="https://dev.to/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del"&gt;Opus 4.7 Token Analysis&lt;/a&gt;, &lt;a href="https://dev.to/devtorres_/claude-codes-hidden-setting-that-restores-pre-nerf-reasoning-quality-363o"&gt;Hidden Effort Level Setting&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude Code's Hidden Setting That Restores Pre-Nerf Reasoning Quality</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 16:46:20 +0000</pubDate>
      <link>https://forem.com/devtorres_/claude-codes-hidden-setting-that-restores-pre-nerf-reasoning-quality-363o</link>
      <guid>https://forem.com/devtorres_/claude-codes-hidden-setting-that-restores-pre-nerf-reasoning-quality-363o</guid>
      <description>&lt;p&gt;Anthropic silently changed Claude Code's default effort level from &lt;code&gt;high&lt;/code&gt; to &lt;code&gt;medium&lt;/code&gt; back in March. Most users have no idea.&lt;/p&gt;

&lt;p&gt;One team measured a &lt;strong&gt;67% reasoning quality drop&lt;/strong&gt; across 6,852 sessions after the change. If you've noticed Claude Code making more mistakes lately, this might be why.&lt;/p&gt;

&lt;h2&gt;
  
  
  the fix
&lt;/h2&gt;

&lt;p&gt;Add this to your shell profile (&lt;code&gt;~/.zshrc&lt;/code&gt; or &lt;code&gt;~/.bashrc&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_EFFORT_LEVEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your terminal. That's it.&lt;/p&gt;
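&lt;p&gt;If you don't want to restart, you can set it for the current session and confirm it took effect:&lt;/p&gt;

```shell
# set for this session only, then confirm the value
export CLAUDE_CODE_EFFORT_LEVEL=high
echo "$CLAUDE_CODE_EFFORT_LEVEL"
```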

&lt;h2&gt;
  
  
  what the effort levels actually do
&lt;/h2&gt;

&lt;p&gt;Claude Code has thinking budget tiers that control how much reasoning the model does before responding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;low&lt;/strong&gt;: minimal thinking, fastest responses, cheapest on token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;medium&lt;/strong&gt;: the current default (was &lt;code&gt;high&lt;/code&gt; before March)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;high&lt;/strong&gt;: what most power users expect, the pre-nerf default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max&lt;/strong&gt;: maximum reasoning depth, significantly more tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For daily coding work, &lt;code&gt;high&lt;/code&gt; is the sweet spot. It's what Anthropic originally shipped as the default before quietly dialing it back.&lt;/p&gt;

&lt;h2&gt;
  
  
  when to use max vs high
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;max&lt;/code&gt; sounds better but it causes problems on routine tasks. The model starts overthinking simple refactors and burns tokens on reasoning that doesn't improve the output.&lt;/p&gt;

&lt;p&gt;My rule of thumb:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;high&lt;/strong&gt; for everything by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max&lt;/strong&gt; only for genuinely hard problems (architecture decisions, complex debugging across multiple files)&lt;/li&gt;
&lt;li&gt;You can switch per-task with &lt;code&gt;/effort max&lt;/code&gt; in the CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  the thinking keywords
&lt;/h2&gt;

&lt;p&gt;You can also nudge thinking depth per-prompt with keywords:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"think" triggers ~4K thinking tokens&lt;/li&gt;
&lt;li&gt;"think hard" or "megathink" triggers ~10K&lt;/li&gt;
&lt;li&gt;"think harder" or "ultrathink" triggers ~32K&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These stack with the effort level setting.&lt;/p&gt;

&lt;h2&gt;
  
  
  if you're on opus 4.7
&lt;/h2&gt;

&lt;p&gt;4.7 always uses adaptive reasoning regardless of the &lt;code&gt;CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING&lt;/code&gt; env var (which only works on 4.6). For 4.7, the effort levels are your only control surface.&lt;/p&gt;

&lt;p&gt;The community consensus so far: &lt;code&gt;high&lt;/code&gt; is the sweet spot for 4.7 coding work. &lt;code&gt;max&lt;/code&gt; works but leads to overthinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  bottom line
&lt;/h2&gt;

&lt;p&gt;If you haven't touched your effort level settings, you're running on a nerfed default. One line in your shell profile fixes it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Developer working on AI agent infrastructure. Previously: &lt;a href="https://dev.to/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca"&gt;How I Cut My Claude API Bill 60%&lt;/a&gt; and &lt;a href="https://dev.to/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del"&gt;Opus 4.7 Token Analysis&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Opus 4.7 Uses 35% More Tokens Than 4.6. Here's What I'm Doing About It.</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 01:17:50 +0000</pubDate>
      <link>https://forem.com/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del</link>
      <guid>https://forem.com/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del</guid>
      <description>&lt;p&gt;The new Claude Opus 4.7 tokenizer is silently eating your budget.&lt;/p&gt;

&lt;p&gt;I ran the same prompts through both 4.6 and 4.7 last week. Identical code, identical context. 4.7 used 33-50% more tokens depending on the language mix. English text gets hit hardest — up to 47% inflation on prose-heavy prompts.&lt;/p&gt;

&lt;p&gt;This isn't a bug. It's the new tokenizer.&lt;/p&gt;

&lt;h2&gt;
  
  
  the math
&lt;/h2&gt;

&lt;p&gt;Same prompt, same output quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Opus 4.6: 1,000 input tokens → $0.005&lt;/li&gt;
&lt;li&gt;Opus 4.7: 1,350 input tokens → $0.00675&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a 35% effective price increase with no announcement. The per-token price didn't change ($5/$25 per million). But the same work costs more tokens.&lt;/p&gt;
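&lt;p&gt;The bullets above, spelled out (token counts from my test runs; the per-token price is unchanged Opus list price):&lt;/p&gt;

```python
# same prompt, priced at the unchanged $5 per million input tokens
price_per_m = 5.00
cost_46 = 1_000 / 1e6 * price_per_m   # Opus 4.6 tokenization
cost_47 = 1_350 / 1e6 * price_per_m   # Opus 4.7, identical prompt
print(round(cost_46, 5), round(cost_47, 5))
print(round(cost_47 / cost_46 - 1, 2))  # the effective price increase
```

&lt;p&gt;Same dollars per token, 35% more dollars per prompt.&lt;/p&gt;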

&lt;h2&gt;
  
  
  why this matters for daily users
&lt;/h2&gt;

&lt;p&gt;If you're on the Max plan ($200/mo), your usage quota burns 35% faster. Multiple Reddit threads confirm this — people hitting limits in 19 minutes instead of hours.&lt;/p&gt;

&lt;p&gt;If you're on API, your bill just went up 35% for the same work.&lt;/p&gt;

&lt;h2&gt;
  
  
  what I'm doing
&lt;/h2&gt;

&lt;p&gt;I'm not abandoning 4.7. The reasoning improvements are real on complex tasks. But I'm being selective:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tasks that stay on 4.6:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code refactoring (tokenizer doesn't matter, reasoning is the same)&lt;/li&gt;
&lt;li&gt;Simple completions and edits&lt;/li&gt;
&lt;li&gt;Any task where the prompt is mostly code (code tokenization barely changed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tasks that get 4.7:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step debugging that requires deep reasoning chains&lt;/li&gt;
&lt;li&gt;Architecture decisions where the reasoning quality improvement justifies the token premium&lt;/li&gt;
&lt;li&gt;Anything where 4.6 was already struggling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  the practical setup
&lt;/h2&gt;

&lt;p&gt;In Claude Code you can pin your model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_DEFAULT_OPUS_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-opus-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps 4.6 as default. When you need 4.7 for a specific task, use &lt;code&gt;/model opus&lt;/code&gt; to switch temporarily.&lt;/p&gt;

&lt;p&gt;On the API side, just specify the model ID explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# default
# switch to 4.7 only for complex tasks
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  results after one week
&lt;/h2&gt;

&lt;p&gt;My API bill dropped 28% compared to the first three days on 4.7 where I let everything default to the new model. Quality on complex tasks stayed the same because those still get 4.7.&lt;/p&gt;

&lt;p&gt;The takeaway: 4.7 is better at reasoning but worse at token efficiency. Use both strategically instead of defaulting to the newest model.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Developer working on AI infrastructure. Previously: &lt;a href="https://dev.to/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca"&gt;How I Cut My Claude API Bill 60%&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Cut My Claude API Bill 60% Without Losing Quality</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 01:12:12 +0000</pubDate>
      <link>https://forem.com/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca</link>
      <guid>https://forem.com/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca</guid>
      <description>&lt;p&gt;I was spending $45/month on the Claude API. Not crazy money, but it bugged me because I knew most of my tokens were going to simple tasks that didn't need Opus-level reasoning.&lt;/p&gt;

&lt;p&gt;Here's what worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  the problem
&lt;/h2&gt;

&lt;p&gt;I was calling &lt;code&gt;claude-opus-4-6&lt;/code&gt; for everything. Refactors, typo fixes, code reviews, architecture decisions — all Opus, all the time. At $5/$25 per million tokens (input/output), those quick "fix this import" prompts were costing the same per-token as "design me a distributed cache invalidation strategy."&lt;/p&gt;

&lt;p&gt;I looked at my usage and roughly 80% of my prompts were simple stuff. The kind of thing Haiku or Sonnet handles perfectly fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  what I tried (and what failed)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: pin everything to Sonnet.&lt;/strong&gt; Costs dropped immediately but quality tanked on the hard tasks. Multi-file refactors got confused. Architecture suggestions became generic. Sonnet is great but it's not Opus on the stuff that actually matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: manually switch models per task.&lt;/strong&gt; This worked in theory but in practice I'd forget to switch back to Opus when I needed it. Or I'd second-guess myself: "is this task complex enough for Opus?" Decision fatigue killed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: route by task complexity.&lt;/strong&gt; This is the one that stuck.&lt;/p&gt;

&lt;h2&gt;
  
  
  the routing approach
&lt;/h2&gt;

&lt;p&gt;Simple rule: classify the task before sending it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quick edits, imports, typo fixes&lt;/strong&gt; → Haiku at $0.25/$1.25 per M. That's 20x cheaper than Opus on both input and output, and the results are identical for simple operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Standard refactors, code reviews, test writing&lt;/strong&gt; → Sonnet at $3/$15 per M. Handles 80% of real coding work at 60% of the cost of Opus.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architecture decisions, complex debugging, multi-system design&lt;/strong&gt; → Opus at $5/$25 per M. Only when you genuinely need the reasoning depth.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  my results after 30 days
&lt;/h2&gt;

&lt;p&gt;Monthly spend went from $45 to $18. That's a 60% reduction. Quality on the hard tasks stayed the same because they still got Opus. I just stopped paying Opus prices for fixing semicolons.&lt;/p&gt;

&lt;h2&gt;
  
  
  the uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;Most of us are overpaying for AI because switching costs feel higher than they are. "What if Sonnet misses something?" is the fear. But after a month of routing, I can say: Sonnet doesn't miss anything on standard coding tasks. Haiku doesn't miss anything on simple edits.&lt;/p&gt;

&lt;p&gt;The frontier tax is real. You're paying 10-20x more for capabilities you use 20% of the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  what about opus 4.7?
&lt;/h2&gt;

&lt;p&gt;The new tokenizer makes this even more relevant. Same prompts use 33-50% more tokens on 4.7 due to the tokenizer change. If you were on the fence about routing before, the 4.7 token inflation should push you over.&lt;/p&gt;

&lt;p&gt;Route simple stuff to 4.6 (cheaper tokenizer), complex stuff to 4.7 (better reasoning). Best of both worlds.&lt;/p&gt;

&lt;h2&gt;
  
  
  try it
&lt;/h2&gt;

&lt;p&gt;Start with the manual approach. Track your API calls for a week. Count how many are genuinely complex vs routine. I bet you'll find the same 80/20 split I did.&lt;/p&gt;
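&lt;p&gt;The tracking pass doesn't need tooling. Even a hand-tagged tally gets you the split (the labels below are invented for illustration):&lt;/p&gt;

```python
from collections import Counter

# one label per API call over the week, tagged by hand or by a rough heuristic
week = ["routine", "routine", "complex", "routine", "routine",
        "routine", "complex", "routine", "routine", "routine"]
counts = Counter(week)
print(counts["routine"] / len(week))  # share of calls that never needed Opus
```

&lt;p&gt;If that number comes out near 0.8 for you too, routing is worth the afternoon it takes to set up.&lt;/p&gt;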




&lt;p&gt;&lt;em&gt;I'm a developer working on AI agent infrastructure. This is what I learned from actually looking at my token usage instead of just complaining about the bill.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
