<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Devon Torres</title>
    <description>The latest articles on Forem by Devon Torres (@devtorres_).</description>
    <link>https://forem.com/devtorres_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885324%2F48f8df6a-43fb-422b-b8ac-b1837de32b6d.png</url>
      <title>Forem: Devon Torres</title>
      <link>https://forem.com/devtorres_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/devtorres_"/>
    <language>en</language>
    <item>
      <title>Anthropic Just Changed Enterprise Pricing. Here's What It Actually Means.</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:28:18 +0000</pubDate>
      <link>https://forem.com/devtorres_/anthropic-just-changed-enterprise-pricing-heres-what-it-actually-means-4khf</link>
      <guid>https://forem.com/devtorres_/anthropic-just-changed-enterprise-pricing-heres-what-it-actually-means-4khf</guid>
      <description>&lt;p&gt;If you're on Anthropic's enterprise plan, you might have noticed the pricing page changed this week.&lt;/p&gt;

&lt;p&gt;The old deal was straightforward: $200 per user per month, bundled with a block of API tokens. Predictable. Easy to budget for.&lt;/p&gt;

&lt;p&gt;The new deal: $20 per seat plus full usage-based billing on every API call. A compliance expert quoted by The Register warned this "could potentially triple costs for some enterprise customers."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more than you think
&lt;/h2&gt;

&lt;p&gt;The bundled model was great if your team's usage was consistent. You paid $200, you got your tokens, you planned around it. The usage-based model punishes two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unpredictable workloads.&lt;/strong&gt; If your team has spiky usage (big refactors, new feature sprints, batch processing), your monthly bill becomes a surprise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model selection laziness.&lt;/strong&gt; When tokens were bundled, nobody cared if a simple rename operation went through Opus. Now every call has a price tag and the expensive ones hurt.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What I'm doing about it
&lt;/h2&gt;

&lt;p&gt;Three things have actually moved the needle for me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track before you optimize.&lt;/strong&gt; You can't cut costs you can't see. I log every API call with the model, token count, and cost. Even a simple SQLite file with a rollup script gives you the visibility to know where money is going. Most teams discover that 60-70% of their calls are simple tasks hitting the most expensive model.&lt;/p&gt;
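&lt;p&gt;A minimal sketch of that SQLite logger. The schema, prices, and token counts here are illustrative, not anything Anthropic ships; swap &lt;code&gt;:memory:&lt;/code&gt; for a real file path to persist across sessions.&lt;/p&gt;

```python
import sqlite3

# minimal call log: one row per API call, rolled up by model
# (use a real file path instead of :memory: to persist across sessions)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS calls (model TEXT, tokens_in INT, tokens_out INT, cost REAL)")

def log_call(model, tokens_in, tokens_out, price_in, price_out):
    # price_in / price_out are dollars per million tokens
    cost = tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
    db.execute("INSERT INTO calls VALUES (?, ?, ?, ?)", (model, tokens_in, tokens_out, cost))
    db.commit()

log_call("claude-opus-4-6", 1200, 400, 5.00, 25.00)

# the rollup: where is the money going, per model?
for model, spend in db.execute("SELECT model, SUM(cost) FROM calls GROUP BY model"):
    print(model, round(spend, 6))
```

&lt;p&gt;A dozen lines like this is usually enough to spot the "simple tasks hitting the most expensive model" pattern.&lt;/p&gt;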

&lt;p&gt;&lt;strong&gt;Match the model to the task.&lt;/strong&gt; This is the big one. A docstring edit doesn't need Opus. A variable rename doesn't need Sonnet. The gap between Haiku's cost ($0.25/M input) and Opus ($5/M input) is 20x. If you can shift even 50% of your simple calls to a cheaper model, the math gets very different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set hard budget caps.&lt;/strong&gt; Anthropic's API console lets you set monthly spending limits. Use them. Better to hit a cap and adjust than to get a surprise bill at the end of the month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;This pricing shift isn't just Anthropic. OpenAI moved to usage-based pricing years ago. Google's Vertex AI is usage-based. The industry is converging on "pay for what you use" and that means cost management is now a core engineering skill, not an afterthought.&lt;/p&gt;

&lt;p&gt;The teams that figure out intelligent model selection early will have a meaningful cost advantage. The ones that keep sending everything through the most expensive model will feel it in their budget.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Opus 4.7 Tokenizer Ate Your Budget (30-Second Fix)</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:14:13 +0000</pubDate>
      <link>https://forem.com/devtorres_/the-opus-47-tokenizer-ate-your-budget-30-second-fix-1l13</link>
      <guid>https://forem.com/devtorres_/the-opus-47-tokenizer-ate-your-budget-30-second-fix-1l13</guid>
      <description>&lt;p&gt;Opus 4.7 uses a new tokenizer. Same code, same prompt, 25-35% more tokens. If your Claude bill jumped this week, that's why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: everything goes through Opus
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;

&lt;span class="c1"&gt;# After: match the model to the task
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pick_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docstring&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# 60x cheaper
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;implement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# only for complex stuff
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. About 60% of my API calls are simple tasks. Routing them to Haiku instead of Opus cuts my bill in half with zero quality difference on those tasks.&lt;/p&gt;
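&lt;p&gt;Quick sanity check on the "half" claim, using list prices and assuming calls are roughly similar in size:&lt;/p&gt;

```python
# blended cost per million input tokens if 60% of calls route to Haiku
haiku_in, opus_in = 0.25, 5.00
blended = 0.6 * haiku_in + 0.4 * opus_in
print(round(blended, 2))                 # blended $/M vs 5.00 all-Opus
print(round(1 - blended / opus_in, 2))   # fraction saved
```

&lt;p&gt;That lands at 57% saved on input, and the output side ($1.25 vs $25 per M) works out to the same 57% under that split, which is where "cuts my bill in half" comes from.&lt;/p&gt;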

&lt;p&gt;The expensive model is for when you need it. Not for renaming a variable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>discuss</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The 80/20 Rule of AI Model Selection (Why You're Overpaying)</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 17:07:45 +0000</pubDate>
      <link>https://forem.com/devtorres_/the-8020-rule-of-ai-model-selection-why-youre-overpaying-kn4</link>
      <guid>https://forem.com/devtorres_/the-8020-rule-of-ai-model-selection-why-youre-overpaying-kn4</guid>
      <description>&lt;p&gt;80% of your AI API calls probably don't need a frontier model.&lt;/p&gt;

&lt;p&gt;I spent a month tracking every API call in my workflow. The breakdown was almost comically predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;40% simple tasks&lt;/strong&gt;: formatting, imports, typo fixes, boilerplate generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;35% standard tasks&lt;/strong&gt;: refactoring, code review, test writing, documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20% complex tasks&lt;/strong&gt;: architecture decisions, multi-file debugging, system design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5% genuinely hard tasks&lt;/strong&gt;: novel algorithms, cross-system integration, performance optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only that bottom 25% actually benefits from Opus-class reasoning. The top 75% runs identically on cheaper models.&lt;/p&gt;

&lt;h2&gt;
  
  
  the cost math
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Cost per M tokens&lt;/th&gt;
&lt;th&gt;% of calls&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;$0.25 / $1.25&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;$3 / $15&lt;/td&gt;
&lt;td&gt;35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;$5 / $25&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard&lt;/td&gt;
&lt;td&gt;Opus (max effort)&lt;/td&gt;
&lt;td&gt;$5 / $25&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Weighted average cost: ~$2.40 / $12.00 per M tokens&lt;br&gt;
All-Opus cost: $5 / $25 per M tokens&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Savings: ~52% on both input and output.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And that's conservative. If you're aggressive about routing simple stuff to Haiku, savings can hit 60-70%.&lt;/p&gt;
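&lt;p&gt;The blended figure is mechanical to recompute from the table (a quick sketch; shares and prices copied from the rows above):&lt;/p&gt;

```python
# (share of calls, input $/M, output $/M) straight from the table
mix = [(0.40, 0.25, 1.25),   # simple -> Haiku
       (0.35, 3.00, 15.00),  # standard -> Sonnet
       (0.20, 5.00, 25.00),  # complex -> Opus
       (0.05, 5.00, 25.00)]  # hard -> Opus (max effort)
avg_in = sum(share * p for share, p, _ in mix)
avg_out = sum(share * p for share, _, p in mix)
print(round(avg_in, 2), round(avg_out, 2))  # blended $/M, input and output
```

&lt;p&gt;Change the shares to your own call mix and the same two lines tell you what routing is worth to you.&lt;/p&gt;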
&lt;h2&gt;
  
  
  how to actually do this
&lt;/h2&gt;

&lt;p&gt;Three approaches, from simplest to most automated:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Manual routing (free, 5 minutes)
&lt;/h3&gt;

&lt;p&gt;Just be intentional about which model you call. Before every API request, ask: does this need Opus?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pick_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;simple_keywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;import&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;typo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;simple_keywords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# default to Sonnet, not Opus
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Heuristic routing (free, 30 minutes)
&lt;/h3&gt;

&lt;p&gt;Build a classifier based on prompt characteristics. Long prompts with multiple files = Opus. Short prompts with clear intent = Haiku.&lt;/p&gt;
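&lt;p&gt;A minimal sketch of that heuristic. The thresholds here are made up; tune them against your own traffic:&lt;/p&gt;

```python
def route_by_shape(prompt: str, n_files: int) -> str:
    # hypothetical thresholds: long, multi-file prompts get the big model
    if n_files > 1 or len(prompt) > 4000:
        return "claude-opus-4-6"
    if len(prompt) > 500:
        return "claude-sonnet-4-6"
    return "claude-haiku-4-5"
```

&lt;p&gt;The important property is that it defaults downward: a prompt has to look complex to earn Opus.&lt;/p&gt;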

&lt;h3&gt;
  
  
  3. Automated routing (various tools)
&lt;/h3&gt;

&lt;p&gt;Several tools now handle this automatically. They classify prompt complexity and route to the cheapest model that can handle it. The category is called "LLM routing" or "model routing" if you want to explore options.&lt;/p&gt;

&lt;h2&gt;
  
  
  the uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;Most developers default to the most expensive model because switching feels risky. But the risk is actually backwards: you're guaranteed to overpay with a single-model strategy. With routing, the worst case is Opus handles everything (same as before). The best case is 50-70% savings.&lt;/p&gt;

&lt;p&gt;The 80/20 rule applies perfectly here: the bulk of your AI spend is going to calls that don't need a frontier model at all.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of my series on AI cost optimization. See also: &lt;a href="https://dev.to/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca"&gt;Cut Your Claude Bill 60%&lt;/a&gt;, &lt;a href="https://dev.to/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del"&gt;Opus 4.7 Token Analysis&lt;/a&gt;, &lt;a href="https://dev.to/devtorres_/claude-codes-hidden-setting-that-restores-pre-nerf-reasoning-quality-363o"&gt;Hidden Effort Level Setting&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude Code's Hidden Setting That Restores Pre-Nerf Reasoning Quality</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 16:46:20 +0000</pubDate>
      <link>https://forem.com/devtorres_/claude-codes-hidden-setting-that-restores-pre-nerf-reasoning-quality-363o</link>
      <guid>https://forem.com/devtorres_/claude-codes-hidden-setting-that-restores-pre-nerf-reasoning-quality-363o</guid>
      <description>&lt;p&gt;Anthropic silently changed Claude Code's default effort level from &lt;code&gt;high&lt;/code&gt; to &lt;code&gt;medium&lt;/code&gt; back in March. Most users have no idea.&lt;/p&gt;

&lt;p&gt;One team measured a &lt;strong&gt;67% reasoning quality drop&lt;/strong&gt; across 6,852 sessions after the change. If you've noticed Claude Code making more mistakes lately, this might be why.&lt;/p&gt;

&lt;h2&gt;
  
  
  the fix
&lt;/h2&gt;

&lt;p&gt;Add this to your shell profile (&lt;code&gt;~/.zshrc&lt;/code&gt; or &lt;code&gt;~/.bashrc&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_EFFORT_LEVEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your terminal. That's it.&lt;/p&gt;
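&lt;p&gt;If you don't want to restart, you can set it for the current session and confirm it took effect:&lt;/p&gt;

```shell
# set for this session only, then confirm the value
export CLAUDE_CODE_EFFORT_LEVEL=high
echo "$CLAUDE_CODE_EFFORT_LEVEL"
```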

&lt;h2&gt;
  
  
  what the effort levels actually do
&lt;/h2&gt;

&lt;p&gt;Claude Code has thinking budget tiers that control how much reasoning the model does before responding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;low&lt;/strong&gt;: minimal thinking, fastest responses, cheapest on token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;medium&lt;/strong&gt;: the current default (was &lt;code&gt;high&lt;/code&gt; before March)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;high&lt;/strong&gt;: what most power users expect, the pre-nerf default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max&lt;/strong&gt;: maximum reasoning depth, significantly more tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For daily coding work, &lt;code&gt;high&lt;/code&gt; is the sweet spot. It's what Anthropic originally shipped as the default before quietly dialing it back.&lt;/p&gt;

&lt;h2&gt;
  
  
  when to use max vs high
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;max&lt;/code&gt; sounds better but it causes problems on routine tasks. The model starts overthinking simple refactors and burns tokens on reasoning that doesn't improve the output.&lt;/p&gt;

&lt;p&gt;My rule of thumb:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;high&lt;/strong&gt; for everything by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max&lt;/strong&gt; only for genuinely hard problems (architecture decisions, complex debugging across multiple files)&lt;/li&gt;
&lt;li&gt;You can switch per-task with &lt;code&gt;/effort max&lt;/code&gt; in the CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  the thinking keywords
&lt;/h2&gt;

&lt;p&gt;You can also nudge thinking depth per-prompt with keywords:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"think" triggers ~4K thinking tokens&lt;/li&gt;
&lt;li&gt;"think hard" or "megathink" triggers ~10K&lt;/li&gt;
&lt;li&gt;"think harder" or "ultrathink" triggers ~32K&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These stack with the effort level setting.&lt;/p&gt;

&lt;h2&gt;
  
  
  if you're on opus 4.7
&lt;/h2&gt;

&lt;p&gt;4.7 always uses adaptive reasoning regardless of the &lt;code&gt;CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING&lt;/code&gt; env var (which only works on 4.6). For 4.7, the effort levels are your only control surface.&lt;/p&gt;

&lt;p&gt;The community consensus so far: &lt;code&gt;high&lt;/code&gt; is the sweet spot for 4.7 coding work. &lt;code&gt;max&lt;/code&gt; works but leads to overthinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  bottom line
&lt;/h2&gt;

&lt;p&gt;If you haven't touched your effort level settings, you're running on a nerfed default. One line in your shell profile fixes it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Developer working on AI agent infrastructure. Previously: &lt;a href="https://dev.to/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca"&gt;How I Cut My Claude API Bill 60%&lt;/a&gt; and &lt;a href="https://dev.to/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del"&gt;Opus 4.7 Token Analysis&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Opus 4.7 Uses 35% More Tokens Than 4.6. Here's What I'm Doing About It.</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 01:17:50 +0000</pubDate>
      <link>https://forem.com/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del</link>
      <guid>https://forem.com/devtorres_/opus-47-uses-35-more-tokens-than-46-heres-what-im-doing-about-it-2del</guid>
      <description>&lt;p&gt;The new Claude Opus 4.7 tokenizer is silently eating your budget.&lt;/p&gt;

&lt;p&gt;I ran the same prompts through both 4.6 and 4.7 last week. Identical code, identical context. 4.7 used 33-50% more tokens depending on the language mix. English text gets hit hardest — up to 47% inflation on prose-heavy prompts.&lt;/p&gt;

&lt;p&gt;This isn't a bug. It's the new tokenizer.&lt;/p&gt;

&lt;h2&gt;
  
  
  the math
&lt;/h2&gt;

&lt;p&gt;Same prompt, same output quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Opus 4.6: 1,000 input tokens → $0.005&lt;/li&gt;
&lt;li&gt;Opus 4.7: 1,350 input tokens → $0.00675&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a 35% effective price increase with no announcement. The per-token price didn't change ($5/$25 per million). But the same work costs more tokens.&lt;/p&gt;
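&lt;p&gt;The bullets above, spelled out (token counts from my test runs; the per-token price is unchanged Opus list price):&lt;/p&gt;

```python
# same prompt, priced at the unchanged $5 per million input tokens
price_per_m = 5.00
cost_46 = 1_000 / 1e6 * price_per_m   # Opus 4.6 tokenization
cost_47 = 1_350 / 1e6 * price_per_m   # Opus 4.7, identical prompt
print(round(cost_46, 5), round(cost_47, 5))
print(round(cost_47 / cost_46 - 1, 2))  # the effective price increase
```

&lt;p&gt;Same dollars per token, 35% more dollars per prompt.&lt;/p&gt;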

&lt;h2&gt;
  
  
  why this matters for daily users
&lt;/h2&gt;

&lt;p&gt;If you're on the Max plan ($200/mo), your usage quota burns 35% faster. Multiple Reddit threads confirm this — people hitting limits in 19 minutes instead of hours.&lt;/p&gt;

&lt;p&gt;If you're on API, your bill just went up 35% for the same work.&lt;/p&gt;

&lt;h2&gt;
  
  
  what I'm doing
&lt;/h2&gt;

&lt;p&gt;I'm not abandoning 4.7. The reasoning improvements are real on complex tasks. But I'm being selective:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tasks that stay on 4.6:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code refactoring (tokenizer doesn't matter, reasoning is the same)&lt;/li&gt;
&lt;li&gt;Simple completions and edits&lt;/li&gt;
&lt;li&gt;Any task where the prompt is mostly code (code tokenization barely changed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tasks that get 4.7:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step debugging that requires deep reasoning chains&lt;/li&gt;
&lt;li&gt;Architecture decisions where the reasoning quality improvement justifies the token premium&lt;/li&gt;
&lt;li&gt;Anything where 4.6 was already struggling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  the practical setup
&lt;/h2&gt;

&lt;p&gt;In Claude Code you can pin your model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_DEFAULT_OPUS_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-opus-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps 4.6 as default. When you need 4.7 for a specific task, use &lt;code&gt;/model opus&lt;/code&gt; to switch temporarily.&lt;/p&gt;

&lt;p&gt;On the API side, just specify the model ID explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# default
# switch to 4.7 only for complex tasks
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  results after one week
&lt;/h2&gt;

&lt;p&gt;My API bill dropped 28% compared to the first three days on 4.7 where I let everything default to the new model. Quality on complex tasks stayed the same because those still get 4.7.&lt;/p&gt;

&lt;p&gt;The takeaway: 4.7 is better at reasoning but worse at token efficiency. Use both strategically instead of defaulting to the newest model.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Developer working on AI infrastructure. Previously: &lt;a href="https://dev.to/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca"&gt;How I Cut My Claude API Bill 60%&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Cut My Claude API Bill 60% Without Losing Quality</title>
      <dc:creator>Devon Torres</dc:creator>
      <pubDate>Sat, 18 Apr 2026 01:12:12 +0000</pubDate>
      <link>https://forem.com/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca</link>
      <guid>https://forem.com/devtorres_/how-i-cut-my-claude-api-bill-60-without-losing-quality-20ca</guid>
      <description>&lt;p&gt;I was spending $45/month on the Claude API. Not crazy money, but it bugged me because I knew most of my tokens were going to simple tasks that didn't need Opus-level reasoning.&lt;/p&gt;

&lt;p&gt;Here's what worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  the problem
&lt;/h2&gt;

&lt;p&gt;I was calling &lt;code&gt;claude-opus-4-6&lt;/code&gt; for everything. Refactors, typo fixes, code reviews, architecture decisions — all Opus, all the time. At $5/$25 per million tokens (input/output), those quick "fix this import" prompts were costing the same per-token as "design me a distributed cache invalidation strategy."&lt;/p&gt;

&lt;p&gt;I looked at my usage and roughly 80% of my prompts were simple stuff. The kind of thing Haiku or Sonnet handles perfectly fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  what I tried (and what failed)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: pin everything to Sonnet.&lt;/strong&gt; Costs dropped immediately but quality tanked on the hard tasks. Multi-file refactors got confused. Architecture suggestions became generic. Sonnet is great but it's not Opus on the stuff that actually matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: manually switch models per task.&lt;/strong&gt; This worked in theory but in practice I'd forget to switch back to Opus when I needed it. Or I'd second-guess myself: "is this task complex enough for Opus?" Decision fatigue killed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: route by task complexity.&lt;/strong&gt; This is the one that stuck.&lt;/p&gt;

&lt;h2&gt;
  
  
  the routing approach
&lt;/h2&gt;

&lt;p&gt;Simple rule: classify the task before sending it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quick edits, imports, typo fixes&lt;/strong&gt; → Haiku at $0.25/$1.25 per M. That's 20x cheaper than Opus on both input and output, and the results are identical for simple operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Standard refactors, code reviews, test writing&lt;/strong&gt; → Sonnet at $3/$15 per M. Handles 80% of real coding work at 60% of the cost of Opus.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architecture decisions, complex debugging, multi-system design&lt;/strong&gt; → Opus at $5/$25 per M. Only when you genuinely need the reasoning depth.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  my results after 30 days
&lt;/h2&gt;

&lt;p&gt;Monthly spend went from $45 to $18. That's a 60% reduction. Quality on the hard tasks stayed the same because they still got Opus. I just stopped paying Opus prices for fixing semicolons.&lt;/p&gt;

&lt;h2&gt;
  
  
  the uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;Most of us are overpaying for AI because switching costs feel higher than they are. "What if Sonnet misses something?" is the fear. But after a month of routing, I can say: Sonnet doesn't miss anything on standard coding tasks. Haiku doesn't miss anything on simple edits.&lt;/p&gt;

&lt;p&gt;The frontier tax is real. You're paying 10-20x more for capabilities you use 20% of the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  what about opus 4.7?
&lt;/h2&gt;

&lt;p&gt;The new tokenizer makes this even more relevant. Same prompts use 33-50% more tokens on 4.7 due to the tokenizer change. If you were on the fence about routing before, the 4.7 token inflation should push you over.&lt;/p&gt;

&lt;p&gt;Route simple stuff to 4.6 (cheaper tokenizer), complex stuff to 4.7 (better reasoning). Best of both worlds.&lt;/p&gt;

&lt;h2&gt;
  
  
  try it
&lt;/h2&gt;

&lt;p&gt;Start with the manual approach. Track your API calls for a week. Count how many are genuinely complex vs routine. I bet you'll find the same 80/20 split I did.&lt;/p&gt;
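&lt;p&gt;The tracking pass doesn't need tooling. Even a hand-tagged tally gets you the split (the labels below are invented for illustration):&lt;/p&gt;

```python
from collections import Counter

# one label per API call over the week, tagged by hand or by a rough heuristic
week = ["routine", "routine", "complex", "routine", "routine",
        "routine", "complex", "routine", "routine", "routine"]
counts = Counter(week)
print(counts["routine"] / len(week))  # share of calls that never needed Opus
```

&lt;p&gt;If that number comes out near 0.8 for you too, routing is worth the afternoon it takes to set up.&lt;/p&gt;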




&lt;p&gt;&lt;em&gt;I'm a developer working on AI agent infrastructure. This is what I learned from actually looking at my token usage instead of just complaining about the bill.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
