<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Abid Ali</title>
    <description>The latest articles on Forem by Abid Ali (@buildwithabid).</description>
    <link>https://forem.com/buildwithabid</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862016%2F0f5c37f0-d375-4e95-8528-1762f0dba065.jpeg</url>
      <title>Forem: Abid Ali</title>
      <link>https://forem.com/buildwithabid</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/buildwithabid"/>
    <language>en</language>
    <item>
      <title>How I Found $1,240/Month in Wasted LLM API Costs (And Built a Tool to Find Yours)</title>
      <dc:creator>Abid Ali</dc:creator>
      <pubDate>Sun, 05 Apr 2026 08:48:00 +0000</pubDate>
      <link>https://forem.com/buildwithabid/how-i-found-1240month-in-wasted-llm-api-costs-and-built-a-tool-to-find-yours-3041</link>
      <guid>https://forem.com/buildwithabid/how-i-found-1240month-in-wasted-llm-api-costs-and-built-a-tool-to-find-yours-3041</guid>
      <description>&lt;p&gt;I was spending about $2,000/month on OpenAI and Anthropic APIs across a few projects.&lt;/p&gt;

&lt;p&gt;I knew some of it was wasteful. I just couldn't prove it. The provider dashboards show you one number — your total bill. That's like getting an electricity bill with no breakdown. Is it the AC? The lights? The server room? No idea.&lt;/p&gt;

&lt;p&gt;So I built a tool to find out. What it discovered was honestly embarrassing.&lt;/p&gt;

&lt;h2&gt;What I found&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;34% of my summarizer calls were retries.&lt;/strong&gt; The prompt asked for JSON, but the model kept wrapping it in markdown code blocks. My parser rejected it. The retry loop ran the same call again. And again. Each retry cost money. Total waste: about $140/month, all avoidable with a six-word prompt fix I could have made months ago.&lt;/p&gt;
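&lt;p&gt;The fix in my case was prompt-side, but you can also make the parser tolerant so a cosmetic code fence never triggers a paid retry. A minimal sketch (&lt;code&gt;parse_json_response&lt;/code&gt; is a hypothetical helper, not part of any library):&lt;/p&gt;

```python
import json
import re

def parse_json_response(text: str):
    """Parse a model response as JSON, tolerating markdown code fences.

    Tries the raw text first; if that fails, strips a ```json ... ```
    wrapper and retries, instead of re-running the whole API call.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Remove a leading/trailing markdown fence like ```json ... ```
        stripped = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
        return json.loads(stripped)
```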

&lt;p&gt;&lt;strong&gt;85% of my classifier calls were duplicates.&lt;/strong&gt; Same input, same output, full price every time. No caching. 723 of 847 weekly calls were completely redundant. A simple cache would have saved $310/month.&lt;/p&gt;
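&lt;p&gt;For illustration, a minimal exact-duplicate cache keyed on the model and prompt. This is a sketch that assumes the OpenAI SDK's response shape; &lt;code&gt;cached_completion&lt;/code&gt; is a hypothetical helper, not the tool's API:&lt;/p&gt;

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(client, model: str, prompt: str) -> str:
    """Return a memoized result for an exact (model, prompt) repeat,
    so only cache misses pay for an API call."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```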

&lt;p&gt;&lt;strong&gt;My classifier was using GPT-4o for a yes/no task.&lt;/strong&gt; The output was always under 10 tokens — one of five fixed labels. GPT-4o-mini produces identical results at a fraction of the cost. Savings: $71/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My chatbot was stuffing the entire conversation history into every call.&lt;/strong&gt; By message 20, the input was 3,200 tokens and growing. Only the last few messages mattered. Truncating to the last 5 saves $155/month.&lt;/p&gt;
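&lt;p&gt;The truncation itself is only a few lines. A sketch that keeps any system prompt plus the last few messages (&lt;code&gt;truncate_history&lt;/code&gt; is a hypothetical helper):&lt;/p&gt;

```python
def truncate_history(messages: list[dict], keep_last: int = 5) -> list[dict]:
    """Keep the system prompt (if any) plus the last `keep_last` messages,
    so input tokens stop growing with conversation length."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```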

&lt;p&gt;Total: &lt;strong&gt;$1,240/month in waste&lt;/strong&gt; out of a $2,847 monthly spend. That's 43%.&lt;/p&gt;

&lt;h2&gt;The tool: LLM Cost Profiler&lt;/h2&gt;

&lt;p&gt;I packaged all of this into an open-source Python CLI. Here's how it works.&lt;/p&gt;

&lt;h3&gt;Step 1: Install&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llm-spend-profiler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 2: Wrap your client (2 lines of code)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wrap&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your code works exactly as before. Every API call is now silently logged to a local SQLite database. If logging fails for any reason, it fails silently — your app is never affected.&lt;/p&gt;

&lt;p&gt;Works with Anthropic too:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 3: See where your money goes&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;llmcost report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls

By Feature:
  summarizer         $412.80  (48.7%)  ████████████████████
  chatbot            $203.11  (24.0%)  ████████████
  classifier          $89.40  (10.5%)  █████
  content_gen         $78.22   (9.2%)  ████
  extraction          $41.50   (4.9%)  ██
  untagged            $22.29   (2.6%)  █

Warnings:
  ⚠ summarizer: 34% of calls are retries ($140.15 wasted)
  ⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
  ⚠ classifier: using gpt-4o but output is always &amp;lt;10 tokens (cheaper model works)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 4: Find the waste&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;llmcost optimize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)

  #1 CACHE — classifier.py:34                        [SAVE $310/mo]
     85% of calls are exact duplicates (723 of 847/week)
     → Add @cache decorator
     Confidence: HIGH

  #2 RETRY FIX — content_gen.py:112                   [SAVE $180/mo]
     28% retry rate from JSON parse errors
     → Fix prompt to return raw JSON
     Confidence: HIGH

  #3 MODEL DOWNGRADE — classifier.py:34               [SAVE $71/mo]
     Output is always &amp;lt;10 tokens, one of 5 fixed labels
     → Switch gpt-4o to gpt-4o-mini
     Confidence: MEDIUM

  #4 CONTEXT BLOAT — chatbot.py:123                   [SAVE $155/mo]
     Avg 3,200 input tokens, growing over conversation
     → Truncate history to last 5 messages
     Confidence: MEDIUM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each recommendation includes the exact file and line number, estimated monthly savings, and a confidence level.&lt;/p&gt;

&lt;h2&gt;Other features worth knowing about&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;llmcost hotspots&lt;/code&gt;&lt;/strong&gt; — ranks your code locations by cost. Call sites are auto-detected from the Python call stack, so no manual annotation is needed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Top Cost Hotspots:
  1. features/summarizer.py:47   summarize_doc()    $412.80/week   4,201 calls
  2. api/chat.py:123             handle_message()   $203.11/week   3,892 calls
  3. pipeline/classify.py:34     classify_text()     $89.40/week   2,847 calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;llmcost compare&lt;/code&gt;&lt;/strong&gt; — week-over-week comparison to catch sudden spikes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;llmcost dashboard&lt;/code&gt;&lt;/strong&gt; — opens a local web dashboard at localhost:8177 with treemap charts, cost timelines, and an optimization waterfall. Single HTML file, no npm, no build step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tagging&lt;/strong&gt; — group costs by feature, customer, or environment:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme_corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
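&lt;p&gt;If you're curious how a nestable tag scope like this can work, the standard library's &lt;code&gt;contextvars&lt;/code&gt; is enough. This is an illustrative sketch, not the tool's actual implementation:&lt;/p&gt;

```python
import contextvars
from contextlib import contextmanager

# Context variable holding the tags in effect for the current scope.
_tags: contextvars.ContextVar[dict] = contextvars.ContextVar("llm_tags", default={})

@contextmanager
def tag(**kwargs):
    """Attach key/value tags to every call made inside the `with` block.
    Nested blocks merge their tags; the previous state is restored on exit."""
    merged = {**_tags.get(), **kwargs}
    token = _tags.set(merged)
    try:
        yield
    finally:
        _tags.reset(token)

def current_tags() -> dict:
    """Read the tags in effect right now (what the logger would record)."""
    return dict(_tags.get())
```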



&lt;p&gt;&lt;strong&gt;Caching decorator&lt;/strong&gt; — stop paying for duplicate calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;

&lt;span class="nd"&gt;@cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;How it works under the hood&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wrapper&lt;/strong&gt;: Transparent proxy pattern — intercepts method calls without monkey-patching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: SQLite with WAL mode at &lt;code&gt;~/.llmcost/data.db&lt;/code&gt;. Thread-safe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Built-in lookup table for OpenAI and Anthropic models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call site detection&lt;/strong&gt;: Walks the Python call stack to auto-detect which function triggered each call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero dependencies&lt;/strong&gt;: Only uses the Python standard library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: Everything stays local. Nothing is sent anywhere.&lt;/li&gt;
&lt;/ul&gt;
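&lt;p&gt;To make the wrapper idea concrete, here's a stripped-down version of the transparent proxy pattern. It's a simplified sketch, not the tool's real code; a production wrapper would also need to handle non-callable leaf values, async clients, and so on:&lt;/p&gt;

```python
class Proxy:
    """Transparent proxy: forwards attribute access to the wrapped object,
    proxying nested namespaces (client.chat.completions) and intercepting
    callables so each method call can be logged without monkey-patching."""

    def __init__(self, target, on_call):
        self._target = target
        self._on_call = on_call

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def wrapper(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._on_call(name, args, kwargs, result)  # record usage/cost here
                return result
            return wrapper
        return Proxy(attr, self._on_call)  # keep proxying nested attributes
```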

&lt;h2&gt;Try it on your codebase&lt;/h2&gt;

&lt;p&gt;If you're making LLM API calls in any project, I'm genuinely curious what it finds. In my experience, every codebase has at least one surprise — usually duplicate calls that nobody knew about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/BuildWithAbid/llm-cost-profiler" rel="noopener noreferrer"&gt;https://github.com/BuildWithAbid/llm-cost-profiler&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install llm-spend-profiler&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

&lt;p&gt;If you find issues or have ideas for what else it should detect, open an issue or drop a comment here. This is my first open-source project and I'd love feedback.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
