<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Abid Ali</title>
    <description>The latest articles on Forem by Abid Ali (@buildwithabid).</description>
    <link>https://forem.com/buildwithabid</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862016%2F0f5c37f0-d375-4e95-8528-1762f0dba065.jpeg</url>
      <title>Forem: Abid Ali</title>
      <link>https://forem.com/buildwithabid</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/buildwithabid"/>
    <language>en</language>
    <item>
      <title>How I Found $1,240/Month in Wasted LLM API Costs (And Built a Tool to Find Yours)</title>
      <dc:creator>Abid Ali</dc:creator>
      <pubDate>Sun, 05 Apr 2026 08:48:00 +0000</pubDate>
      <link>https://forem.com/buildwithabid/how-i-found-1240month-in-wasted-llm-api-costs-and-built-a-tool-to-find-yours-3041</link>
      <guid>https://forem.com/buildwithabid/how-i-found-1240month-in-wasted-llm-api-costs-and-built-a-tool-to-find-yours-3041</guid>
      <description>&lt;p&gt;I was spending about $2,000/month on OpenAI and Anthropic APIs across a few projects.&lt;/p&gt;

&lt;p&gt;I knew some of it was wasteful. I just couldn't prove it. The provider dashboards show you one number — your total bill. That's like getting an electricity bill with no breakdown. Is it the AC? The lights? The server room? No idea.&lt;/p&gt;

&lt;p&gt;So I built a tool to find out. What it discovered was honestly embarrassing.&lt;/p&gt;

&lt;h2&gt;What I found&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;34% of my summarizer calls were retries.&lt;/strong&gt; The prompt asked for JSON, but the model kept wrapping it in markdown code blocks. My parser rejected it. The retry loop ran the same call again. And again. Each retry cost money. Total waste: about $140/month, all avoidable with a six-word prompt fix I could have made months ago.&lt;/p&gt;
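&lt;p&gt;The fix in my case was prompt-side, but you can also make the parser tolerant so a cosmetic code fence never triggers a paid retry. A minimal sketch (&lt;code&gt;parse_json_response&lt;/code&gt; is a hypothetical helper, not part of any library):&lt;/p&gt;

```python
import json
import re

def parse_json_response(text: str):
    """Parse a model response as JSON, tolerating markdown code fences.

    Tries the raw text first; if that fails, strips a ```json ... ```
    wrapper and retries, instead of re-running the whole API call.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Remove a leading/trailing markdown fence like ```json ... ```
        stripped = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
        return json.loads(stripped)
```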

&lt;p&gt;&lt;strong&gt;85% of my classifier calls were duplicates.&lt;/strong&gt; Same input, same output, full price every time. No caching. 723 of 847 weekly calls were completely redundant. A simple cache would have saved $310/month.&lt;/p&gt;
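&lt;p&gt;For illustration, a minimal exact-duplicate cache keyed on the model and prompt. This is a sketch that assumes the OpenAI SDK's response shape; &lt;code&gt;cached_completion&lt;/code&gt; is a hypothetical helper, not the tool's API:&lt;/p&gt;

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(client, model: str, prompt: str) -> str:
    """Return a memoized result for an exact (model, prompt) repeat,
    so only cache misses pay for an API call."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```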

&lt;p&gt;&lt;strong&gt;My classifier was using GPT-4o for a yes/no task.&lt;/strong&gt; The output was always under 10 tokens — one of five fixed labels. GPT-4o-mini produces identical results at a fraction of the cost. Savings: $71/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My chatbot was stuffing the entire conversation history into every call.&lt;/strong&gt; By message 20, the input was 3,200 tokens and growing. Only the last few messages mattered. Truncating to the last 5 saves $155/month.&lt;/p&gt;
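&lt;p&gt;The truncation itself is only a few lines. A sketch that keeps any system prompt plus the last few messages (&lt;code&gt;truncate_history&lt;/code&gt; is a hypothetical helper):&lt;/p&gt;

```python
def truncate_history(messages: list[dict], keep_last: int = 5) -> list[dict]:
    """Keep the system prompt (if any) plus the last `keep_last` messages,
    so input tokens stop growing with conversation length."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```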

&lt;p&gt;Total: &lt;strong&gt;$1,240/month in waste&lt;/strong&gt; out of a $2,847 monthly spend. That's 43%.&lt;/p&gt;

&lt;h2&gt;The tool: LLM Cost Profiler&lt;/h2&gt;

&lt;p&gt;I packaged all of this into an open-source Python CLI. Here's how it works.&lt;/p&gt;

&lt;h3&gt;Step 1: Install&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llm-spend-profiler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 2: Wrap your client (2 lines of code)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wrap&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your code works exactly as before. Every API call is now silently logged to a local SQLite database. If logging fails for any reason, it fails silently — your app is never affected.&lt;/p&gt;

&lt;p&gt;Works with Anthropic too:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 3: See where your money goes&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;llmcost report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls

By Feature:
  summarizer         $412.80  (48.7%)  ████████████████████
  chatbot            $203.11  (24.0%)  ████████████
  classifier          $89.40  (10.5%)  █████
  content_gen         $78.22   (9.2%)  ████
  extraction          $41.50   (4.9%)  ██
  untagged            $22.29   (2.6%)  █

Warnings:
  ⚠ summarizer: 34% of calls are retries ($140.15 wasted)
  ⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
  ⚠ classifier: using gpt-4o but output is always &amp;lt;10 tokens (cheaper model works)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 4: Find the waste&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;llmcost optimize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)

  #1 CACHE — classifier.py:34                        [SAVE $310/mo]
     85% of calls are exact duplicates (723 of 847/week)
     → Add @cache decorator
     Confidence: HIGH

  #2 RETRY FIX — content_gen.py:112                   [SAVE $180/mo]
     28% retry rate from JSON parse errors
     → Fix prompt to return raw JSON
     Confidence: HIGH

  #3 MODEL DOWNGRADE — classifier.py:34               [SAVE $71/mo]
     Output is always &amp;lt;10 tokens, one of 5 fixed labels
     → Switch gpt-4o to gpt-4o-mini
     Confidence: MEDIUM

  #4 CONTEXT BLOAT — chatbot.py:123                   [SAVE $155/mo]
     Avg 3,200 input tokens, growing over conversation
     → Truncate history to last 5 messages
     Confidence: MEDIUM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each recommendation includes the exact file and line number, estimated monthly savings, and a confidence level.&lt;/p&gt;

&lt;h2&gt;Other features worth knowing about&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;llmcost hotspots&lt;/code&gt;&lt;/strong&gt; — ranks your code locations by cost. Call sites are auto-detected from the Python call stack, so no manual annotation is needed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Top Cost Hotspots:
  1. features/summarizer.py:47   summarize_doc()    $412.80/week   4,201 calls
  2. api/chat.py:123             handle_message()   $203.11/week   3,892 calls
  3. pipeline/classify.py:34     classify_text()     $89.40/week   2,847 calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;llmcost compare&lt;/code&gt;&lt;/strong&gt; — week-over-week comparison to catch sudden spikes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;llmcost dashboard&lt;/code&gt;&lt;/strong&gt; — opens a local web dashboard at localhost:8177 with treemap charts, cost timelines, and an optimization waterfall. Single HTML file, no npm, no build step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tagging&lt;/strong&gt; — group costs by feature, customer, or environment:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme_corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
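&lt;p&gt;If you're curious how a nestable tag scope like this can work, the standard library's &lt;code&gt;contextvars&lt;/code&gt; is enough. This is an illustrative sketch, not the tool's actual implementation:&lt;/p&gt;

```python
import contextvars
from contextlib import contextmanager

# Context variable holding the tags in effect for the current scope.
_tags: contextvars.ContextVar[dict] = contextvars.ContextVar("llm_tags", default={})

@contextmanager
def tag(**kwargs):
    """Attach key/value tags to every call made inside the `with` block.
    Nested blocks merge their tags; the previous state is restored on exit."""
    merged = {**_tags.get(), **kwargs}
    token = _tags.set(merged)
    try:
        yield
    finally:
        _tags.reset(token)

def current_tags() -> dict:
    """Read the tags in effect right now (what the logger would record)."""
    return dict(_tags.get())
```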



&lt;p&gt;&lt;strong&gt;Caching decorator&lt;/strong&gt; — stop paying for duplicate calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_cost_profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;

&lt;span class="nd"&gt;@cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;How it works under the hood&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wrapper&lt;/strong&gt;: Transparent proxy pattern — intercepts method calls without monkey-patching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: SQLite with WAL mode at &lt;code&gt;~/.llmcost/data.db&lt;/code&gt;. Thread-safe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Built-in lookup table for OpenAI and Anthropic models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call site detection&lt;/strong&gt;: Walks the Python call stack to auto-detect which function triggered each call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero dependencies&lt;/strong&gt;: Only uses the Python standard library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: Everything stays local. Nothing is sent anywhere.&lt;/li&gt;
&lt;/ul&gt;
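&lt;p&gt;To make the wrapper idea concrete, here's a stripped-down version of the transparent proxy pattern. It's a simplified sketch, not the tool's real code; a production wrapper would also need to handle non-callable leaf values, async clients, and so on:&lt;/p&gt;

```python
class Proxy:
    """Transparent proxy: forwards attribute access to the wrapped object,
    proxying nested namespaces (client.chat.completions) and intercepting
    callables so each method call can be logged without monkey-patching."""

    def __init__(self, target, on_call):
        self._target = target
        self._on_call = on_call

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def wrapper(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._on_call(name, args, kwargs, result)  # record usage/cost here
                return result
            return wrapper
        return Proxy(attr, self._on_call)  # keep proxying nested attributes
```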

&lt;h2&gt;Try it on your codebase&lt;/h2&gt;

&lt;p&gt;If you're making LLM API calls in any project, I'm genuinely curious what it finds. In my experience, every codebase has at least one surprise — usually duplicate calls that nobody knew about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/BuildWithAbid/llm-cost-profiler" rel="noopener noreferrer"&gt;https://github.com/BuildWithAbid/llm-cost-profiler&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install llm-spend-profiler&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

&lt;p&gt;If you find issues or have ideas for what else it should detect, open an issue or drop a comment here. This is my first open-source project and I'd love feedback.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
