Forem: Genie

ctxwatch — I Built the Missing Context-Saturation Daemon for Claude Code in 4 Hours

Genie — Thu, 16 Apr 2026 16:44:13 +0000

Six tools measure whether your Claude Code wallet is empty. Zero measure whether your brain is full. This is the story of the 600-line Python daemon that fixes that gap, built in one afternoon from a GitHub issue that had been open for exactly 24 hours.

The Alarm vs. The Smoke Detector

Yesterday a user named rmcoppersmith opened anthropics/claude-code#49226. The framing is so sharp it's almost marketing copy:

PreCompact hook fires when compaction happens, but it's too late for thoughtful memory writing (the alarm, not the smoke detector). Tool call counting is a crude proxy that doesn't account for varying response sizes. Manual /context command is not machine-parseable.

Three complaints, one thesis: we need a continuous signal, not a terminal alarm.

If you use Claude Code for hours at a time, you know this pain. You think everything is fine. You lose yourself in the flow. Then compaction fires — and suddenly the model has forgotten half the session. The hook you wrote to save important context? It ran, yes. But it ran after the fire, not when the smoke started.

Every existing monitor on the Claude Code side measures cost or quota. ccusage, claude-monitor, the six-or-so others I found while scanning this morning — they all answer the question "is my wallet empty?" None answer "is my brain full?"

Those are completely different axes.

What's Actually in the Transcript

Before building, I wanted to know what signal was already sitting on disk. If you peek at ~/.claude/projects/<project>/<session>.jsonl, each assistant turn has a usage block like this:

{
  "type": "assistant",
  "timestamp": "2026-04-17T00:00:00Z",
  "message": {
    "model": "claude-opus-4-7",
    "usage": {
      "input_tokens": 10,
      "cache_read_input_tokens": 5000,
      "cache_creation_input_tokens": 50000,
      "output_tokens": 500
    }
  }
}

Summing the three input-side fields gives what the model read on that turn; adding output_tokens gives approximately what sits in the context once the turn ends. That total is your current saturation. The window size comes from the model name: 200K by default, 1M for 1M-context subscribers.

There's a subtle trap here: Claude Code transcripts don't record the [1m] suffix even for 1M users. The model field just says claude-opus-4-7. So if you naively treat that as 200K, a 1M subscriber with 200K used shows as 100% — a nonsense reading. I'll come back to how I handled this.
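To make the arithmetic concrete, here is a minimal sketch that computes saturation for the sample record above. The function names `turn_tokens` and `saturation` are illustrative assumptions, not ctxwatch's actual API; the two window sizes are the ones mentioned in the text.

```python
import json

# The four usage fields shown in the transcript excerpt above.
USAGE_FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "output_tokens",
)

def turn_tokens(record):
    """Sum the usage fields of one assistant record (missing fields count as 0)."""
    usage = record.get("message", {}).get("usage", {})
    return sum(int(usage.get(field, 0)) for field in USAGE_FIELDS)

def saturation(record, window):
    """Fraction of the context window occupied after this turn."""
    return turn_tokens(record) / window

sample = json.loads("""
{"type": "assistant",
 "message": {"model": "claude-opus-4-7",
             "usage": {"input_tokens": 10,
                       "cache_read_input_tokens": 5000,
                       "cache_creation_input_tokens": 50000,
                       "output_tokens": 500}}}
""")

print(turn_tokens(sample))                      # 55510
print(f"{saturation(sample, 1_000_000):.1%}")   # 5.6%
print(saturation(sample, 200_000))              # same tokens, much fuller window
```

The same 55,510 tokens read very differently depending on which window you divide by, which is why the window-detection question matters.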

The Tool

I called it ctxwatch. One Python file, stdlib only, six subcommands.

$ ctxwatch once
transcript: c6af100b-a2e7-4f2f-ba1a-cd9b0503c71d.jsonl
[████░░░░░░░░░░░░░░░░]  20.4%   204,214 / 1,000,000  —  turns=69  OK

The daemon mode (ctxwatch watch) tails the most recent transcript and prints a new bar each time the assistant responds. If you prefer JSON for statuslines:

$ ctxwatch json
{"ts":"2026-04-17T00:41:05Z","tokens":204214,"window":1000000,"pct":0.2042,"turns":69,"model":"claude-opus-4-7",...}
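If you want to turn that JSON into your own statusline, a sketch along these lines should work. It only relies on the `tokens`, `window`, and `pct` keys visible in the output above; `render_bar` and `status_line` are hypothetical helpers, and flooring the filled cells is my assumption about how the bar is drawn.

```python
import json

def render_bar(pct, width=20):
    """Draw a fixed-width bar; filled cells are floored and clamped to the width."""
    filled = min(width, max(0, int(pct * width)))
    return "[" + "█" * filled + "░" * (width - filled) + "]"

def status_line(payload):
    """Format one JSON line (as a string) into a compact status string."""
    data = json.loads(payload)
    return (f"{render_bar(data['pct'])} {data['pct']:.1%} "
            f"{data['tokens']:,} / {data['window']:,}")

# A trimmed-down payload using only keys that appear in the sample output.
example = '{"tokens": 204214, "window": 1000000, "pct": 0.2042, "turns": 69}'
print(status_line(example))
```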

And the piece rmcoppersmith explicitly asked for — a Stop hook that fires at a threshold of your choosing:

$ ctxwatch hook --threshold 0.50 --on-exceed 'your-memory-write-script.sh'
{
  "hooks": {
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "ctxwatch check --threshold=0.5 \"--on-exceed=your-memory-write-script.sh\""
      }]
    }]
  }
}

Drop that block into ~/.claude/settings.json, merge with your existing hooks, done. The check subcommand does the saturation math and exits with code 1 (plus runs your on-exceed command) when you cross the threshold. No escaping gymnastics, no embedded Python one-liners.
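The threshold logic itself is small. The following is a sketch of what a check like this could look like, not the actual ctxwatch source: the function name, the `>=` comparison, and ignoring the on-exceed command's own exit status are all assumptions.

```python
import subprocess

def check(tokens, window, threshold, on_exceed=None):
    """Return a process exit code: 1 if saturation reaches the threshold, else 0."""
    pct = tokens / window
    if pct < threshold:
        return 0
    if on_exceed:
        # Run the user's command through the shell; its exit status is ignored here.
        subprocess.run(on_exceed, shell=True, check=False)
    return 1

print(check(204_214, 1_000_000, 0.50))  # 0: about 20% used, below the threshold
print(check(600_000, 1_000_000, 0.50))  # 1: 60% used, threshold crossed
```

In a real CLI the return value would be passed to `sys.exit()` so the hook runner sees it.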

Two Design Decisions Worth Sharing

1. Auto-detect 1M users instead of asking them

The [1m] suffix problem is a calibration landmine. The clean fix would be to make users pass --window 1m every time. The nice fix is to detect it.

I noticed something obvious: if I see a single turn in the transcript with more than 200K total tokens, the user is mathematically guaranteed to be on a 1M window, because more than 200K tokens can't fit in a 200K window. So I added a small pre-scan: if the observed max exceeds the default, bump to 1M and tag the source as auto:<model>+observed>200K.

def resolve_window(model, override=None, observed_max=0):
    if override:
        return override, "override"
    if model and model.endswith("[1m]"):
        return 1_000_000, "[1m] suffix"
    base = model.rstrip("]").split("[")[0] if model else ""
    for k, v in MODEL_WINDOWS.items():
        if base.startswith(k):
            if v == DEFAULT_WINDOW and observed_max > DEFAULT_WINDOW:
                return 1_000_000, f"auto:{k}+observed>200K"
            return v, k
    return DEFAULT_WINDOW, "default"

Manual override still works (--window 1m, --window 200k, or raw tokens), but I expect most users will never need to touch it.
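For the override, the three accepted spellings can be normalized with a tiny parser. This is a hedged sketch: `parse_window` is a hypothetical name, and accepting fractional values such as `1.5m` is my own addition rather than documented ctxwatch behavior.

```python
def parse_window(value):
    """Convert '1m', '200k', or a raw token count into an integer window size."""
    text = str(value).strip().lower().replace(",", "").replace("_", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)  # raises ValueError on garbage, which argparse can report

print(parse_window("1m"))      # 1000000
print(parse_window("200k"))    # 200000
print(parse_window("150000"))  # 150000
```

The result can be passed straight in as the `override` argument of `resolve_window` above.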

2. `parse_usage` never raises

A single corrupted JSONL line (partial write, disk full, schema drift) used to kill my whole scan. My code review caught it late in the day: the int() calls were outside the try/except. A non-numeric input_tokens field (improbable but not impossible) would propagate ValueError through every code path.

Fix: wrap everything, return None on any failure.

def parse_usage(line):
    try:
        d = json.loads(line)
        if d.get("type") != "assistant":
            return None
        # ... int coercion, field access, etc ...
        return TurnUsage(...)
    except (json.JSONDecodeError, ValueError, TypeError, AttributeError):
        return None

Small change. Would have bitten me within days of real-world use.
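For readers who want to see the pattern end to end, here is a self-contained sketch. The `TurnUsage` dataclass and its field names are my assumptions (the snippet above elides them), and I catch `KeyError` as well because this version indexes `message` and `usage` directly.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnUsage:
    """Hypothetical container; the real ctxwatch type may differ."""
    model: str
    input_tokens: int
    cache_read: int
    cache_creation: int
    output_tokens: int

    @property
    def total(self):
        return (self.input_tokens + self.cache_read
                + self.cache_creation + self.output_tokens)

def parse_usage(line) -> Optional[TurnUsage]:
    """Parse one JSONL line; return None for anything that is not a clean assistant turn."""
    try:
        d = json.loads(line)
        if d.get("type") != "assistant":
            return None
        message = d["message"]
        usage = message["usage"]
        # All coercion happens inside the try, so bad values cannot escape.
        return TurnUsage(
            model=str(message.get("model", "")),
            input_tokens=int(usage.get("input_tokens", 0)),
            cache_read=int(usage.get("cache_read_input_tokens", 0)),
            cache_creation=int(usage.get("cache_creation_input_tokens", 0)),
            output_tokens=int(usage.get("output_tokens", 0)),
        )
    except (json.JSONDecodeError, ValueError, TypeError, AttributeError, KeyError):
        return None

print(parse_usage('{"type": "assistant", "message": {"usage": {"input_tokens": "abc"}}}'))  # None
print(parse_usage("{truncated"))  # None
```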

What's Next (v0.2)

Multi-project dashboard — aggregate across all your Claude Code projects
Hook template library — more patterns than just Stop; common memory-write recipes
Historical trend — "you crossed 80% saturation 12 times this week"

For now: v0.1 ships, today. If you write hooks, build agent memory, or just want to know how close your next session is to compaction — try it and tell me where it's wrong.

Links

Repo: https://github.com/Genie-J/ctxwatch
Issue that inspired this: anthropics/claude-code#49226
Sibling project: cc-healthcheck — static snapshot of what's in your context right now

Install:

pipx install git+https://github.com/Genie-J/ctxwatch

Or clone and run — it's one file. MIT.

Built as part of OPC Team, a self-directed experiment in solo-dev AI infrastructure. Calibration feedback via GitHub issues is the primary signal I'm watching for.

What's eating your Claude Code context window? I wrote a 500-line Python script to find out

Genie — Thu, 16 Apr 2026 03:16:29 +0000

If you use Claude Code seriously — Max plan, 50+ skills, a CLAUDE.md that's grown organically over months — you've probably hit this moment:

You run /context and it says your system prompt is sitting at 14% of your context window before you've typed anything. And /cost tells you today's spend but doesn't say what inside your setup is expensive.

Tokens are real money. You can't optimize what you can't see. So I wrote cc-healthcheck — a single Python file, zero dependencies, zero network, that reads ~/.claude/ locally and answers three questions:

What auto-loads into every session? (CLAUDE.md + every @-reference + rules/*.md + every skill frontmatter)
Are my hooks broken? (pipe-corruption bugs, missing timeouts, case-sensitivity traps)
Where did the last session's tokens actually go? (per-model totals, cache hit ratio, system-reminder injection count)

Sample output on my own machine:

━━━ cc-healthcheck v0.1.0 ━━━

[1] Auto-Load Budget
    CLAUDE.md chain:       12.0K  (420 lines across 4 file(s))
    rules/*.md (11 files):  7.9K
    skills frontmatter (76): 3.6K  (full bodies: 102.5K — loaded on invocation)
    ───────────────────────────────
    Total auto-loaded:     23.4K  (2.34% of 1M)
    Status: ✅ HEALTHY (soft limit: 100.0K, hard: 200.0K)

[2] Hooks (20 total across 6 events)
    Issues (5):
      ⚠️  [SessionStart] inline '|' without quoting — known Claude Code #1132 corruption risk
      ⚠️  [PreToolUse/Write] no timeout set — hook can hang indefinitely
      ...

[3] Latest Session X-Ray
    Size: 1.11 MB, 365 records (147 assistant turns)
    Cumulative API tokens: 29.4M  (cache_read 90.6% — cache working)
    ⚠️  system-reminder injections: 13 occurrences

How it actually works

The code is ~500 lines of Python stdlib. No tiktoken, no requests, no external anything — just json, pathlib, re, argparse.

Counting tokens without tiktoken

For a health-check tool, exact tokenization is overkill. I use len(text) / 4.0, the usual rule of thumb for English prose (OpenAI's documentation cites roughly 4 characters per token; treat it as an approximation for Claude's tokenizer). For JSON/code it drifts to ~3.5, but order of magnitude is what matters when you're asking "is my CLAUDE.md eating 3K or 30K?"

def est_tokens(s, ratio=4.0):
    if not s:
        return 0
    return max(1, int(len(s) / ratio))

If you need exact numbers, pipe the --json output into a real tokenizer. I'd rather keep the tool install-free than 10% more accurate.

Following `@` references

Claude Code's CLAUDE.md supports @~/path/to/file.md at the start of a line to include other files in the auto-loaded context. To count the whole tree:

AT_REF_RE = re.compile(r"^@(~[^\s]+|[^\s]+)", re.MULTILINE)

def include(p, via="root"):
    if p in seen or not p.exists():
        return
    seen.add(p)
    text = p.read_text(encoding="utf-8", errors="replace")
    # count tokens, record path
    ...
    for m in AT_REF_RE.finditer(text):
        ref = m.group(1).strip()
        target = resolve_at_ref(ref, p.parent)
        if target:
            include(target, via=f"@ from {p.name}")

The seen set prevents infinite loops if two files @-reference each other (I've seen this happen in real configs).
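A runnable version of the same walk, under some stated assumptions: `resolve_at_ref` here simply expands `~` and resolves relative paths against the referencing file's directory (the real helper may do more), `collect_chain` is a hypothetical wrapper name, and the regex is a simplified equivalent of the one above.

```python
import re
from pathlib import Path
from typing import Dict, Optional

AT_REF_RE = re.compile(r"^@(\S+)", re.MULTILINE)

def resolve_at_ref(ref: str, base: Path) -> Optional[Path]:
    """Assumed resolution rule: expand ~, else resolve relative to the including file."""
    p = Path(ref).expanduser()
    if not p.is_absolute():
        p = base / p
    p = p.resolve()
    return p if p.is_file() else None

def collect_chain(root: Path) -> Dict[Path, int]:
    """Return {file: estimated tokens} for root and everything it @-references."""
    sizes: Dict[Path, int] = {}

    def include(p: Path) -> None:
        p = p.resolve()
        if p in sizes or not p.is_file():
            return  # already counted (cycle or diamond) or missing
        text = p.read_text(encoding="utf-8", errors="replace")
        sizes[p] = max(1, len(text) // 4) if text else 0
        for m in AT_REF_RE.finditer(text):
            target = resolve_at_ref(m.group(1), p.parent)
            if target:
                include(target)

    include(root)
    return sizes
```

Calling `collect_chain(Path.home() / ".claude" / "CLAUDE.md")` would give a per-file estimate for the whole chain; two files that reference each other are each counted once.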

Linting hooks for known bugs

Claude Code hooks are a JSON structure in ~/.claude/settings.json. Three recurring issues:

Inline | without quoting: tracked as anthropics/claude-code#1132, marked "not planned" for fix. The command string gets split on | before the shell parses it, and your hook is silently mangled.
No timeout field: hooks can hang indefinitely, freezing your Claude session.
Lowercase matcher with capitalized tool name: matchers are case-sensitive but docs are ambiguous. "edit" won't match Edit.

The linter flags all three. The pipe check, for example:

if isinstance(cmd, str) and "|" in cmd and '"' not in cmd and "'" not in cmd:
    out["issues"].append({
        "severity": "warn",
        "msg": "inline '|' without quoting — known #1132 corruption risk",
    })
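Putting the three checks together might look like the sketch below. It assumes the settings layout of event name, then matcher groups, then a `hooks` list with `command` and optional `timeout` keys; `KNOWN_TOOLS` is an illustrative subset I made up for the example, and `lint_hooks` is not the tool's real function name.

```python
# Illustrative subset of tool names; the real linter presumably knows more.
KNOWN_TOOLS = {"Bash", "Edit", "Read", "Write"}

def lint_hooks(settings):
    """Return a list of (event, message) warnings for a parsed settings dict."""
    issues = []
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            matcher = group.get("matcher", "")
            # Case trap: "edit" looks right but never matches the tool "Edit".
            if matcher and matcher not in KNOWN_TOOLS and matcher.capitalize() in KNOWN_TOOLS:
                issues.append((event, f"matcher {matcher!r} will not match {matcher.capitalize()!r}"))
            for hook in group.get("hooks", []):
                cmd = hook.get("command")
                # Same heuristic as the snippet above: a pipe with no quotes anywhere.
                if isinstance(cmd, str) and "|" in cmd and '"' not in cmd and "'" not in cmd:
                    issues.append((event, "inline '|' without quoting"))
                if "timeout" not in hook:
                    issues.append((event, "no timeout set"))
    return issues

example = {"hooks": {"PreToolUse": [{
    "matcher": "edit",
    "hooks": [{"type": "command", "command": "cat log.txt | grep error"}],
}]}}
for event, msg in lint_hooks(example):
    print(f"[{event}] {msg}")
```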

X-raying the JSONL session

Claude Code writes every session to ~/.claude/projects/<id>/<uuid>.jsonl. Each assistant turn has this shape:

{
  "type": "assistant",
  "isSidechain": false,
  "message": {
    "model": "claude-opus-4-6",
    "usage": {
      "input_tokens": 3,
      "output_tokens": 27,
      "cache_creation_input_tokens": 59469,
      "cache_read_input_tokens": 11530
    }
  }
}

Sum those fields across all type === "assistant" records (including isSidechain: true subagent calls, which is the bit that /cost misses) and you have the real API spend for that session.

A bonus finding: counting <system-reminder> occurrences in the raw JSONL is a useful metric. On Claude Code 2.1.x, the skill-trigger list gets re-broadcast inside a system-reminder on many user turns. Those blocks are inside the cached prefix so you're only billed once per 5-minute cache window, but they still count against the context window on every turn.
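Both measurements fit in one pass over the file. This is a sketch under assumptions: `xray` and the shape of its return value are hypothetical, and counting the literal string `<system-reminder>` assumes the tag appears unescaped in the raw JSONL.

```python
import json

FIELDS = ("input_tokens", "output_tokens",
          "cache_creation_input_tokens", "cache_read_input_tokens")

def xray(lines):
    """Aggregate usage over all assistant records, sidechain ones included."""
    totals = dict.fromkeys(FIELDS, 0)
    turns = 0
    reminders = 0
    for line in lines:
        reminders += line.count("<system-reminder>")  # counted on the raw text
        try:
            d = json.loads(line)
            if d.get("type") != "assistant":
                continue
            usage = d["message"]["usage"]
            parsed = {f: int(usage.get(f, 0)) for f in FIELDS}
        except (ValueError, TypeError, AttributeError, KeyError):
            continue  # corrupt or unexpected line: skip it, keep scanning
        turns += 1
        for f in FIELDS:
            totals[f] += parsed[f]
    grand = sum(totals.values())
    ratio = totals["cache_read_input_tokens"] / grand if grand else 0.0
    return {"turns": turns, "totals": totals, "total_tokens": grand,
            "cache_read_ratio": ratio, "system_reminders": reminders}
```

Typical use would be `xray(open(path, encoding="utf-8"))` on a single session file.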

Why bother?

Two recent open issues on the Claude Code repo describe the same symptom:

#46339 — "System prompt token consumption increased ~40-50% between v2.1.92 and v2.1.100 with zero changes to user configuration"
#46917 — "v2.1.100 sends 978 fewer bytes than v2.1.98 but is billed 20,196 MORE tokens"

Both reporters had to set up HTTP proxies or manual diffs to investigate. cc-healthcheck won't solve the server-side inflation (only Anthropic can), but it lets you separate the two pools: is it your config that grew, or the platform? Without that, it's all vibes.

Install

Zero-install — run straight from GitHub:

curl -sSL https://raw.githubusercontent.com/Genie-J/cc-healthcheck/main/cc_healthcheck.py | python3 -

Or clone + run:

git clone https://github.com/Genie-J/cc-healthcheck
python3 cc-healthcheck/cc_healthcheck.py

Flags:

cc-healthcheck              # text report
cc-healthcheck --json       # JSON for CI
cc-healthcheck --verbose    # per-file breakdown
cc-healthcheck --version

Repo

github.com/Genie-J/cc-healthcheck — MIT. Issues welcome, especially reconciliation cases where cc-healthcheck numbers don't match what /cost or your Anthropic billing shows.

If you like this flavor (small single-file local tools), I also wrote BurnCheck — same philosophy, different problem (predicting whether your weekly Opus cap is about to hit mid-task).

Keep your context tight. Your wallet will thank you.

How I built a privacy-first Claude Code burn-rate analyzer in a single HTML file

Genie — Wed, 15 Apr 2026 17:00:55 +0000

If you're a Claude Code power user on a Max $20 / $100 / $200 plan, you've probably hit a weekly limit mid-task at least once. Anthropic doesn't publish exact caps, the UI doesn't warn you beforehand, and existing tools just show current usage — they don't predict.

So I built BurnCheck: a single-page webapp that reads your ~/.claude/projects/ folder locally and shows:

Projected weekly cost (based on last 7 days)
Whether you'll cross Max $20 / $100 / $200 caps before Sunday
Sessions at risk of hitting the 5-hour interrupt
Concrete model-swap recommendations ("these 14 Opus calls could've run on Sonnet → save $23/week")

Everything runs in your browser. Zero upload, zero account, zero tracking. Open DevTools Network tab — there are no outbound requests after the page loads. The whole thing is one 27 KB HTML file on GitHub Pages.

The design constraint that shaped everything

Claude Code logs contain every prompt you've ever written: source code, business ideas, private thoughts, and occasionally API keys (you shouldn't paste them, but people do). Any tool that asks you to upload those logs for analysis is dead on arrival for the target audience (privacy-conscious devs). So:

Constraint: nothing can leave the user's machine.

Consequence: no backend, no serverless, no database, no analytics. Just HTML + JS using the browser's File API to read local files on user-gesture click.

// That's it. The whole "upload" flow.
document.getElementById('folderInput').addEventListener('change', async (e) => {
  const files = Array.from(e.target.files).filter(f => f.name.endsWith('.jsonl'));
  const records = [];
  for (const f of files) {
    const text = await f.text();
    for (const line of text.split('\n')) {
      if (!line.trim()) continue;  // skip blank lines (e.g. after the trailing newline)
      let d;
      try {
        d = JSON.parse(line);
      } catch {
        continue;  // skip partially written or corrupted lines instead of aborting
      }
      if (d?.type === 'assistant' && d.message?.usage) {
        records.push({ model: d.message.model, usage: d.message.usage, ts: d.timestamp });
      }
    }
  }
  render(analyze(records));
});

The user picks their ~/.claude/projects/ folder in the native file picker. Chrome's File API hands the page read access to the files in that folder only, and the handler above reads just the .jsonl ones. No network, no fetch(), no hidden beacon.

Cost calculation

Each type: assistant record in the JSONL has a usage object:

{
  "input_tokens": 3,
  "cache_creation_input_tokens": 59469,
  "cache_read_input_tokens": 11530,
  "output_tokens": 27
}

Per-model pricing (community-published rates, ±30% uncertain since Anthropic doesn't publish a stable schedule):

const PRICING = {
  opus:   { in: 15.00, out: 75.00, cw: 1.25, cr: 0.10 },
  sonnet: { in:  3.00, out: 15.00, cw: 1.25, cr: 0.10 },
  haiku:  { in:  0.25, out:  1.25, cw: 1.25, cr: 0.10 },
};

const costOf = (model, u) => {
  const p = priceOf(model);
  return (u.input_tokens || 0) / 1e6 * p.in
       + (u.output_tokens || 0) / 1e6 * p.out
       + (u.cache_creation_input_tokens || 0) / 1e6 * p.in * p.cw
       + (u.cache_read_input_tokens || 0) / 1e6 * p.in * p.cr;
};

Surprising finding from my own logs: 99% of my cost came from cache reads. Without prompt-cache optimization, long sessions become very expensive very fast. A single multi-hour Claude Code session with a growing context can easily cross $50.
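As a worked example, here is the same formula in Python applied to the sample usage object above, priced as an Opus turn. The rates are copied from the table in this post (and carry the same uncertainty); `price_of` is a hypothetical stand-in for the `priceOf` helper, matching on a substring of the model name.

```python
# Rates in USD per million tokens; cw/cr are multipliers on the input rate.
PRICING = {
    "opus":   {"in": 15.00, "out": 75.00, "cw": 1.25, "cr": 0.10},
    "sonnet": {"in":  3.00, "out": 15.00, "cw": 1.25, "cr": 0.10},
    "haiku":  {"in":  0.25, "out":  1.25, "cw": 1.25, "cr": 0.10},
}

def price_of(model):
    """Assumed lookup: pick the family whose name appears in the model string."""
    for family, price in PRICING.items():
        if family in model.lower():
            return price
    raise ValueError(f"no pricing entry for {model!r}")

def cost_breakdown(model, usage):
    """Return the cost in USD of each usage component for one turn."""
    p = price_of(model)
    per_token = p["in"] / 1e6
    return {
        "input": usage.get("input_tokens", 0) * per_token,
        "output": usage.get("output_tokens", 0) * p["out"] / 1e6,
        "cache_write": usage.get("cache_creation_input_tokens", 0) * per_token * p["cw"],
        "cache_read": usage.get("cache_read_input_tokens", 0) * per_token * p["cr"],
    }

usage = {"input_tokens": 3, "cache_creation_input_tokens": 59469,
         "cache_read_input_tokens": 11530, "output_tokens": 27}
parts = cost_breakdown("claude-opus-4-6", usage)
print(f"total: ${sum(parts.values()):.4f}")  # total: $1.1344
```

Breaking the total into components is what makes it possible to see which term dominates in your own logs.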

Weekly cap heuristic

Anthropic's Max plans have opaque weekly caps enforced somewhere between "suggested" and "hard throttle." Community reports suggest:

Plan	Approx weekly $-equivalent
Max $20	~$25
Max $100	~$140
Max $200	~$320

These aren't official and shift. BurnCheck lets users override them from their own throttle experience — the values persist in localStorage so the forecast improves as users calibrate.

At current burn rate × 7, we compute ratio = projected_week / cap. If >100%: "OVER by X%". If 80–100%: "tight". If <80%: "comfortable". It's not rocket science; it's a trust-the-user-can-read-a-bar-chart move. The value isn't the math, it's surfacing information Anthropic's UI never shows you.
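The thresholds above can be sketched in a few lines. I am assuming "current burn rate" means the mean daily cost over the days supplied, and that a ratio of exactly 100% still counts as "tight"; `forecast` is an illustrative name, not BurnCheck's code.

```python
def forecast(daily_costs, cap):
    """Project a week from recent daily costs and classify it against a cap."""
    projected = sum(daily_costs) / len(daily_costs) * 7
    ratio = projected / cap
    if ratio > 1.0:
        label = f"OVER by {(ratio - 1) * 100:.0f}%"
    elif ratio >= 0.8:
        label = "tight"
    else:
        label = "comfortable"
    return projected, ratio, label

# Using the community-reported ~$140 figure from the table above as the cap.
print(forecast([30] * 7, 140))  # (210.0, 1.5, 'OVER by 50%')
print(forecast([10] * 7, 140))  # (70.0, 0.5, 'comfortable')
```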

The shareable card (viral loop)

The most fun part to build: a <canvas> element that renders your weekly burn as a 1200×630 PNG (Twitter/X OG card dimensions).

Users download + post it. Each share becomes free distribution. My alpha tester (my partner) noticed this instantly — "people love to flex how much they spent on AI this week" — and it turned out to be the most engagement-generating feature by a large margin.

Canvas drawing is old-school pleasant:

ctx.fillStyle = '#FBF3E8';  // linen paper
ctx.fillRect(0, 0, 1200, 630);
ctx.font = '700 260px -apple-system, sans-serif';
ctx.fillStyle = '#2B1810';  // warm near-black
ctx.textAlign = 'center';
ctx.fillText(`$${totalCost.toFixed(0)}`, 600, 360);

No html2canvas, no external libs. 50 lines of draw calls.

Seven design iterations in one day

I wrote the first version in 2 hours. Then I spent the next 8 hours making it look worse, three times, before landing back near v0.1.

Why? Because I kept trying to ape Cal.com, Clay.com, and Linear.app's design systems instead of just shipping a clean utility. Each time I'd read their DESIGN.md-style regulations, take the DNA ("oat background", "hard-edge offset shadows", "multi-layer ring shadows"), and apply it mechanically — "use hard shadow at least 3 places." Result: cramped dashboards that visually screamed "I am pretending to be a marketing site."

My alpha tester's feedback each round:

v0.4 (Cal-inspired grayscale): "too cold"
v0.5 (Clay-inspired warm + hard shadows): "ugly, zero breathing room"
v0.6 (Linear-inspired breathing-first): "worse, not as good as the first version"

Only v0.6 had the right diagnosis — subtract, don't add. But the right aesthetic was v0.1 all along: orange accent + clean off-white background + system fonts + 1px borders. Boring. Functional. Invisible design.

Lesson: design system DNA is grammar, not recipe. Apply it selectively based on whether the product is marketing-site-shaped (Clay, Linear home page) or utility-dashboard-shaped (what BurnCheck actually is). Force-fitting a marketing aesthetic onto a dashboard produces cramped ugliness at worst, cargo-cult polish at best.

What's next

npm CLI — npx github:Genie-J/burncheck runs the same analysis in your terminal, colored output, zero deps (shipped today)
Pro tier (planned) — auto-monitoring daemon, daily email alerts, hosted dashboard. $9/mo, founding users $5/mo for life. Watch the repo to be notified.
Leaderboard — opt-in "biggest burn of the week" flex board. Privacy-preserving (only aggregated stats posted). Network effect around the share-card loop.

Try it

Web: https://genie-j.github.io/burncheck/
CLI: npx github:Genie-J/burncheck
Source: https://github.com/Genie-J/burncheck

Built over a single day. MIT licensed. Not affiliated with Anthropic.

If it saves you from a Thursday-afternoon limit cutoff, open an issue and tell me how. That's the only thanks the project needs.