<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AI Dev Hub</title>
    <description>The latest articles on Forem by AI Dev Hub (@aidevhub).</description>
    <link>https://forem.com/aidevhub</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3769170%2F51b2c1be-6090-4a70-b86f-000759e46929.png</url>
      <title>Forem: AI Dev Hub</title>
      <link>https://forem.com/aidevhub</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aidevhub"/>
    <language>en</language>
    <item>
      <title>How token counters actually work in 2026, and when to trust them</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Wed, 29 Apr 2026 17:58:09 +0000</pubDate>
      <link>https://forem.com/aidevhub/how-token-counters-actually-work-in-2026-and-when-to-trust-them-20jj</link>
      <guid>https://forem.com/aidevhub/how-token-counters-actually-work-in-2026-and-when-to-trust-them-20jj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Most "free token counter" tools in your bookmarks are not running the model's tokenizer. They're running a character-ratio estimate and labeling the output "tokens". For OpenAI's GPT family the official tokenizer is open and easy to ship in a browser. For Claude, Gemini, and most others it isn't. Here's what that means for your context-window math.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Up-front disclosure on this one: the tool I link to below is one I built. I got tired of paste-counter-paste-counter loops where the same input produced different numbers, and tired of tools that claim to support every model but quietly use one tokenizer for all of them. Free, client-side, no signup. I'm linking to it because it's what I use, and because I'd rather show you how it works than pitch it.&lt;/p&gt;

&lt;p&gt;If you've ever opened three "GPT token counter" tabs and gotten three different numbers, you're not crazy and the tools aren't all wrong. They're doing different things and labeling them the same way. Knowing which is which makes the difference between "this prompt fits" and "the API will reject it at the boundary".&lt;/p&gt;

&lt;h2&gt;What "tokenization" actually does&lt;/h2&gt;

&lt;p&gt;A tokenizer takes raw text and splits it into the integer IDs the model actually consumes. Every model family ships its own vocabulary, trained on its own corpus. Same input string yields different token counts because the vocabularies differ.&lt;/p&gt;

&lt;p&gt;OpenAI's GPT-4 family uses an encoding called &lt;code&gt;cl100k_base&lt;/code&gt;. The newer GPT-4o, GPT-5, o3 and o4 models use &lt;code&gt;o200k_base&lt;/code&gt;, a larger vocabulary tuned for multilingual and code-heavy input. Anthropic's Claude family uses its own vocabulary that's published only as a server-side counting endpoint. Google's Gemini family is similar: server-side counting, no public local tokenizer at the time of writing (April 2026).&lt;/p&gt;

&lt;p&gt;The rule of thumb people quote, "1 token is about 4 characters of English", is fine for napkin math and wrong by 10 to 20 percent on real input. German tokenizes worse than English because compound words don't fit the English-trained vocabulary. Code with many short identifiers tokenizes better than prose. Emoji are usually 2 to 4 tokens each. JSON with verbose keys tokenizes much worse than minified JSON. If you're sitting near the context window, the rule of thumb will lie to you.&lt;/p&gt;

&lt;h2&gt;Exact vs estimated, the real divide&lt;/h2&gt;

&lt;p&gt;Free token counters fall into two camps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exact counters&lt;/strong&gt; ship the model's actual tokenizer in the browser and run it on your input. The numbers match what the API will charge, give or take a token or two. This is feasible only when the tokenizer is published as a runnable library. For OpenAI's GPT and o-series, that library is &lt;code&gt;tiktoken&lt;/code&gt; (Python) and &lt;code&gt;gpt-tokenizer&lt;/code&gt; (JavaScript). Both are MIT-licensed and small enough to ship client-side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Estimating counters&lt;/strong&gt; apply a character-ratio heuristic. They divide the character count by some constant (3.5 to 4.0 depending on the model family) and round up. The number is roughly right on plain English. It can be 10 to 20 percent off on code, JSON, German, mixed scripts, or anything with unusual whitespace. If a counter is fast on a 100,000-character paste regardless of which model you pick, it's almost certainly estimating.&lt;/p&gt;
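&lt;p&gt;As a concrete sketch of what an estimating counter does (the divisors here are calibration assumptions, not published tokenizer facts):&lt;/p&gt;

```python
import math

# Per-family character-to-token divisors. Calibration guesses, not specs.
RATIOS = {"claude": 3.5, "gpt": 4.0, "default": 4.0}

def estimate_tokens(text, family="default"):
    """Character-ratio estimate: divide by the family's divisor, round up."""
    return math.ceil(len(text) / RATIOS.get(family, RATIOS["default"]))
```

&lt;p&gt;On plain English this lands close; on code, JSON, or German it's the 10 to 20 percent miss described above.&lt;/p&gt;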

&lt;p&gt;The honest move is to label which is which. Most counters don't.&lt;/p&gt;

&lt;h2&gt;What the tool I built actually does&lt;/h2&gt;

&lt;p&gt;Since I'm linking to one of these, I owe you the spec.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aidevhub.io/token-counter/" rel="noopener noreferrer"&gt;aidevhub.io/token-counter&lt;/a&gt; uses &lt;code&gt;gpt-tokenizer&lt;/code&gt; to compute exact counts for OpenAI's GPT-4, GPT-5, o3, and o4 model names. For every other family (Claude 3.x, Claude 4.x, Gemini, Llama, DeepSeek, Mistral, Grok) it uses a character-ratio estimate calibrated per family. Claude is &lt;code&gt;chars / 3.5&lt;/code&gt;. The others are &lt;code&gt;chars / 4.0&lt;/code&gt;. The output labels each row as either &lt;code&gt;exact&lt;/code&gt; or &lt;code&gt;estimate&lt;/code&gt; so you can tell which you're looking at.&lt;/p&gt;

&lt;p&gt;This is honest about what's possible. I can't ship Anthropic's tokenizer client-side because it isn't published as a local library, and the same goes for Google's. The choice was either to claim "supports every tokenizer" (the easy lie) or to label estimates as estimates (the harder honesty). I picked the second.&lt;/p&gt;

&lt;p&gt;For most context-budget math at 30 to 70 percent of the window, the estimate is close enough. For boundary cases at 95+ percent of the window, you want the actual tokenizer. The next section is how to get certainty when you need it.&lt;/p&gt;

&lt;h2&gt;How to get certainty when the number matters&lt;/h2&gt;

&lt;p&gt;If the count matters (you're at the boundary, or you're billing customers per-token), don't trust any browser tool, including mine. Use the model's own counting endpoint or library.&lt;/p&gt;

&lt;p&gt;For OpenAI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;
&lt;span class="n"&gt;enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the source of truth. &lt;code&gt;gpt-tokenizer&lt;/code&gt; in the browser uses the same encodings (&lt;code&gt;cl100k_base&lt;/code&gt; for the GPT-4 era, &lt;code&gt;o200k_base&lt;/code&gt; for GPT-4o and newer), so a browser-based exact counter and &lt;code&gt;tiktoken&lt;/code&gt; should match within a token or two. If they don't, the likeliest culprits are a stale &lt;code&gt;tiktoken&lt;/code&gt; version or the two libraries mapping the model name to different encodings.&lt;/p&gt;

&lt;p&gt;For Claude, Anthropic publishes a server-side counting endpoint accessible via the SDK as &lt;code&gt;client.messages.count_tokens()&lt;/code&gt; (or &lt;code&gt;client.beta.messages.count_tokens()&lt;/code&gt; depending on SDK version). It costs nothing to call but it does need network and an API key. Returns the exact count the API will charge for that exact &lt;code&gt;messages&lt;/code&gt; array including system prompt and tool definitions.&lt;/p&gt;

&lt;p&gt;For Gemini, the SDK exposes a &lt;code&gt;count_tokens&lt;/code&gt; method (&lt;code&gt;model.count_tokens()&lt;/code&gt; in the older &lt;code&gt;google-generativeai&lt;/code&gt; package, &lt;code&gt;client.models.count_tokens()&lt;/code&gt; in the newer &lt;code&gt;google-genai&lt;/code&gt; one) which similarly calls Google's server.&lt;/p&gt;

&lt;p&gt;The post-call &lt;code&gt;usage&lt;/code&gt; field on every modern API is also authoritative. After your call, the response includes &lt;code&gt;input_tokens&lt;/code&gt; and &lt;code&gt;output_tokens&lt;/code&gt; as the actual billed counts. If your local count and the API's &lt;code&gt;usage&lt;/code&gt; consistently disagree, your local tokenizer is the one that's wrong.&lt;/p&gt;

&lt;h2&gt;Where token counts and API math diverge&lt;/h2&gt;

&lt;p&gt;A counter on raw text isn't the full picture for an API call. Three things eat budget that a naive counter doesn't see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompt and tool definitions count.&lt;/strong&gt; Every modern API includes them in the input total. If you're counting only the user message, you're under-counting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message structure adds overhead.&lt;/strong&gt; Each message in a chat-format request costs a few tokens for the role markers and separators, on top of the content. OpenAI documents this; Anthropic does too. It's small (3 to 6 tokens per message) but at scale it matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens are a separate budget.&lt;/strong&gt; The 200,000 number you see in Claude's docs is the context window, and maximum output is configured separately via &lt;code&gt;max_tokens&lt;/code&gt;. Claude 4's extended thinking takes its own &lt;code&gt;budget_tokens&lt;/code&gt; slice out of that. Always check the model's docs for the specific split.&lt;/li&gt;
&lt;/ul&gt;
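&lt;p&gt;Putting those three together, a rough input-side budget looks like this (the 3 tokens of per-message overhead is an assumption taken from the range above; real values vary by model):&lt;/p&gt;

```python
def input_budget(system_tokens, tool_tokens, message_tokens,
                 per_message_overhead=3):
    """Rough input-side total: content plus structural overhead per message."""
    overhead = per_message_overhead * len(message_tokens)
    return system_tokens + tool_tokens + sum(message_tokens) + overhead
```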

&lt;p&gt;A browser counter that gives you a single number against a single model is a useful sanity check, not a complete budget calculation.&lt;/p&gt;

&lt;h2&gt;The compact summary&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Counter type&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;tiktoken&lt;/code&gt; (Python)&lt;/td&gt;
&lt;td&gt;Runs OpenAI's official tokenizer locally&lt;/td&gt;
&lt;td&gt;Exact for GPT and o-series&lt;/td&gt;
&lt;td&gt;Boundary cases, prod budget math&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;gpt-tokenizer&lt;/code&gt; (JS)&lt;/td&gt;
&lt;td&gt;Same vocabularies, browser-shippable&lt;/td&gt;
&lt;td&gt;Exact for GPT and o-series&lt;/td&gt;
&lt;td&gt;Browser tools, paste-and-count UIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic &lt;code&gt;count_tokens&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Server-side API call&lt;/td&gt;
&lt;td&gt;Exact for Claude, includes message overhead&lt;/td&gt;
&lt;td&gt;When the count matters and you have a key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini &lt;code&gt;count_tokens&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Server-side API call&lt;/td&gt;
&lt;td&gt;Exact for Gemini, includes message overhead&lt;/td&gt;
&lt;td&gt;When the count matters and you have a key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Character-ratio estimate&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;chars / 3.5&lt;/code&gt; or &lt;code&gt;chars / 4.0&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Within 10 to 20 percent on most input&lt;/td&gt;
&lt;td&gt;Quick sanity check, no key needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;A few small habits that pay off&lt;/h2&gt;

&lt;p&gt;After watching too many "but my count said it'd fit" boundary failures, three habits I've stuck with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Count against the actual target model&lt;/strong&gt;, not "GPT-4 close enough". Different vocabularies give different numbers on identical input. If you're sending to Claude 4.6, count with Anthropic's tokenizer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minify JSON before sending.&lt;/strong&gt; Pretty-printed JSON spends tokens on whitespace the model doesn't need. Your editor reads the indented version; the model reads the minified one. Easy to script in your client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log token counts on every prod call&lt;/strong&gt; and graph the average weekly. If your average prompt size starts creeping up because someone added a new few-shot example, you'll see it before it tips over the budget. Costs about 10 lines of code per service.&lt;/li&gt;
&lt;/ol&gt;
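&lt;p&gt;Habit 3 really is about 10 lines. A sketch (the field names follow the &lt;code&gt;input_tokens&lt;/code&gt;/&lt;code&gt;output_tokens&lt;/code&gt; shape used above; adjust to your provider's &lt;code&gt;usage&lt;/code&gt; schema):&lt;/p&gt;

```python
import json
import logging
import time

logger = logging.getLogger("llm.usage")

def log_usage(model, usage):
    """Emit one structured line per call; average these weekly to spot creep."""
    record = {
        "ts": time.time(),
        "model": model,
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
    }
    logger.info(json.dumps(record))
    return record
```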

&lt;h2&gt;FAQ&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Are there official tokenizers I can run locally for every model?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Only OpenAI publishes one as a runnable library (&lt;code&gt;tiktoken&lt;/code&gt; in Python, &lt;code&gt;gpt-tokenizer&lt;/code&gt; in JS). Anthropic and Google publish counting as server APIs only. If a third-party tool claims to do exact tokenization for Claude or Gemini in your browser, it's almost certainly estimating, no matter what the marketing says.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Why does the count change when I add a system prompt?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Because the system prompt is part of the input. Same for tool definitions if you're using tool-use APIs. The input window includes the entire request payload, not just the user turn. This trips up people who count only their user message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How accurate is the post-call &lt;code&gt;usage&lt;/code&gt; field?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; It's the source of truth. That's what was billed. Counters before the call are estimates of what &lt;code&gt;usage&lt;/code&gt; will say. They should match within 1 to 2 tokens if your local tokenizer matches the model's current version. Consistent drift means your local library is stale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does whitespace really matter that much?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes, on text-heavy input. Repeated newlines and indentation are often single tokens each, but they add up. A pretty-printed 5,000-line JSON file can use noticeably more tokens than the same JSON minified, with no information loss. If you're trimming for budget, that's the first place to look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What about thinking tokens on Claude 4 and reasoning tokens on o-series?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Separate budget on both, with a twist. Claude 4's extended thinking takes a configurable &lt;code&gt;budget_tokens&lt;/code&gt; value that is carved out of &lt;code&gt;max_tokens&lt;/code&gt;, and thinking tokens bill as output. OpenAI's o-series has reasoning tokens that likewise count against output. Check the specific model's docs because the rules vary by version.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Written with AI assistance and human review. &lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>5 cron expression gotchas that catch experienced devs in 2026</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:34:21 +0000</pubDate>
      <link>https://forem.com/aidevhub/5-cron-expression-gotchas-that-catch-experienced-devs-in-2026-21l1</link>
      <guid>https://forem.com/aidevhub/5-cron-expression-gotchas-that-catch-experienced-devs-in-2026-21l1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Cron is one of those tools where the syntax looks obvious until a job fires at the wrong time and you start digging. Five behaviors below are documented in the man page and still catch people who've been writing cron for years. Each one is in a footnote most tutorials skip.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Quick disclosure on this one: the cron builder I link to below is something I built. After enough years of writing 5-field expressions by hand, I wanted a tool that showed me the next 5 fire times in my actual local timezone before I committed. Free, client-side, no signup. Linking to it because it's the workflow I use now.&lt;/p&gt;

&lt;p&gt;I think most devs learn cron the same way. You copy something off Stack Overflow that looks close to what you want, you tweak a number, you commit it, and then a few days later something fires at the wrong time and you start reading the man page properly. The 5 behaviors below are the ones I see trip people up over and over. None are exotic. All are documented. All pass code review.&lt;/p&gt;

&lt;h2&gt;Gotcha 1: &lt;code&gt;*/5&lt;/code&gt; is anchored to the field origin, not to "now"&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;*/5 * * * *&lt;/code&gt; does not mean "every 5 minutes from whenever the job loaded". It means "every minute whose value is divisible by 5". So it fires at :00, :05, :10, :15, etc. If you load the job at :07 and expect the next fire 5 minutes later, you'll see the next fire at :10, not :12.&lt;/p&gt;

&lt;p&gt;The same rule applies to every field. &lt;code&gt;0 */6 * * *&lt;/code&gt; fires at 00:00, 06:00, 12:00, 18:00, anchored to midnight. Not to whenever you started the scheduler.&lt;/p&gt;

&lt;p&gt;This is the right behavior for most use cases (predictable, aligned across machines) but it's not what people often expect on the first read. The lesson: &lt;code&gt;*/N&lt;/code&gt; is anchored to the field's natural origin, never to the load time.&lt;/p&gt;
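&lt;p&gt;The anchoring rule is small enough to state as code (function name mine, for illustration):&lt;/p&gt;

```python
def next_step_fire(minute, step=5):
    """Next minute value a */step field matches, anchored at :00, not 'now'."""
    return (minute // step + 1) * step % 60
```

&lt;p&gt;Load the job at :07 and &lt;code&gt;next_step_fire(7)&lt;/code&gt; gives 10, not 12.&lt;/p&gt;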

&lt;h2&gt;Gotcha 2: day-of-month and day-of-week are OR, not AND&lt;/h2&gt;

&lt;p&gt;This one is in the POSIX spec and almost nobody reads it. The expression &lt;code&gt;0 9 1 * 1&lt;/code&gt; does NOT mean "the 1st of the month, but only if it's a Monday". It means "at 9am on the 1st of every month, OR on every Monday". So it fires about five times a month instead of the once or twice a year the AND interpretation would give you.&lt;/p&gt;
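&lt;p&gt;Spelled out as code, a sketch of the POSIX matching rule (with &lt;code&gt;None&lt;/code&gt; standing in for an unrestricted &lt;code&gt;*&lt;/code&gt; field):&lt;/p&gt;

```python
def posix_day_match(dom, dow, dom_set=None, dow_set=None):
    """POSIX cron: if BOTH day fields are restricted, a match on either fires."""
    dom_ok = dom_set is None or dom in dom_set
    dow_ok = dow_set is None or dow in dow_set
    if dom_set is not None and dow_set is not None:
        return dom_ok or dow_ok   # both restricted: OR, not AND
    return dom_ok and dow_ok      # the '*' field matches every day anyway
```

&lt;p&gt;With &lt;code&gt;0 9 1 * 1&lt;/code&gt;, a Monday the 17th still fires because the day-of-week half matches, even though the day-of-month half doesn't.&lt;/p&gt;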

&lt;p&gt;There's no way to express AND between those two fields in standard POSIX cron. Two common workarounds:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;

&lt;span class="c1"&gt;# Cron fires every Monday. Script filters down to "first Monday of the month".
&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;weekday&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_billing_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skipping; not first Monday of the month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cron expression becomes &lt;code&gt;0 9 * * 1&lt;/code&gt; (every Monday at 9am) and the script handles the "first" qualifier. Two pieces of logic, each obvious on its own.&lt;/p&gt;

&lt;p&gt;The other workaround is to switch to a scheduler with richer day-field semantics. Quartz syntax (used by AWS EventBridge and many JVM schedulers) sidesteps the ambiguity by requiring a &lt;code&gt;?&lt;/code&gt; in one of the two day fields, and it adds &lt;code&gt;#&lt;/code&gt; so "first Monday at 9am" is expressible directly as &lt;code&gt;0 0 9 ? * MON#1&lt;/code&gt;. Different platform, different rule. Worth knowing which one you're on.&lt;/p&gt;

&lt;h2&gt;Gotcha 3: launchd reads local time, not UTC&lt;/h2&gt;

&lt;p&gt;This is a Mac-specific gotcha and it's caused enough confusion that I now put a comment at the top of every plist. macOS &lt;code&gt;launchd&lt;/code&gt; interprets &lt;code&gt;StartCalendarInterval&lt;/code&gt; in the system's local timezone. If your plist has &lt;code&gt;Hour=14&lt;/code&gt;, the job fires at 14:00 wherever the Mac thinks it is. There is no built-in "interpret as UTC" flag.&lt;/p&gt;

&lt;p&gt;If you're migrating a cron job from a Linux server (where cron typically runs in UTC unless configured otherwise) to launchd on a Mac in another timezone, the job will fire at a different absolute time. The expression looks identical. The behavior isn't.&lt;/p&gt;

&lt;p&gt;Two ways to fix it on launchd:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set the system clock to UTC. Works if you control the machine and don't mind the rest of the OS displaying UTC times.&lt;/li&gt;
&lt;li&gt;Compute the UTC-equivalent local hour and update it twice a year for daylight saving. Less elegant but doesn't change anything else on the system.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I pick option 2 with a comment in the plist that says "fires at 13:00 UTC; adjust for DST in March and October". Ugly, but explicit, which is what you want when you read the file 6 months later.&lt;/p&gt;
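&lt;p&gt;Computing that UTC-equivalent hour is a few lines with &lt;code&gt;zoneinfo&lt;/code&gt; (Python 3.9+; function name mine):&lt;/p&gt;

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

def local_hour_for_utc(utc_hour, tz_name, on_date):
    """Local wall-clock hour that lines up with utc_hour on a given date."""
    t = datetime(on_date.year, on_date.month, on_date.day, utc_hour,
                 tzinfo=timezone.utc)
    return t.astimezone(ZoneInfo(tz_name)).hour
```

&lt;p&gt;13:00 UTC maps to 14:00 Berlin time in January and 15:00 in July, which is exactly the twice-a-year update.&lt;/p&gt;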

&lt;h2&gt;Gotcha 4: cron does not catch up missed firings&lt;/h2&gt;

&lt;p&gt;If your laptop is asleep at the scheduled time, the job does NOT fire on wake. Cron has no built-in catch-up. If your job is "delete files older than 30 days" and the machine is asleep through 3 firings, it just runs once when the next scheduled time arrives. The 3 missed firings are gone.&lt;/p&gt;

&lt;p&gt;This is a portable laptop problem more than a server problem. A server that's always on rarely misses. A Mac that sleeps overnight can easily miss its 3am job most nights and never log an error, because there's no error to log. The job didn't fail. It just wasn't fired.&lt;/p&gt;

&lt;p&gt;launchd is actually gentler than cron here: per the &lt;code&gt;launchd.plist&lt;/code&gt; man page, firings missed during sleep are coalesced and run once on the next wake, for both &lt;code&gt;StartInterval&lt;/code&gt; and &lt;code&gt;StartCalendarInterval&lt;/code&gt;. Classic cron never catches up, so on Linux you want a tool with persistent scheduling: &lt;code&gt;anacron&lt;/code&gt; is the classic answer (modern &lt;code&gt;cronie&lt;/code&gt; bundles it), and systemd timers with &lt;code&gt;Persistent=true&lt;/code&gt; handle it natively.&lt;/p&gt;

&lt;p&gt;I default to interval-based scheduling for anything maintenance-shaped (backups, cleanup, log rotation) where the exact time matters less than "did it run today". Calendar-based scheduling for anything time-sensitive (a daily 9am email) where running at 11am after the laptop wakes would be wrong.&lt;/p&gt;
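&lt;p&gt;The systemd version of the maintenance-shaped default, as a sketch (the unit name is mine):&lt;/p&gt;

```ini
# backup.timer
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true   # replay a firing missed while the machine was off or asleep

[Install]
WantedBy=timers.target
```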

&lt;h2&gt;Gotcha 5: a cron expression has no timezone embedded in it&lt;/h2&gt;

&lt;p&gt;This is the one that bites distributed teams. The expression &lt;code&gt;0 9 * * *&lt;/code&gt; says "at 9:00 in whatever timezone the scheduler runs in". It doesn't say UTC. It doesn't say Berlin. It says "whatever the scheduler thinks 9:00 is".&lt;/p&gt;

&lt;p&gt;If you write the expression in Berlin, deploy the code to a server in US-East, and that server's cron runs in UTC, your job fires at 9:00 UTC, which is 10:00 or 11:00 Berlin time depending on the season. The expression looks fine in code review. The behavior is wrong.&lt;/p&gt;

&lt;p&gt;A few things help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For Linux cron, &lt;code&gt;CRON_TZ=Europe/Berlin&lt;/code&gt; at the top of the crontab file pins all subsequent entries to that zone. Documented in &lt;code&gt;man 5 crontab&lt;/code&gt;. Easy to miss.&lt;/li&gt;
&lt;li&gt;For Quartz-based schedulers, the timezone is usually a separate config field (&lt;code&gt;timeZone&lt;/code&gt; in Spring's &lt;code&gt;@Scheduled&lt;/code&gt;, for example).&lt;/li&gt;
&lt;li&gt;For launchd, you compute it yourself or set the system clock.&lt;/li&gt;
&lt;/ul&gt;
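&lt;p&gt;The &lt;code&gt;CRON_TZ&lt;/code&gt; fix, as a crontab fragment (the script path is illustrative):&lt;/p&gt;

```
CRON_TZ=Europe/Berlin
# 09:00 Berlin time, regardless of the server's system timezone
0 9 * * * /usr/local/bin/daily-report
```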

&lt;p&gt;I add a comment to every cron entry now that says what timezone I expect it to fire in. Adds 3 seconds to writing the entry and saves the timezone-archaeology session that always comes a month later.&lt;/p&gt;

&lt;h2&gt;How I'd write each of these now&lt;/h2&gt;

&lt;p&gt;For reference, here's how each gotcha translates to a defensible expression.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Naive attempt&lt;/th&gt;
&lt;th&gt;What it actually does&lt;/th&gt;
&lt;th&gt;Defensible version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Every 5 minutes from now&lt;/td&gt;
&lt;td&gt;&lt;code&gt;*/5 * * * *&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fires at :00, :05, :10...&lt;/td&gt;
&lt;td&gt;Same expression, accept the alignment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First Monday of month at 9am&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0 9 1 * 1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1st of month OR every Monday at 9am&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0 9 * * 1&lt;/code&gt; plus script-side date check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14:00 UTC daily on launchd&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Hour=14&lt;/code&gt; in plist&lt;/td&gt;
&lt;td&gt;14:00 in local timezone, not UTC&lt;/td&gt;
&lt;td&gt;Compute local hour, comment with intended zone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily backup at 3am&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0 3 * * *&lt;/code&gt; in cron&lt;/td&gt;
&lt;td&gt;Skipped outright if the machine is asleep at 3am&lt;/td&gt;
&lt;td&gt;systemd timer with &lt;code&gt;Persistent=true&lt;/code&gt;, or launchd, which runs missed firings on wake&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anything moderately complex&lt;/td&gt;
&lt;td&gt;Hand-typed&lt;/td&gt;
&lt;td&gt;Often wrong on the first try&lt;/td&gt;
&lt;td&gt;Build visually, paste, comment what it fires on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;When raw cron is still fine&lt;/h2&gt;

&lt;p&gt;I'm not saying never write cron by hand. For "every minute" (&lt;code&gt;* * * * *&lt;/code&gt;) or "every hour at the top" (&lt;code&gt;0 * * * *&lt;/code&gt;) it's faster to just type it. The break point for me is anything with more than one non-&lt;code&gt;*&lt;/code&gt; field. Two fields with values is where my error rate spikes and the cost of building visually is zero.&lt;/p&gt;

&lt;p&gt;Worth knowing: most cron implementations support extensions that aren't in POSIX. &lt;code&gt;@daily&lt;/code&gt;, &lt;code&gt;@weekly&lt;/code&gt;, &lt;code&gt;@reboot&lt;/code&gt;, &lt;code&gt;@hourly&lt;/code&gt; all exist in Vixie cron and read better than the equivalent expressions. If your environment supports them, prefer them. They're more readable to whoever opens the file in 2027.&lt;/p&gt;

&lt;p&gt;The free cron builder I made and use regularly now is at &lt;a href="https://aidevhub.io/cron-builder/" rel="noopener noreferrer"&gt;aidevhub.io/cron-builder&lt;/a&gt;. Pick days, hours, minutes from dropdowns, get the expression, see the next 5 fire times in your local timezone. The next-fire preview is the part I find most useful, because it catches the "this expression doesn't actually fire when I think it does" cases before they ship.&lt;/p&gt;

&lt;h2&gt;FAQ&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Why is the day-of-week / day-of-month thing an OR?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; It's a POSIX thing, dating back to the original Unix cron. The spec says that when both day fields are restricted (neither is &lt;code&gt;*&lt;/code&gt;), a match on either one fires the job; when only one is restricted, the &lt;code&gt;*&lt;/code&gt; field matches every day anyway. There's a note in &lt;code&gt;man 5 crontab&lt;/code&gt; if you want to read it. Most cron tutorials skip this part because it's a footgun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does this work for AWS EventBridge cron expressions?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; EventBridge uses a 6-field syntax with a year field, and it sidesteps the OR rule rather than flipping it: you must put &lt;code&gt;?&lt;/code&gt; in one of the two day fields, so both can never be restricted at once. That specific gotcha goes away, traded for a mandatory-&lt;code&gt;?&lt;/code&gt; footgun of its own. The other 4 still apply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is there a cron syntax that's better than the 5-field one?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Quartz scheduler's syntax is more expressive (seconds, year, AND between day fields). Most Linux distros ship &lt;code&gt;systemd.timer&lt;/code&gt; which is way more readable but is its own thing. Pick whatever your platform supports best. I find systemd timers the cleanest for new Linux work and stick with launchd for Mac because the alternatives aren't worth the friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How do I test a cron expression without waiting?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Easiest path is a builder that shows the next 5 fire times so you can eyeball whether the schedule matches your intent. Beyond that, &lt;code&gt;croniter&lt;/code&gt; for Python and &lt;code&gt;cron-parser&lt;/code&gt; for Node both let you iterate the next N firings programmatically. I write a one-line script when I'm not sure: &lt;code&gt;python3 -c "from croniter import croniter; from datetime import datetime; c=croniter('0 9 * * 1'); [print(c.get_next(datetime)) for _ in range(5)]"&lt;/code&gt;. If the printed times look right, the expression is right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What about Quartz cron expressions?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Different beast. 6 or 7 fields (seconds optional, year optional), &lt;code&gt;?&lt;/code&gt; placeholder for day fields, &lt;code&gt;L&lt;/code&gt; for last, &lt;code&gt;#&lt;/code&gt; for nth-day-of-month. More expressive, less portable. If you're on a Quartz-based stack you're already in a different syntax and most of the POSIX gotchas above don't apply.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Written with AI assistance and human review.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>devtools</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
