Forem: Vin Xu

How CastReader Cracks Kindle's Font Encryption for Text-to-Speech

Vin Xu — Tue, 31 Mar 2026 04:54:26 +0000

CastReader decodes Kindle Cloud Reader's text by intercepting Amazon's font subset data and mapping scrambled glyph codes back to real characters using local OCR calibration. It's the only Chrome extension that can do this — every other TTS tool reads gibberish because Amazon's custom fonts make the DOM text unreadable. Here's exactly how it works.

The Problem Every TTS Extension Hits
Open any book on read.amazon.com. Right-click. "Inspect Element." Look at the text in the DOM.

You won't find any.

Kindle Cloud Reader doesn't render text the way a normal website does. There are no <p> tags with readable sentences. No <span> elements with words in them. Instead, Amazon's renderer delivers the entire page as a pre-rendered blob image: a single <img> tag pointing to a blob: URL. The page you're reading is, from the browser's perspective, a picture.

But it gets worse. Amazon also sends structured data alongside that image: glyph positions, font metrics, paragraph boundaries. This data uses custom font subsets where Unicode codepoints are remapped to arbitrary glyph IDs. The character "T" might be glyph 847. The letter "h" might be glyph 203. The letter "e" might be glyph 1044. These numbers change per book. They can even change per batch of pages within the same book.

When Read Aloud, NaturalReader, or Speechify try to extract text from this page, they find either nothing (it's an image) or scrambled glyph codes that produce nonsense when fed to a TTS engine. This isn't a bug in those extensions. They're architecturally incapable of solving this problem.

How Amazon's Font Scrambling Actually Works
Amazon's Kindle renderer operates through a /renderer/render API that returns a TAR archive for each batch of pages. Inside that archive:

tokens_X_Y.json — paragraph boundaries and word bounding boxes, each identified by a positionId
page_data_X_Y.json — the actual glyph sequences, font references, and 2D transforms for positioning each character
glyphs.json — SVG path definitions for every glyph in the font subset (~93KB of vector data)
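
To make the archive step concrete, here is a minimal sketch of pulling files out of a TAR archive in TypeScript. It assumes the response is an uncompressed ustar archive with short file names, which the article does not confirm; `parseTar` and `TarEntry` are hypothetical names for this sketch, not CastReader's code.

```typescript
// Minimal ustar reader: enough to list regular files in an uncompressed archive.
// It ignores checksums, long-name extensions, and non-file entry types.
interface TarEntry {
  name: string;
  data: Uint8Array;
}

const TAR_BLOCK = 512;

// Read a NUL-terminated ASCII field from a header block.
function readAscii(buf: Uint8Array, start: number, length: number): string {
  let out = "";
  for (let i = start; i < start + length; i++) {
    const byte = buf[i];
    if (byte === undefined || byte === 0) break;
    out += String.fromCharCode(byte);
  }
  return out;
}

function parseTar(archive: Uint8Array): TarEntry[] {
  const entries: TarEntry[] = [];
  let offset = 0;
  while (offset + TAR_BLOCK <= archive.length) {
    const name = readAscii(archive, offset, 100);
    if (name === "") break; // an all-zero header block marks the end of the archive
    // The size field is an octal ASCII string at byte 124 of the header.
    const size = parseInt(readAscii(archive, offset + 124, 12).trim(), 8) || 0;
    const typeFlag = readAscii(archive, offset + 156, 1);
    const start = offset + TAR_BLOCK;
    if (typeFlag === "" || typeFlag === "0") {
      entries.push({ name, data: archive.slice(start, start + size) });
    }
    // File data is padded up to the next 512-byte boundary.
    offset = start + Math.ceil(size / TAR_BLOCK) * TAR_BLOCK;
  }
  return entries;
}
```

Each entry's bytes could then be decoded as text and passed to JSON.parse, with entries told apart by file name (tokens_, page_data_, glyphs.json).
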
The key structure is the "run" — a sequence of glyph IDs that represents a chunk of text. Each run looks something like this:

{
  "glyphs": [847, 203, 1044, 92, 847, 203, 1044],
  "fontFamily": "amzn-mobi-KindleBookerly",
  "elementId": "934",
  "xPosition": [59.6, 67.2, 73.8, 80.1, 86.4, 93.0, 99.6]
}
Those glyph IDs — 847, 203, 1044 — are not Unicode. They're indices into the custom font subset delivered in glyphs.json. The font file knows how to draw glyph 847 as the letter "T," but that mapping exists only inside the font. There's a strict 1:1 relationship: one positionId corresponds to exactly one glyph.
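
A short sketch shows why generic extractors end up with nonsense. The `GlyphRun` type below mirrors the sample run's fields (the types are inferred from that sample, not from Amazon's schema), and `naiveDecode` is a hypothetical stand-in for a tool that treats glyph IDs as if they were Unicode code points.

```typescript
// Field names follow the sample run above; the types are inferred, not confirmed.
interface GlyphRun {
  glyphs: number[];
  fontFamily: string;
  elementId: string;
  xPosition: number[];
}

const sampleRun: GlyphRun = {
  glyphs: [847, 203, 1044, 92, 847, 203, 1044],
  fontFamily: "amzn-mobi-KindleBookerly",
  elementId: "934",
  xPosition: [59.6, 67.2, 73.8, 80.1, 86.4, 93.0, 99.6],
};

// One x-coordinate per glyph: a cheap sanity check before trusting a run.
function isWellFormed(run: GlyphRun): boolean {
  return run.glyphs.length === run.xPosition.length;
}

// Naive approach: pretend each glyph ID is a Unicode code point.
// The result is valid Unicode, but it has nothing to do with the rendered text.
function naiveDecode(run: GlyphRun): string {
  return String.fromCodePoint(...run.glyphs);
}
```

Run against the sample, `naiveDecode` yields characters such as "Ë" (code point 203) and "Д" (code point 1044) rather than anything resembling the letters the font actually draws.
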

Amazon refreshes these font subsets across render cycles. Navigate forward 18 pages and a new batch arrives — potentially with a completely different glyph mapping. Glyph 847 might now be "S" instead of "T." This means any decoder that caches mappings from the first batch will produce wrong text on later pages.

The scheme is elegant DRM. The browser renders the page correctly because it has the font file. But anything trying to read the underlying data programmatically gets meaningless numbers.

CastReader's Four-Step Decode Pipeline
CastReader solves this with a pipeline that runs entirely in your browser. No cloud processing. No API costs for decoding. Four coordinated components work across Chrome's execution contexts.

Step 1: Intercept the Render Data
A main-world content script (kindle-intercept.content.ts) runs at document_start — before Amazon's own code loads. It intercepts responses from the /renderer/render API, parses the TAR archive, and extracts the token data, page data, and glyph definitions.
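
The interception idea can be sketched as a thin wrapper around fetch. This is an illustration under assumptions, not the contents of kindle-intercept.content.ts: `wrapFetch`, `ResponseLike`, and `FetchLike` are hypothetical names, and the structural types stand in for the browser's own Response and fetch so the sketch type-checks outside a browser.

```typescript
// Structural stand-ins for the browser's Response and fetch.
interface ResponseLike {
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Wrap a fetch function so render responses are copied to `onArchive`
// while the caller still receives the original, untouched response.
function wrapFetch(
  original: FetchLike,
  onArchive: (bytes: Uint8Array) => void,
): FetchLike {
  return async (url) => {
    const response = await original(url);
    if (url.includes("/renderer/render")) {
      // clone() yields an independent body, so reading it does not consume the original.
      response
        .clone()
        .arrayBuffer()
        .then((buf) => onArchive(new Uint8Array(buf)))
        .catch(() => {
          // Never let the observer break the page's own request.
        });
    }
    return response;
  };
}
```

In a real main-world script the wrapper would replace window.fetch before Amazon's code runs, which is why the document_start timing matters.
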

This happens transparently. The interceptor doesn't block or modify Amazon's rendering. It just copies the data as it flows through, accumulating pages across all three batches that Amazon sends per render cycle (current pages, backward prefetch, forward prefetch — roughly 18 pages total).

The extracted data gets passed to an isolated-world content script via DOM attributes. Two Chrome execution contexts, cooperating through the only bridge available to them.
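
Here is one way the attribute bridge could look. Both worlds see the same DOM, so a string attribute on a shared element works as a mailbox. The attribute name and the `publish`/`consume` helpers are hypothetical, chosen for this sketch only; `AttributeHost` is a structural stand-in for something like document.documentElement.

```typescript
// Anything with get/setAttribute works; in a content script this would be a DOM element.
interface AttributeHost {
  setAttribute(name: string, value: string): void;
  getAttribute(name: string): string | null;
}

// Hypothetical attribute name, chosen for this sketch only.
const BRIDGE_ATTR = "data-kindle-pages";

// Main world: serialize the payload, since DOM attributes can only hold strings.
function publish(host: AttributeHost, payload: unknown): void {
  host.setAttribute(BRIDGE_ATTR, JSON.stringify(payload));
}

// Isolated world: read and parse the payload, tolerating a missing or malformed value.
function consume<T>(host: AttributeHost): T | null {
  const raw = host.getAttribute(BRIDGE_ATTR);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return null;
  }
}
```

The receiving script would likely watch the element with a MutationObserver (using the attributeFilter option) instead of polling, though the article does not say which mechanism CastReader uses.
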

Step 2: Build the Glyph-to-Visual Mapping
The glyph mapper (kindle-glyph-mapper.ts) takes the raw glyph SVG paths and renders each one onto a small canvas. This produces a visual representation of every glyph in the current font subset — what each glyph ID actually looks like when drawn.

But a picture of a letter isn't the same as knowing which letter it is. Glyph 847 renders as something that looks like "T" — but the mapper needs to confirm that programmatically. That's where OCR comes in.

Step 3: OCR Calibration (Not OCR Reading)
This distinction is critical: CastReader does not use OCR to read the book. OCR is used only for calibration — to build a mapping table between glyph IDs and real characters.

Here's how it works. CastReader captures the blob image of the current page and sends it to Tesseract.js, which runs locally in a Chrome offscreen document. Tesseract reads the image and produces recognized text. CastReader then aligns the OCR output against the known glyph sequences from the token data.

The alignment uses position matching. Each glyph has precise x/y coordinates from the page data. Each OCR character has a bounding box. By matching positions, CastReader builds a confidence-scored mapping: glyph 847 at position (59.6, 142.3) corresponds to OCR character "T" at roughly the same coordinates. Do this across hundreds of glyphs on a page and you get a complete decode table.
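
The position matching described above can be sketched as a voting scheme. This is a simplified illustration, not CastReader's implementation: it assumes glyph coordinates and OCR boxes are already in the same coordinate space, and `calibrate`, `PlacedGlyph`, `OcrChar`, and the `tolerance` parameter are hypothetical.

```typescript
interface PlacedGlyph { glyphId: number; x: number; y: number; }
interface OcrChar { char: string; x0: number; y0: number; x1: number; y1: number; }
interface Mapping { char: string; confidence: number; }

// For each glyph, find the OCR box that contains its position and record a vote.
// The winning character per glyph ID gets a confidence equal to its share of the votes.
function calibrate(
  glyphs: PlacedGlyph[],
  ocr: OcrChar[],
  tolerance = 1,
): Map<number, Mapping> {
  const votes = new Map<number, Map<string, number>>();
  for (const g of glyphs) {
    const hit = ocr.find(
      (o) =>
        g.x >= o.x0 - tolerance && g.x <= o.x1 + tolerance &&
        g.y >= o.y0 - tolerance && g.y <= o.y1 + tolerance,
    );
    if (hit === undefined) continue; // OCR saw nothing at this position
    const tally = votes.get(g.glyphId) ?? new Map<string, number>();
    tally.set(hit.char, (tally.get(hit.char) ?? 0) + 1);
    votes.set(g.glyphId, tally);
  }
  const table = new Map<number, Mapping>();
  votes.forEach((tally, glyphId) => {
    let best = "";
    let bestCount = 0;
    let total = 0;
    tally.forEach((count, char) => {
      total += count;
      if (count > bestCount) {
        best = char;
        bestCount = count;
      }
    });
    table.set(glyphId, { char: best, confidence: bestCount / total });
  });
  return table;
}
```

Voting is what makes a single OCR misread harmless: if glyph 847 is seen as "T" twice and as "I" once, "T" still wins, just with lower confidence.
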

The space character gets special treatment. Spaces are encoded as glyphs within runs (not as gaps between runs), and the space glyph is identified as the most frequently occurring glyph on the page — a statistical shortcut that's reliable across every book tested.
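
The space heuristic is simple enough to sketch in a few lines. `findSpaceGlyph` is a hypothetical helper; it takes the glyph ID arrays from a page's runs and returns the ID that occurs most often.

```typescript
// Count every glyph ID across the page's runs and return the most common one.
// Counts only ever grow, so tracking the running maximum is enough.
function findSpaceGlyph(runs: number[][]): number | null {
  const counts = new Map<number, number>();
  let best: number | null = null;
  let bestCount = 0;
  for (const run of runs) {
    for (const id of run) {
      const next = (counts.get(id) ?? 0) + 1;
      counts.set(id, next);
      if (next > bestCount) {
        best = id;
        bestCount = next;
      }
    }
  }
  return best;
}
```

On a very short page (a chapter title, say) the most frequent glyph could be a letter instead, so a real implementation would probably want a minimum sample size before trusting the result.
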

Why not just use OCR for everything? Three reasons:

Accuracy. OCR makes mistakes, especially with unusual fonts or small text. Glyph decoding, once calibrated, is exact.
Word-level highlighting. CastReader highlights individual words as they're spoken. This requires precise character-level text that matches the token positions. OCR text doesn't align cleanly enough.
Speed. OCR is slow. Glyph decoding after calibration is instant — a simple table lookup per character.
The calibration runs once per render cycle. When the user turns enough pages to trigger a new batch with a different font subset, CastReader automatically re-calibrates. Every time you click "Read Page," fresh OCR runs to ensure the decode table matches the current font.
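
The article does not say how a font subset change is detected. One possible approach, assumed here purely for illustration, is to fingerprint the glyph definitions and recalibrate only when the fingerprint changes. The `GlyphPaths` shape (glyph ID to SVG path string) is a guess at what glyphs.json contains, and `DecodeTableCache` is a hypothetical name.

```typescript
// Assumed shape of glyphs.json: glyph ID -> SVG path data.
type GlyphPaths = Record<string, string>;

// Cheap, order-independent fingerprint: FNV-1a-style hash over sorted "id:path;" pairs.
function fingerprint(paths: GlyphPaths): string {
  let hash = 0x811c9dc5;
  for (const id of Object.keys(paths).sort()) {
    const text = `${id}:${paths[id]};`;
    for (let i = 0; i < text.length; i++) {
      hash ^= text.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0;
    }
  }
  return hash.toString(16);
}

class DecodeTableCache {
  private key: string | null = null;
  private table: Map<number, string> | null = null;

  // Returns the cached table, or runs `calibrate` when the font subset has changed.
  get(paths: GlyphPaths, calibrate: () => Map<number, string>): Map<number, string> {
    const next = fingerprint(paths);
    if (this.table === null || next !== this.key) {
      this.table = calibrate();
      this.key = next;
    }
    return this.table;
  }
}
```

The design choice is to key the cache on what the glyphs look like, not on page numbers, so a stale table cannot survive a batch boundary.
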

Step 4: Decode and Extract Paragraphs
With the mapping table built, decoding is straightforward. Walk through each token block (which maps 1:1 to a semantic paragraph), look up each glyph ID in the decode table, concatenate the characters, and output clean text.
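
As a rough sketch, the lookup step might look like this. It simplifies the data model by assuming each token block already carries its glyph IDs (in the real data the glyphs live in the page data and are joined to tokens by positionId); `decodeBlock` and `decodePage` are hypothetical names.

```typescript
// Simplified: a block that already carries its glyph IDs.
interface TokenBlock {
  glyphs: number[];
}

// Translate one paragraph's glyph IDs into text with a plain table lookup.
// Unknown glyphs become U+FFFD so gaps in the calibration stay visible.
function decodeBlock(block: TokenBlock, table: Map<number, string>): string {
  return block.glyphs.map((id) => table.get(id) ?? "\uFFFD").join("");
}

// Decode every block on the page and drop paragraphs that are only whitespace.
function decodePage(blocks: TokenBlock[], table: Map<number, string>): string[] {
  return blocks
    .map((block) => decodeBlock(block, table))
    .filter((text) => text.trim() !== "");
}
```

Emitting a replacement character for unmapped glyphs, instead of silently skipping them, makes an incomplete decode table easy to spot before the text reaches the TTS engine.
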

The token data also provides exact bounding boxes for every word and paragraph. CastReader uses these to create a DOM overlay with positioned elements for each paragraph, enabling the same click-to-jump and paragraph highlighting that works on regular websites.

For dual-column layouts (common in Kindle), the system detects column structure from the token positions and orders paragraphs correctly: left column top-to-bottom, then right column top-to-bottom. The layout detection is purely data-driven — derived from the x-coordinates of token blocks, not from heuristics about page width.
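
One way to derive reading order from x-coordinates alone is to look for the widest horizontal gap between blocks. The sketch below assumes at most two columns; `readingOrder`, `TokenBox`, and the `minGap` threshold are hypothetical, and the article does not describe CastReader's actual detection logic in this detail.

```typescript
interface TokenBox { id: string; x: number; y: number; }

// Split blocks into columns at the widest horizontal gap between sorted x-coordinates,
// then read each column top to bottom, left column first.
function readingOrder(blocks: TokenBox[], minGap = 100): TokenBox[] {
  const byX = blocks.slice().sort((a, b) => a.x - b.x);
  let splitAt = -1;
  let widest = minGap;
  let prev: TokenBox | undefined;
  let index = 0;
  for (const block of byX) {
    if (prev !== undefined && block.x - prev.x >= widest) {
      widest = block.x - prev.x;
      splitAt = index;
    }
    prev = block;
    index++;
  }
  const topToBottom = (a: TokenBox, b: TokenBox): number => a.y - b.y;
  if (splitAt === -1) return byX.sort(topToBottom); // no gap wide enough: single column
  const left = byX.slice(0, splitAt).sort(topToBottom);
  const right = byX.slice(splitAt).sort(topToBottom);
  return left.concat(right);
}
```

Small x differences inside a column (indented paragraphs, block quotes) stay below `minGap`, so they do not split the column.
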

Why Nobody Else Has Built This
The engineering complexity is substantial. You need:

A main-world content script that intercepts fetch responses without breaking Amazon's rendering
TAR archive parsing in the browser
Cross-context communication between main world and isolated world scripts
An offscreen document running Tesseract.js for OCR
Position-based alignment between OCR output and glyph sequences
Adaptive re-calibration when font subsets change across batches
Dual-column detection and correct reading order
Word-level bounding boxes for highlight synchronization
Each of these is a non-trivial problem. Together, they form a system that took months of reverse engineering Amazon's render pipeline to get right. The per-batch font rotation alone — where glyph mappings change every 18 pages — eliminates any approach based on a static lookup table.

And all of this runs locally. No book data leaves your browser for decoding. The only network call is sending the final clean text to the TTS voice API.

How to Use It
Three steps. Seriously.

Install CastReader from the Chrome Web Store. Also available on Edge Add-ons. No account required.
Open a book at read.amazon.com.
Click the CastReader icon. The extension extracts text from the current pages, starts reading with a natural AI voice, and highlights each paragraph as it goes. Click any paragraph to jump to it. Use the floating player to pause, adjust speed, or skip ahead.

The first page takes a few seconds longer than usual because the OCR calibration is running. Subsequent pages in the same batch decode instantly from the cached mapping; a new batch with a different font subset triggers a fresh calibration.

For a quick-start guide, see our Listen to Kindle page. And if you're interested in how CastReader handles regular websites (no font tricks needed), check out our overview of free text-to-speech tools.

The Bottom Line
Amazon built font subset scrambling to protect book content from copying. It's effective DRM — it stops every generic text extraction tool cold. But it also blocks accessibility tools, screen readers, and TTS extensions that people rely on to consume content.

CastReader bridges that gap with a local decode pipeline that respects the DRM boundary (it reads what's visually on screen, nothing more) while making the text accessible to speech synthesis. Zero cloud cost for decoding. Zero data exfiltration. Just glyph math and a bit of OCR, running in your browser.

Try CastReader free — it works on Kindle Cloud Reader and 99% of other websites too.

I Cancelled My Audible Subscription After Finding 50,000 Free Audiobooks Nobody Talks About

Vin Xu — Sun, 22 Mar 2026 13:45:24 +0000

I was paying Audible $14.95 a month for one credit. One book. If I wanted two books that month, tough luck, buy another credit for $12. I did this for three years. That's over $500 for maybe 40 books. I'm a software engineer. I should've optimized this sooner.

The breaking point was a Thursday night in February. I'd just burned my monthly credit on a mass-market thriller that turned out to be terrible. DNF by chapter three. No refund. I remember sitting there thinking: most of the books I actually want to read are old. Like, really old. Dostoevsky. Orwell. Hemingway. Kafka. Many of these are in the public domain. Why am I paying for them?

I started digging.

LibriVox was the first thing I found. Volunteer-narrated audiobooks. Free. Massive catalog. I downloaded their recording of Crime and Punishment and lasted about eight minutes. The narrator was someone's uncle reading into a USB microphone in what sounded like a bathroom. Every chapter had a different volunteer with a different accent and recording setup. Chapter five was literally whispering. I respect the project; thousands of volunteers giving their time is beautiful. But listening to it? Not beautiful.

Then I found CastReader kind of by accident. I was actually looking for a text-to-speech extension to read Hacker News comments aloud (don't judge me, long commute), and the extension had this "Library" section. I clicked it expecting maybe 200 curated books. There were 50,000.

Fifty thousand.

I opened The Great Gatsby. Hit play. And, okay, it's an AI voice, not Morgan Freeman. But it was consistent. Clean audio. Proper pacing. No bathroom echo. No jarring narrator switch between chapters. It just read the book, paragraph by paragraph, with this highlight following along on screen so I could glance at the text whenever I wanted.

I listened to the entire thing on a Saturday afternoon while doing laundry and cleaning the kitchen. Two and a half hours. Free. No credit. No subscription.

That week I went a little nuts. Frankenstein on Monday's commute. The first three chapters of Moby Dick on Tuesday (I'll finish it eventually. Probably). Metamorphosis on Wednesday, which is only like 90 minutes and genuinely better as audio because Kafka's sentences have this rhythm that you miss when you're reading silently. Thursday I started Dracula, which is structured as diary entries and letters so it works incredibly well in audio format.

My Audible app sent me a notification. "You have 1 unused credit." I stared at it for a while.

Here's the thing about public domain books that people forget. In the US, everything published before 1931 is now free. Not free as in "free trial." Free as in nobody owns it anymore. That includes basically all of the Western literary canon. Jane Austen. Mark Twain. Oscar Wilde. Edgar Allan Poe. Arthur Conan Doyle. H.P. Lovecraft. All of Shakespeare. All of Dickens. Homer. Dante. Tolstoy. Chekhov. The Brontë sisters. I could keep going. The point is, if the author has a Wikipedia page and died more than 75 years ago, you can probably listen to their entire bibliography for free.

CastReader's library is basically Project Gutenberg's catalog with AI narration layered on top. You can browse by genre, by author, filter by rating. The ratings come from Goodreads so you can actually find the good stuff instead of scrolling through 19th century agricultural pamphlets (there are a surprising number of those in Project Gutenberg).

I showed this to my teammate during standup. Bad idea. We lost fifteen minutes of sprint planning to everyone browsing the catalog on their phones. My tech lead found a collection of Sherlock Holmes stories and I haven't seen his AirPods out since. Our PM (and I quote) said "wait, The Picture of Dorian Gray is free? I was about to buy this on Audible." That would've been $11.

Some things I learned after a month of this.

The AI voice handles non-fiction better than fiction. Factual prose, essays, philosophy: it's great. The voice stays neutral and clear, which is what you want. For dialogue-heavy fiction, you notice it's one voice doing everything. It doesn't do character voices. If you're reading a novel with lots of "he said, she said," you sometimes lose track of who's speaking. For something like Marcus Aurelius's Meditations or Thoreau's Walden? Perfect medium.

The browser-based approach is actually an advantage, not a limitation. I initially thought "ugh, I need Chrome open." But it means I can listen on any device with a browser. My work laptop. My personal MacBook. My wife's iPad. No app to install, no syncing accounts, no wondering which device has my progress. And there's a send-to-phone feature for when I want to keep listening on a walk.

I still have Audible. I use it for new releases: stuff that came out this year, contemporary fiction, that one fantasy series everyone on Reddit keeps recommending. But for classics? I cancelled the autopay and switched to buying credits individually when I actually need them. Went from $14.95/month to maybe $12 every two or three months.

My colleague asked me last week what I've been reading lately. I rattled off six titles. She looked at me like I'd grown a second head. "When do you have time for all that?"

I don't. I listen while I'm cooking. While I'm on the subway. While I'm waiting for CI to pass. While I'm doing the dishes. Twelve minutes here, twenty minutes there. It adds up fast when the content is free and unlimited and you don't have to agonize over whether this particular book is worth spending a credit on.

The library is at castreader.ai/books if you want to browse. The Chrome extension is at castreader.ai. Both free. I'm not affiliated with them; I just cancelled a $180/year subscription because of them and figured other people might want to do the same.

Now if you'll excuse me, I have 200 pages of War and Peace queued up and a mass deployment to babysit.