OCR is back: replacing Tesseract with PP-OCRv5 in my document pipelines

Michael Liu — Fri, 08 May 2026 14:23:58 +0000

I've been wrangling OCR pipelines for years — Tesseract for plain text, Google Vision when CJK comes up, AWS Textract for tables. Each has its own pain (Tesseract drops handwritten characters, Vision is pricey at scale, Textract's bbox layout is opinionated).

Recently I've been quietly piping a lot of work through ScanRead.ai instead. It's a free OCR tool built on PP-OCRv5 and the new PaddleOCR-VL model. Here's what changed for me.

What it actually does

Image → text in 100+ languages (including Arabic, Japanese, Chinese, Hindi, Thai)
22 specialized tools: image-to-text, PDF-to-Word, screenshot-to-text, handwriting recognition, math-to-LaTeX, receipt OCR
Outputs to .txt, .md, or .docx — Markdown export is great for pipelines into Notion or Obsidian
Free tier is generous: 20 pages/day, no signup
Pro is $10/mo for 3,000 pages, with batch upload (up to 20 files at once)
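As a rough illustration of the Markdown-to-Obsidian path, here is a minimal sketch that assumes the .md exports were downloaded by hand into a local folder. The function name, the folder layout, and the front-matter fields are hypothetical choices for this example, not anything ScanRead produces.

```python
import json
from datetime import date
from pathlib import Path


def import_to_vault(export_dir: Path, vault_dir: Path, tag: str = "ocr") -> list[Path]:
    """Copy each exported .md file into vault_dir, prepending YAML front matter.

    export_dir and vault_dir are hypothetical local folders chosen by the caller.
    """
    vault_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(export_dir.glob("*.md")):
        body = src.read_text(encoding="utf-8")
        front_matter = (
            "---\n"
            f"source: {json.dumps(src.name)}\n"  # quoted so odd filenames stay valid YAML
            f"imported: {date.today().isoformat()}\n"
            f"tags: [{tag}]\n"
            "---\n\n"
        )
        dest = vault_dir / src.name
        dest.write_text(front_matter + body, encoding="utf-8")
        written.append(dest)
    return written


# Example call (paths are placeholders):
# import_to_vault(Path("~/Downloads/ocr-exports").expanduser(),
#                 Path("~/Vault/OCR").expanduser())
```

Obsidian reads the YAML block at the top of a note as properties, so the tag and import date become searchable without touching the OCR text itself.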

Where it shined for me

Handwritten meeting notes. Tesseract gives me garbage on cursive. ScanRead reconstructed three pages of a colleague's whiteboard photos with maybe two errors per page. That's the difference between "useful" and "I'll just retype it."

CJK receipts. I had a folder of Japanese receipts to reconcile. PaddleOCR-VL handles vertical text and mixed kanji/kana way better than I expected — competitive with Google Vision in my spot-check, at zero cost.
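A spot-check like this can be made repeatable by scoring OCR output against a hand-typed reference with character error rate (CER): edit distance divided by reference length. This is a minimal sketch of that metric, not the procedure used for the comparisons above, and the sample strings are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one wrong character out of four.
print(cer("abcd", "abxd"))  # 0.25
```

Python strings iterate by Unicode code point, so the same function works on kanji and kana without extra tokenization.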

Math → LaTeX. Pasting screenshots of equations from PDFs and getting back LaTeX source is the kind of small thing that saves a real amount of time over a week.

Where it's weaker

Layout reconstruction for complex multi-column PDFs is okay, but Textract is still better for forms with deeply nested tables.
The free tier is rate-limited per day, not per minute — fine for humans, awkward for batch jobs.
No public API yet (as of writing); Pro batch UI is the workaround.
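To see how those limits play out on a real folder, here is a small planning sketch. It only uses the two numbers quoted in this post (20 files per Pro batch, 20 pages per day on the free tier); the function names are hypothetical, and it plans uploads rather than performing them, since there is no API to call.

```python
from typing import Iterator, Sequence

MAX_FILES_PER_BATCH = 20  # Pro batch limit quoted in the post
FREE_PAGES_PER_DAY = 20   # free-tier daily limit quoted in the post


def batches(files: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> Iterator[Sequence[str]]:
    """Yield consecutive groups of at most `size` files for manual batch upload."""
    for start in range(0, len(files), size):
        yield files[start:start + size]


def free_tier_days(total_pages: int, per_day: int = FREE_PAGES_PER_DAY) -> int:
    """Days needed to push total_pages through a per-day quota (ceiling division)."""
    return -(-total_pages // per_day)


files = [f"scan_{i:03d}.pdf" for i in range(45)]  # placeholder filenames
print([len(group) for group in batches(files)])   # [20, 20, 5]
print(free_tier_days(45))                         # 3
```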

Why I'm sharing

If you're paying for Vision/Textract for occasional OCR, try the free tier first. If you do batch scans, the $10/mo Pro plan undercuts both. Link: https://scanread.ai
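For the pricing side, the arithmetic is simple enough to sketch. The figures below come from this post; treating the 3,000-page quota as monthly is an assumption, and no competitor prices are included because they vary by feature and change over time.

```python
PRO_PRICE_USD = 10.00    # from the post
PRO_PAGE_QUOTA = 3_000   # from the post; assumed to be a monthly quota
FREE_PAGES_PER_DAY = 20  # from the post


def pro_cost_per_page(pages_used: int) -> float:
    """Effective Pro price per page for a month in which pages_used pages are processed."""
    if not 0 < pages_used <= PRO_PAGE_QUOTA:
        raise ValueError("pages_used must be between 1 and the plan quota")
    return PRO_PRICE_USD / pages_used


def fits_free_tier(pages_per_day: int) -> bool:
    """True if a daily volume stays within the free allowance."""
    return pages_per_day <= FREE_PAGES_PER_DAY


print(f"${pro_cost_per_page(3_000):.4f} per page at full quota")  # $0.0033
print(fits_free_tier(15))                                         # True
```

The per-page figure only holds if the quota is actually used; at lower volumes the flat fee is spread over fewer pages.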

Curious if anyone else has switched off Tesseract for handwriting. What's your stack?

How I Turn TikTok Videos into Searchable Transcripts in Seconds (Free Tool)

Michael Liu — Wed, 06 May 2026 16:14:45 +0000

Why I needed transcripts

I spend a lot of time studying short-form video — TikTok hooks, YouTube Shorts, Instagram Reels — and the part I actually want is the script, not the video. Re-watching to copy down a 30-second hook is painful, and most "free transcript tools" hide behind a signup wall or only work on YouTube.

So I built Voqusa — paste a TikTok / YouTube / Instagram / Facebook / Twitter / LinkedIn / Pinterest URL, get the transcript instantly. No signup, no paywall on captions.

How it works

Paste the video URL.
Voqusa pulls the audio + any embedded captions.
AI speech-to-text fills in the rest (14 languages supported).
Copy the text and search/repurpose/study it.
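The steps above amount to a captions-first flow with a speech-to-text fallback. Here is a minimal sketch of that control flow. It is not Voqusa's actual code: `transcribe`, `fetch_captions`, and `speech_to_text` are hypothetical names, and the two callables stand in for whatever does the real fetching and recognition.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transcript:
    text: str
    source: str  # "captions" or "speech_to_text"


def transcribe(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical: returns None if no captions
    speech_to_text: Callable[[str], str],            # hypothetical: runs AI transcription
) -> Transcript:
    """Prefer embedded captions; fall back to speech-to-text only when they are missing."""
    captions = fetch_captions(url)
    if captions:
        return Transcript(text=captions.strip(), source="captions")
    return Transcript(text=speech_to_text(url).strip(), source="speech_to_text")


# Usage with stand-in functions instead of real network calls:
result = transcribe(
    "https://example.com/video/1",
    fetch_captions=lambda url: None,
    speech_to_text=lambda url: " stop scrolling for a second ",
)
print(result.source)  # speech_to_text
```

Passing the two steps in as callables keeps the fallback logic testable without touching any video platform.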

A few things I made deliberate:

No account required for caption-based transcripts. You only spend a credit when the AI has to do speech-to-text from scratch.
Failed transcripts cost 0 credits. If we can't pull it, you don't pay.
Privacy: URLs and transcripts aren't kept after your session ends.
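The credit rules above fit in a few lines. This is a sketch of the rule as stated here, not the production billing code; the function name is hypothetical, and the one-credit cost per speech-to-text job is an assumption based on the "spend a credit" wording.

```python
def credits_charged(source: str, succeeded: bool, stt_cost: int = 1) -> int:
    """Credits for one request: captions are free, failures are free,
    and only a successful speech-to-text run costs stt_cost (assumed 1)."""
    if not succeeded:
        return 0
    if source == "captions":
        return 0
    if source == "speech_to_text":
        return stt_cost
    raise ValueError(f"unknown transcript source: {source!r}")


print(credits_charged("captions", succeeded=True))         # 0
print(credits_charged("speech_to_text", succeeded=True))   # 1
print(credits_charged("speech_to_text", succeeded=False))  # 0
```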

What I use it for

Reverse-engineering viral hooks (collect 50 transcripts, find patterns)
Building swipe files of proven video structures
Summarizing podcast clips into LinkedIn posts
Accessibility — adding text alternatives to video content
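For the hook-pattern use case, one simple way to "find patterns" is to count which short phrases recur in the opening words of many transcripts. This is a minimal sketch under that assumption; the function name, the 12-word hook window, and the sample transcripts are all made up for illustration.

```python
import re
from collections import Counter


def opening_ngrams(transcripts: list[str], n: int = 3, hook_words: int = 12) -> Counter:
    """Count n-word phrases in the first hook_words words of each transcript.

    Each phrase is counted once per transcript, so the count is
    "how many videos open with this phrase".
    """
    counts: Counter = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z0-9']+", text.lower())[:hook_words]
        seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(seen)
    return counts


samples = [  # invented examples, not real transcripts
    "Here's the thing nobody tells you about sleep",
    "Here's the thing about budgeting",
    "Stop scrolling. Here's the thing",
]
print(opening_ngrams(samples).most_common(1))
# [(("here's", 'the', 'thing'), 3)]
```

The regex only handles plain ASCII words and straight apostrophes, so transcripts in other scripts would need a different tokenizer.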

Try it

If you ever wanted "Ctrl+F for video," it's at voqusa.com. Captions are free; speech-to-text is pay-as-you-go (no subscription, credits valid 12 months). Curious if anyone has other use cases — drop them in the comments.

Forem: Michael Liu