<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Michael Liu</title>
    <description>The latest articles on Forem by Michael Liu (@voqusa).</description>
    <link>https://forem.com/voqusa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3916315%2F74f6e881-8106-4bed-b2c0-248856a6b767.jpg</url>
      <title>Forem: Michael Liu</title>
      <link>https://forem.com/voqusa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/voqusa"/>
    <language>en</language>
    <item>
      <title>OCR is back: replacing Tesseract with PP-OCRv5 in my document pipelines</title>
      <dc:creator>Michael Liu</dc:creator>
      <pubDate>Fri, 08 May 2026 14:23:58 +0000</pubDate>
      <link>https://forem.com/voqusa/ocr-is-back-replacing-tesseract-with-pp-ocrv5-in-my-document-pipelines-15og</link>
      <guid>https://forem.com/voqusa/ocr-is-back-replacing-tesseract-with-pp-ocrv5-in-my-document-pipelines-15og</guid>
      <description>&lt;h2&gt;OCR is back: how I'm replacing Tesseract with PP-OCRv5 in my pipelines&lt;/h2&gt;

&lt;p&gt;I've been wrangling OCR pipelines for years: Tesseract for plain text, Google Vision when CJK comes up, AWS Textract for tables. Each one has its own pain points: Tesseract drops handwritten characters, Vision gets pricey at scale, and Textract's bounding-box layout is opinionated.&lt;/p&gt;

&lt;p&gt;Recently I've been quietly piping a lot of work through &lt;a href="https://scanread.ai" rel="noopener noreferrer"&gt;ScanRead.ai&lt;/a&gt; instead. It's a free OCR tool built on &lt;strong&gt;PP-OCRv5&lt;/strong&gt; and the new &lt;strong&gt;PaddleOCR-VL&lt;/strong&gt; model. Here's what changed for me.&lt;/p&gt;

&lt;h3&gt;What it actually does&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Image → text in 100+ languages (including Arabic, Japanese, Chinese, Hindi, Thai)&lt;/li&gt;
&lt;li&gt;22 specialized tools, including image-to-text, PDF-to-Word, screenshot-to-text, handwriting recognition, math-to-LaTeX, and receipt OCR&lt;/li&gt;
&lt;li&gt;Outputs to .txt, .md, or .docx — Markdown export is great for pipelines into Notion or Obsidian&lt;/li&gt;
&lt;li&gt;Free tier is generous: 20 pages/day, no signup&lt;/li&gt;
&lt;li&gt;Pro is $10/mo for 3,000 pages, with batch processing of up to 20 files at once&lt;/li&gt;
&lt;/ul&gt;
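&lt;p&gt;An Obsidian vault is just a folder of Markdown files, so the Markdown-export "pipeline" on my end can be tiny. The sketch below assumes you have already downloaded a .md export by hand, because there's no API to fetch it. The function name, the vault path, and the front-matter fields are all my own choices and nothing ScanRead produces. It copies the file into the vault with a small YAML header so you can filter OCR'd notes by tag.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from datetime import date
from pathlib import Path


def file_into_vault(export_path, vault_dir, tags=("ocr",)):
    """Copy an exported .md file into an Obsidian vault with YAML front matter.

    Hypothetical helper: works on any Markdown file, not tied to ScanRead.
    """
    export_path = Path(export_path)
    body = export_path.read_text(encoding="utf-8")

    # Small YAML header that Obsidian reads as note properties.
    front_matter = "\n".join(
        [
            "---",
            f"source: {export_path.name}",
            f"imported: {date.today().isoformat()}",
            "tags: [" + ", ".join(tags) + "]",
            "---",
            "",
        ]
    )

    # Write the note into the vault folder, creating it if needed.
    target = Path(vault_dir) / export_path.name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(front_matter + body, encoding="utf-8")
    return target


# Example (hypothetical paths):
# file_into_vault("exports/meeting-notes.md", Path.home() / "Obsidian" / "OCR")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The export file itself is left untouched, so re-running the function just overwrites the vault copy.&lt;/p&gt;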

&lt;h3&gt;Where it shined for me&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Handwritten meeting notes.&lt;/strong&gt; Tesseract gives me garbage on cursive. ScanRead reconstructed three pages of a colleague's whiteboard photos with maybe two errors per page. That's the difference between "useful" and "I'll just retype it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CJK receipts.&lt;/strong&gt; I had a folder of Japanese receipts to reconcile. PaddleOCR-VL handles vertical text and mixed kanji/kana way better than I expected — competitive with Google Vision in my spot-check, at zero cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Math → LaTeX.&lt;/strong&gt; Pasting screenshots of equations from PDFs and getting back LaTeX source is the kind of small thing that saves a real amount of time over a week.&lt;/p&gt;

&lt;h3&gt;Where it's weaker&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Layout reconstruction for complex multi-column PDFs is okay, but Textract is still better for forms with deeply nested tables.&lt;/li&gt;
&lt;li&gt;The free tier is rate-limited per day, not per minute — fine for humans, awkward for batch jobs.&lt;/li&gt;
&lt;li&gt;No public API yet (as of this writing); the Pro batch UI is the workaround.&lt;/li&gt;
&lt;/ul&gt;
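&lt;p&gt;Without an API, a big scan job has to be planned around the quotas by hand. Here's a small planning helper for that. It is plain Python and never contacts ScanRead. The constants are the figures quoted in this post: 20 pages/day on the free tier, and $10/mo for 3,000 pages on Pro with batches of up to 20 files. The function names are hypothetical, and treating each file as one page is an assumption.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import math

# Quota figures quoted in the post; update them if the pricing changes.
FREE_PAGES_PER_DAY = 20
PRO_FILES_PER_BATCH = 20
PRO_PAGES_PER_MONTH = 3000
PRO_PRICE_USD_PER_MONTH = 10.0


def free_tier_days(total_pages):
    """Days needed to push total_pages through the free tier's daily cap."""
    return math.ceil(total_pages / FREE_PAGES_PER_DAY)


def pro_batches(files):
    """Split a list of files into upload batches of at most 20 files each."""
    return [
        files[i : i + PRO_FILES_PER_BATCH]
        for i in range(0, len(files), PRO_FILES_PER_BATCH)
    ]


def pro_cost_per_page():
    """Effective USD per page if the full monthly Pro allowance is used."""
    return PRO_PRICE_USD_PER_MONTH / PRO_PAGES_PER_MONTH


if __name__ == "__main__":
    # 45 single-page scans (one page per file is an assumption here).
    scans = [f"receipt_{n:03d}.jpg" for n in range(45)]
    print(free_tier_days(len(scans)))     # 3
    print(len(pro_batches(scans)))        # 3
    print(round(pro_cost_per_page(), 4))  # 0.0033
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So 45 receipts take three days on the free tier or three batch uploads on Pro. Used in full, the Pro allowance works out to roughly a third of a cent per page.&lt;/p&gt;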

&lt;h3&gt;Why I'm sharing&lt;/h3&gt;

&lt;p&gt;If you're paying for Vision/Textract for occasional OCR, try the free tier first. If you do batch scans, the $10/mo Pro plan undercuts both. Link: &lt;a href="https://scanread.ai" rel="noopener noreferrer"&gt;https://scanread.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious if anyone else has switched off Tesseract for handwriting. What's your stack?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>machinelearning</category>
      <category>tooling</category>
    </item>
    <item>
      <title>How I Turn TikTok Videos into Searchable Transcripts in Seconds (Free Tool)</title>
      <dc:creator>Michael Liu</dc:creator>
      <pubDate>Wed, 06 May 2026 16:14:45 +0000</pubDate>
      <link>https://forem.com/voqusa/how-i-turn-tiktok-videos-into-searchable-transcripts-in-seconds-free-tool-h6</link>
      <guid>https://forem.com/voqusa/how-i-turn-tiktok-videos-into-searchable-transcripts-in-seconds-free-tool-h6</guid>
      <description>&lt;h2&gt;Why I needed transcripts&lt;/h2&gt;

&lt;p&gt;I spend a lot of time studying short-form video — TikTok hooks, YouTube Shorts, Instagram Reels — and the part I actually want is the &lt;strong&gt;script&lt;/strong&gt;, not the video. Re-watching to copy down a 30-second hook is painful, and most "free transcript tools" hide behind a signup wall or only work on YouTube.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;&lt;a href="https://www.voqusa.com" rel="noopener noreferrer"&gt;Voqusa&lt;/a&gt;&lt;/strong&gt; — paste a TikTok / YouTube / Instagram / Facebook / Twitter / LinkedIn / Pinterest URL, get the transcript instantly. No signup, no paywall on captions.&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Paste the video URL.&lt;/li&gt;
&lt;li&gt;Voqusa pulls the audio and any embedded captions.&lt;/li&gt;
&lt;li&gt;AI speech-to-text fills in the rest (14 languages supported).&lt;/li&gt;
&lt;li&gt;Copy the text and search/repurpose/study it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A few things I made deliberate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No account required for caption-based transcripts.&lt;/strong&gt; You only spend a credit when the AI has to do speech-to-text from scratch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failed transcripts cost 0 credits.&lt;/strong&gt; If we can't pull it, you don't pay.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy:&lt;/strong&gt; URLs and transcripts aren't kept after your session ends.&lt;/li&gt;
&lt;/ul&gt;
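&lt;p&gt;To make the billing rules concrete, here's a minimal sketch of the caption-first flow with a speech-to-text fallback. This is not Voqusa's actual code. &lt;code&gt;fetch_captions&lt;/code&gt; and &lt;code&gt;speech_to_text&lt;/code&gt; are hypothetical stand-ins passed in as plain functions. I'm assuming one credit per speech-to-text job, since the post doesn't give a price. The sketch also simplifies "fills in the rest" to all-or-nothing: captions if present, otherwise full speech-to-text.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical fetcher shape: takes a video URL, returns text or None.
Fetcher = Callable[[str], Optional[str]]


@dataclass
class TranscriptResult:
    text: Optional[str]  # None when nothing could be extracted
    source: str          # "captions", "speech_to_text", or "failed"
    credits_charged: int


def transcribe(
    url: str,
    fetch_captions: Fetcher,
    speech_to_text: Fetcher,
    stt_credit_cost: int = 1,  # assumed price per STT job; not stated in the post
):
    """Caption-first transcription; credits are spent only on a successful STT run."""
    # 1. Embedded captions are free, so try them first.
    captions = fetch_captions(url)
    if captions:
        return TranscriptResult(captions, "captions", 0)

    # 2. No usable captions: fall back to speech-to-text on the audio.
    text = speech_to_text(url)
    if text:
        return TranscriptResult(text, "speech_to_text", stt_credit_cost)

    # 3. Both paths failed: report the failure and charge nothing.
    return TranscriptResult(None, "failed", 0)


if __name__ == "__main__":
    # Stub fetchers standing in for real platform and STT calls.
    caption_store = {"https://example.com/a": "stop scrolling, here's why"}
    result = transcribe(
        "https://example.com/b",
        fetch_captions=caption_store.get,
        speech_to_text=lambda url: "transcribed from audio",
    )
    print(result.source, result.credits_charged)  # speech_to_text 1
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The two fetchers are passed in as arguments, so the "failed transcripts cost 0 credits" rule can be checked with stubs, without touching any real platform.&lt;/p&gt;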

&lt;h2&gt;What I use it for&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Reverse-engineering viral hooks (collect 50 transcripts, find patterns)&lt;/li&gt;
&lt;li&gt;Building swipe files of proven video structures&lt;/li&gt;
&lt;li&gt;Summarizing podcast clips into LinkedIn posts&lt;/li&gt;
&lt;li&gt;Accessibility — adding text alternatives to video content&lt;/li&gt;
&lt;/ul&gt;
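&lt;p&gt;For the hook-mining use case, once a batch of transcripts is saved as plain text, a crude first pass at "find patterns" is to count which opening phrases repeat. This is one possible approach and not a Voqusa feature. The &lt;code&gt;transcripts/&lt;/code&gt; folder, one .txt file per video, and the three-word window are my own assumptions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import re
from collections import Counter
from pathlib import Path


def opening_phrase(text, n_words=3):
    """First n_words of a transcript, lowercased, with punctuation dropped."""
    # [\w']+ keeps contractions like "here's" and also matches non-ASCII letters.
    words = re.findall(r"[\w']+", text.lower())
    return " ".join(words[:n_words])


def common_hooks(transcripts, n_words=3, top=10):
    """Most frequent opening phrases across a collection of transcripts."""
    counts = Counter(
        opening_phrase(t, n_words) for t in transcripts if t.strip()
    )
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical layout: one plain-text transcript per video in ./transcripts
    folder = Path("transcripts")
    files = sorted(folder.glob("*.txt")) if folder.is_dir() else []
    texts = [p.read_text(encoding="utf-8") for p in files]
    for phrase, count in common_hooks(texts):
        print(f"{count:3d}  {phrase}")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Matching on &lt;code&gt;\w&lt;/code&gt; rather than ASCII letters means accented Latin-script transcripts are not silently emptied. Scripts written without spaces, such as Chinese or Japanese, still come through as one long "word", so the three-word window doesn't apply to them.&lt;/p&gt;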

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;If you ever wanted "Ctrl+F for video," it's at &lt;strong&gt;&lt;a href="https://www.voqusa.com" rel="noopener noreferrer"&gt;voqusa.com&lt;/a&gt;&lt;/strong&gt;. Captions are free; speech-to-text is pay-as-you-go (no subscription, credits valid 12 months). Curious if anyone has other use cases — drop them in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>tools</category>
    </item>
  </channel>
</rss>
