<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Madhav Mallya</title>
    <description>The latest articles on Forem by Madhav Mallya (@madhav_mallya_37a1bc55e50).</description>
    <link>https://forem.com/madhav_mallya_37a1bc55e50</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3921284%2F234bc1d2-7516-41ee-8b2c-2eba2cd6e72a.webp</url>
      <title>Forem: Madhav Mallya</title>
      <link>https://forem.com/madhav_mallya_37a1bc55e50</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/madhav_mallya_37a1bc55e50"/>
    <language>en</language>
    <item>
      <title>Building a 100% Client-Side PDF Toolkit with WebAssembly: Lessons from 70+ Tools and 2k Weekly Users</title>
      <dc:creator>Madhav Mallya</dc:creator>
      <pubDate>Sat, 09 May 2026 07:45:34 +0000</pubDate>
      <link>https://forem.com/madhav_mallya_37a1bc55e50/building-a-100-client-side-pdf-toolkit-with-webassembly-lessons-from-70-tools-and-2k-weekly-users-1cma</link>
      <guid>https://forem.com/madhav_mallya_37a1bc55e50/building-a-100-client-side-pdf-toolkit-with-webassembly-lessons-from-70-tools-and-2k-weekly-users-1cma</guid>
      <description>&lt;p&gt;&lt;a href="https://exactpdf.com/blog/client-side-pdf-toolkit-wasm" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few months ago I uploaded a payslip to a "free PDF compressor" to fit it under an IRS portal's 2 MB limit. Then I read the privacy policy and saw they retained uploads for "service improvement" indefinitely. That was the moment I decided every PDF tool I'd build going forward would run &lt;strong&gt;entirely in the user's browser&lt;/strong&gt; — no upload, no backend, no server touching the file at any point.&lt;/p&gt;

&lt;p&gt;A few months later that became &lt;a href="https://exactpdf.com" rel="noopener noreferrer"&gt;ExactPDF&lt;/a&gt;: 70+ PDF tools, ~1,300 weekly users, ~80% organic traffic, and infrastructure under $40/month. This post covers the architecture and the things that broke along the way.&lt;/p&gt;

&lt;h2&gt;The contract: open DevTools, prove it&lt;/h2&gt;

&lt;p&gt;The whole differentiator hinges on a single verifiable claim: &lt;strong&gt;your file never leaves your machine.&lt;/strong&gt; Anyone can prove it — open DevTools, Network tab, process a 100 MB PDF, observe zero outbound requests with the file payload. The competitors on Google's first page (you know the ones) cannot pass this test. That's the moat.&lt;/p&gt;

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;p&gt;The actual processing layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pdf-lib.js.org/" rel="noopener noreferrer"&gt;pdf-lib&lt;/a&gt;&lt;/strong&gt; for manipulation — merge, split, rotate, compress, watermark, page reordering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://mozilla.github.io/pdf.js/" rel="noopener noreferrer"&gt;PDF.js&lt;/a&gt;&lt;/strong&gt; for rendering — thumbnails, page extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tesseract.projectnaptha.com/" rel="noopener noreferrer"&gt;Tesseract.js&lt;/a&gt;&lt;/strong&gt; in a Web Worker for OCR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://huggingface.co/docs/transformers.js" rel="noopener noreferrer"&gt;Transformers.js&lt;/a&gt;&lt;/strong&gt; with a DistilBERT QA model for chat-with-pdf&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ffmpegwasm.netlify.app/" rel="noopener noreferrer"&gt;FFmpeg.wasm&lt;/a&gt;&lt;/strong&gt; for the read-aloud feature (text → MP3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shell: Next.js 14 App Router, TypeScript, MUI v5, deployed on Cloud Run.&lt;/p&gt;

&lt;h2&gt;Three problems that ate a week each&lt;/h2&gt;

&lt;h3&gt;1. Tesseract.js froze the UI on every long document&lt;/h3&gt;

&lt;p&gt;The naive integration runs OCR on the main thread. On a 200-page scan, the whole tab locks up for 60+ seconds, scrolling stutters, and the user assumes the page is broken and refreshes.&lt;/p&gt;

&lt;p&gt;Fix: spin up Tesseract in a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API" rel="noopener noreferrer"&gt;Web Worker&lt;/a&gt; and post the page buffer in. The main thread stays responsive, progress updates stream back, and the &lt;a href="https://exactpdf.com/tools/ocr-pdf" rel="noopener noreferrer"&gt;OCR tool&lt;/a&gt; handles 500-page scans without breaking.&lt;/p&gt;

&lt;p&gt;The only gotcha: on mobile (a Pixel 6 in my testing), Tesseract's WASM heap blows past Chrome's per-tab limit at around 60-80 pages. The workaround is chunking: process 25 pages at a time and free the buffers between batches.&lt;/p&gt;
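
&lt;p&gt;A minimal sketch of that chunking, not the production code: &lt;code&gt;pageBatches&lt;/code&gt; and &lt;code&gt;ocrInBatches&lt;/code&gt; are hypothetical names, and &lt;code&gt;ocrBatch&lt;/code&gt; stands in for whatever function hands a page range to the OCR worker and resolves once its buffers can be released.&lt;/p&gt;

```javascript
// Hypothetical helper: split pages 1..pageCount into fixed-size batches.
function pageBatches(pageCount, batchSize = 25) {
  const batches = [];
  for (let start = 1; pageCount >= start; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    batches.push({ start, end });
  }
  return batches;
}

// Run the batches strictly one after another, so only one batch of page
// buffers needs to be alive at any time. `ocrBatch` is a caller-supplied
// async function (an assumption of this sketch, not a Tesseract.js API).
async function ocrInBatches(pageCount, ocrBatch, batchSize = 25) {
  const results = [];
  for (const batch of pageBatches(pageCount, batchSize)) {
    results.push(await ocrBatch(batch));
  }
  return results;
}
```

&lt;p&gt;Awaiting each batch before starting the next is the point of the design: running the batches in parallel would bring back the peak memory that chunking is meant to avoid.&lt;/p&gt;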

&lt;h3&gt;2. PDF.js loading via ES module silently failed on iOS Safari&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Don't do this:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;script&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;module&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://cdnjs.cloudflare.com/ajax/libs/pdf.js/...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;iOS Safari's strict MIME-type enforcement on ES modules killed this for ~15% of users. There was no console error, just a silent failure to render thumbnails on the &lt;a href="https://exactpdf.com/tools/pdf-merge" rel="noopener noreferrer"&gt;merge tool&lt;/a&gt; and &lt;a href="https://exactpdf.com/tools/pdf-split" rel="noopener noreferrer"&gt;split tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Fix: load the standard (non-module) UMD build from cdnjs, no &lt;code&gt;type="module"&lt;/code&gt;. Works on every browser.&lt;/p&gt;
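
&lt;p&gt;One way to write that loader so a failed load is no longer silent, as a sketch: &lt;code&gt;loadClassicScript&lt;/code&gt; is a hypothetical helper, and the script URL is passed in by the caller rather than assumed here.&lt;/p&gt;

```javascript
// Hypothetical helper: inject a classic (non-module) script tag and turn its
// load/error events into a Promise, so a failed load surfaces as a rejection.
function loadClassicScript(src, doc = document) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    // Deliberately no script.type = "module": this stays a classic script.
    script.src = src;
    script.onload = () => resolve(script);
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    doc.head.appendChild(script);
  });
}
```

&lt;p&gt;The caller can then &lt;code&gt;await&lt;/code&gt; the helper before touching the library and show an error state if the promise rejects.&lt;/p&gt;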

&lt;h3&gt;3. SharedArrayBuffer needs COOP + COEP, which breaks third-party scripts&lt;/h3&gt;

&lt;p&gt;FFmpeg.wasm requires &lt;code&gt;SharedArrayBuffer&lt;/code&gt;, which since &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/crossOriginIsolated" rel="noopener noreferrer"&gt;Spectre/Meltdown&lt;/a&gt; needs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



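&lt;p&gt;Set those globally and Razorpay Checkout, Google Analytics, and a dozen other third-party scripts break, because their cross-origin resources are served without the CORP or CORS headers that &lt;code&gt;require-corp&lt;/code&gt; demands.&lt;/p&gt;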

&lt;p&gt;Fix: scope the headers to the single route that needs them. Next.js makes this clean:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/(|[a-z]{2}/)tools/pdf-read-aloud&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cross-Origin-Opener-Policy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;same-origin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cross-Origin-Embedder-Policy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;require-corp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everywhere else still loads ads/analytics/Razorpay normally.&lt;/p&gt;
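
&lt;p&gt;Because the headers now apply to one route only, it can help to check at runtime that the page really is cross-origin isolated before loading anything that needs shared memory. A small sketch, assuming a hypothetical &lt;code&gt;canUseSharedMemory&lt;/code&gt; helper built on the standard &lt;code&gt;crossOriginIsolated&lt;/code&gt; global:&lt;/p&gt;

```javascript
// Returns true only when the context is cross-origin isolated and
// SharedArrayBuffer is actually exposed. `scope` defaults to globalThis and
// can be swapped for a stub in tests.
function canUseSharedMemory(scope = globalThis) {
  return scope.crossOriginIsolated === true
    ? typeof scope.SharedArrayBuffer === "function"
    : false;
}
```

&lt;p&gt;If the check returns &lt;code&gt;false&lt;/code&gt;, the page can show a clear message instead of failing somewhere inside the WASM loader.&lt;/p&gt;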

&lt;h2&gt;The expensive SEO lesson&lt;/h2&gt;

&lt;p&gt;In May I rolled out 8 high-traffic tools across 9 locales using thin, partial translations from English. Within 8 days, GSC's indexed-pages count dropped from 477 → 345 (a 28% loss). Google had identified the locale variants as duplicates: the translated metadata changed only about 14% of the 8-word shingles relative to the canonical English page, evidently too little for Google to treat them as different documents.&lt;/p&gt;
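
&lt;p&gt;For reference, a shingle-change figure like that can be computed along these lines. This is one plausible definition (the share of a variant's 8-word shingles that do not appear in the canonical text), not necessarily the exact measure behind the 14% above, and the function names are hypothetical.&lt;/p&gt;

```javascript
// Build the set of k-word shingles (overlapping runs of k consecutive words).
function shingles(text, k = 8) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const set = new Set();
  for (let i = 0; words.length - k >= i; i++) {
    set.add(words.slice(i, i + k).join(" "));
  }
  return set;
}

// Fraction of the variant's shingles that are absent from the canonical text.
// 0 means every shingle is shared; 1 means none are.
function changedShingleRatio(canonical, variant, k = 8) {
  const base = shingles(canonical, k);
  const other = shingles(variant, k);
  if (other.size === 0) return 0;
  let changed = 0;
  for (const s of other) {
    if (!base.has(s)) changed++;
  }
  return changed / other.size;
}
```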

&lt;p&gt;Lesson: &lt;strong&gt;don't ship localized URLs into the sitemap until message-file parity is real.&lt;/strong&gt; I have a translation-gap reporter now (&lt;code&gt;npm run i18n:gap&lt;/code&gt;) that fails the build if a locale's parity drops below a threshold. The 8 tools are still English-only until proper translations land — better an English page that ranks than a Hindi page that gets deduped.&lt;/p&gt;
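
&lt;p&gt;The core of such a gap check can be small. The sketch below is an illustration rather than the actual &lt;code&gt;i18n:gap&lt;/code&gt; script: it assumes flat key/value message objects, treats a key as translated only if the locale value is a non-empty string that differs from the English one, and uses a made-up default threshold of 0.9.&lt;/p&gt;

```javascript
// Share of base (English) keys that the locale file has really translated.
function localeParity(baseMessages, localeMessages) {
  const keys = Object.keys(baseMessages);
  if (keys.length === 0) return 1;
  const translated = keys.filter((key) => {
    const value = localeMessages[key];
    if (typeof value !== "string") return false; // missing key
    if (value.trim() === "") return false; // empty placeholder
    return value !== baseMessages[key]; // still the English string
  });
  return translated.length / keys.length;
}

// Throw (and so fail a build step) when parity is below the threshold.
// The 0.9 default is an assumption made for this sketch.
function assertParity(baseMessages, localeMessages, threshold = 0.9) {
  const parity = localeParity(baseMessages, localeMessages);
  if (threshold > parity) {
    throw new Error(`Locale parity ${parity.toFixed(2)} is below ${threshold}`);
  }
  return parity;
}
```

&lt;p&gt;Counting an unchanged English string as untranslated produces false positives for brand names and similar terms, so a real check would probably need an allowlist.&lt;/p&gt;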

&lt;h2&gt;Result&lt;/h2&gt;

&lt;p&gt;ExactPDF is browser-only across all 70+ tools. The 4 I lean on most myself — and that you can stress-test against the "open DevTools, watch the network" claim:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://exactpdf.com/tools/pdf-merge" rel="noopener noreferrer"&gt;Merge PDFs up to 150 MB total&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://exactpdf.com/tools/pdf-split" rel="noopener noreferrer"&gt;Split a PDF / extract specific pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://exactpdf.com/tools/pdf-to-images" rel="noopener noreferrer"&gt;PDF → JPG / PNG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://exactpdf.com/tools/ocr-pdf" rel="noopener noreferrer"&gt;OCR scanned documents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus a &lt;a href="https://exactpdf.com/docs/api" rel="noopener noreferrer"&gt;headless API&lt;/a&gt; (&lt;code&gt;@exactpdf/mcp&lt;/code&gt; on npm) so AI agents in Cursor / Claude Desktop / Codex can drive the same processing pipeline without the browser shell.&lt;/p&gt;

&lt;p&gt;Free for personal use, 20 free API credits/month for new accounts.&lt;/p&gt;

&lt;p&gt;If you're building anything privacy-first or WASM-heavy and want to swap notes on edge cases, drop a comment — especially curious about how others are handling Tesseract.js memory pressure on mobile.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>privacy</category>
      <category>webassembly</category>
    </item>
  </channel>
</rss>
