<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Pranav Mailarpawar</title>
    <description>The latest articles on Forem by Pranav Mailarpawar (@pranav_mailarpawar_7039f2).</description>
    <link>https://forem.com/pranav_mailarpawar_7039f2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3731048%2F90790012-6963-4d3d-adb0-ea88e4dc1035.png</url>
      <title>Forem: Pranav Mailarpawar</title>
      <link>https://forem.com/pranav_mailarpawar_7039f2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pranav_mailarpawar_7039f2"/>
    <language>en</language>
    <item>
      <title>I Tested Every "Free" PDF Tool Online. Only One Is Actually Free.</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Sun, 05 Apr 2026 12:38:28 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/i-tested-every-free-pdf-tool-online-only-one-is-actually-free-56h3</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/i-tested-every-free-pdf-tool-online-only-one-is-actually-free-56h3</guid>
      <description>&lt;p&gt;It is 10 PM. You have a 40-page contract that needs to be merged with an NDA and compressed to under 5MB before emailing to a client in the morning. You search "merge PDF free online," click the first result, upload your documents — and then you see it.&lt;br&gt;
"Your file is ready! Upgrade to Pro to remove the watermark."&lt;br&gt;
The word WATERMARK stamped diagonally across every page of the document you were about to send to a client.&lt;br&gt;
You close the tab, try the next result. Same story. Third one asks you to create an account. Fourth one says you have used your two free tasks for the day. Forty minutes later, you are still looking.&lt;br&gt;
This experience is not an accident. It is a business model. And once you understand it, you will never fall into that trap again.&lt;/p&gt;

&lt;p&gt;The "free" PDF tool trap — what is actually happening&lt;br&gt;
When iLovePDF, Smallpdf, or Adobe offer a "free" tier, they are running what the software industry calls a freemium funnel. The goal is not to give you a useful free tool. The goal is to get you close enough to finishing a task that you feel the friction of the limitation — the watermark, the task cap, the size limit — and convert to a paying customer rather than start over.&lt;br&gt;
Every design decision in the free tier is optimised for that conversion, not for your convenience.&lt;br&gt;
There is one tool that works completely differently. ihatepdf.cv processes every single PDF operation inside your browser tab using WebAssembly, a technology that lets professional-grade libraries like Ghostscript run at near-native speed without a server. Open DevTools while using it. Watch the Network tab. You will see zero upload requests for your PDF file. Nothing leaves your device. No server means no per-task cost to recoup through throttling, no mechanism to inject a watermark, and nothing to log in for.&lt;br&gt;
That is the structural difference. Everything else flows from it.&lt;br&gt;
Now, the honest breakdown of every major player.&lt;/p&gt;

&lt;p&gt;ihatepdf.cv — the one built differently&lt;br&gt;
ihatepdf.cv launched with one design constraint above all others: files never leave your device. This is not a privacy marketing claim bolted on afterwards. It is a technical consequence of the architecture.&lt;br&gt;
What it means in practice:&lt;br&gt;
No watermarks, ever. There is no server generating output files, so there is no mechanism to inject a watermark. The output is produced by the WebAssembly library running in your browser — it only does what you ask it to do.&lt;br&gt;
No task limits. Limits exist to throttle server usage and push users toward paid tiers. With no server usage, there is nothing to throttle. Merge 50 PDFs, compress them, split them, edit them, and encrypt them all in one afternoon — no limit hit.&lt;br&gt;
Works offline. Once the page has loaded, the WebAssembly libraries are cached via a service worker. Disconnect your WiFi and every tool still works. This is structurally impossible for server-based tools.&lt;br&gt;
No account required. Accounts exist to track usage against a limit and retain users for conversion. With no limits to track and no server-side data to store, there is nothing to log in for.&lt;br&gt;
The trade-off worth being honest about: very heavy processing on old or low-memory devices takes longer than a dedicated server. Compressing a 100MB PDF on an eight-year-old laptop is slower than the same operation on cloud infrastructure. For very large files on constrained devices, server-based tools have a genuine speed advantage. That is a real difference and worth knowing.&lt;/p&gt;

&lt;p&gt;iLovePDF — the most popular, with real limitations&lt;br&gt;
iLovePDF is the dominant name in browser-based PDF tools. Ask most people to name an online PDF tool and they will name iLovePDF. The interface is fast and polished, and the range of tools is wide. For occasional light use, the free tier is genuinely functional.&lt;br&gt;
But it breaks down in ways that matter.&lt;br&gt;
The free tier caps file size at 200MB per file — which sounds generous until you are handling scanned document archives or high-resolution design PDFs. Files are uploaded and processed on their servers, which means your document travels through infrastructure you do not control. Their privacy policy states deletion after two hours, but your file exists on their hardware during that window. And the free tier adds watermarks on certain conversion outputs — which tools add them and which do not is not clearly communicated upfront. You find out after processing.&lt;br&gt;
iLovePDF Premium runs at approximately $6.61 per month on an annual plan. It removes limits and watermarks. If you process PDFs professionally every day and cloud convenience outweighs privacy concerns, it is a reasonable value. For anyone else, the free tier limitations are a real friction point.&lt;br&gt;
The honest verdict: Best for casual, non-sensitive use where privacy is not a concern. Not suitable for sensitive documents, offline use, or high-volume workflows without paying.&lt;/p&gt;

&lt;p&gt;Smallpdf — the best-designed tool with the worst free tier&lt;br&gt;
Smallpdf has arguably the most polished interface of any online PDF tool. The user experience is smooth, the design is genuinely good, and the brand recognition is massive.&lt;br&gt;
The free tier, however, is the most restrictive of any major tool.&lt;br&gt;
Two tasks per day. The limit is stated clearly, but its real impact only becomes apparent when you hit it mid-workflow. Two tasks means: merge one PDF, compress one PDF. That is your day done on the free tier. Beyond that, output files on the free tier carry a clearly visible Smallpdf watermark placed on document content, not tucked in a margin.&lt;br&gt;
Smallpdf Pro costs $12 per month (monthly billing) or $108 per year. For teams that need a polished, cloud-synced PDF workflow with collaboration features, the Pro pricing is competitive. For individuals who occasionally need to process a PDF without a watermark, paying $12 a month is genuinely hard to justify.&lt;br&gt;
The honest verdict: The 2-tasks-per-day free tier is not a free tier — it is a 24-hour free trial that resets daily. If you will pay for a PDF tool and want cloud-based team collaboration with the best-designed interface in the category, Smallpdf Pro earns its price. For everyone else, there are better options.&lt;/p&gt;

&lt;p&gt;Adobe Acrobat online — the brand name with the weakest free offering&lt;br&gt;
Adobe invented the PDF format. Their Acrobat desktop software is the definitive professional tool used by law firms, publishers, and enterprise organisations worldwide. The desktop application is comprehensive and genuinely powerful.&lt;br&gt;
The online free tier is a different story.&lt;br&gt;
Adobe's free web offering is designed as a lead magnet for Acrobat subscriptions, not as a standalone tool. Core editing, form creation, advanced conversion — anything beyond basic viewing requires a subscription. An Adobe ID is required even for the limited free functions. Acrobat Standard costs $12.99 per month (annual commitment). Acrobat Pro costs $23.99 per month. Monthly pricing is significantly higher.&lt;br&gt;
Where Adobe genuinely excels is the paid Acrobat Pro desktop application. Advanced redaction, certified digital signatures that meet legal standards, PDF/A archiving compliance, accessibility tagging, and professional print production features are all best-in-class. For legal, publishing, and enterprise workflows that genuinely require these capabilities, the subscription is justified.&lt;br&gt;
The honest verdict: Not meaningfully competitive as a free tool — the free tier exists primarily to prompt subscription sign-ups. For professional legal, enterprise, or publishing workflows requiring certified e-signatures and advanced compliance features, Acrobat Pro is the industry standard. For everything else, the price is impossible to justify.&lt;/p&gt;

&lt;p&gt;Sejda — the most honest free tier of the server-based tools&lt;br&gt;
Sejda is less well-known but deserves a mention for one reason: it is the most transparently communicated free tier of the server-based options. Limits are stated upfront rather than revealed after processing.&lt;br&gt;
Three tasks per hour, with a 200-page or 50MB cap per task. More generous than Smallpdf's two-per-day, but still a real constraint for anything beyond occasional use. The key differentiator: Sejda does not watermark free tier output. That separates it from Smallpdf cleanly. No account is required for most tools. Files go to their servers with a stated two-hour deletion window.&lt;br&gt;
Sejda Premium is $7.50 per month on an annual plan — the most affordable premium option among server-based tools.&lt;br&gt;
The honest verdict: The most user-respectful server-based free option. No watermarks, clear limits stated upfront, no account required. For non-sensitive documents where server processing is acceptable and you run into device memory limits on ihatepdf.cv, Sejda is the best fallback.&lt;/p&gt;

&lt;p&gt;The privacy question nobody explains clearly&lt;br&gt;
This is the most consequential difference between the tools, and the least clearly communicated.&lt;br&gt;
When you upload a file to any server-based PDF tool — regardless of what their privacy policy says — the following happens: your file travels over the internet to their data centre, it lands on a server and is written to disk, it is processed by their software, the output is written and made available for download, and at some point within the stated retention window it is deleted, assuming their systems work correctly.&lt;br&gt;
For most documents — a recipe PDF, a product manual, a publicly available report — none of this matters. For contracts, medical records, financial statements, personal ID documents, legal filings, or CVs with personal data, this chain of custody has real implications. GDPR, HIPAA, and most professional codes of conduct for legal and financial work have specific requirements about where and how client data may be processed.&lt;br&gt;
ihatepdf.cv's approach is not "we delete your files promptly." It is "we never receive your files at all." That is a categorically different guarantee, not a stronger version of the same thing.&lt;/p&gt;

&lt;p&gt;The honest comparison in plain language&lt;br&gt;
Here is the actual state of each tool on the dimensions that matter:&lt;br&gt;
ihatepdf.cv — No watermark. No limits. No upload. No sign-up. Works offline. Free.&lt;br&gt;
iLovePDF (free tier) — Watermark on some tools. File size cap at 200MB. Server upload required. Occasional queue times. No offline use. Free, or $6.61/month for Pro.&lt;br&gt;
Smallpdf (free tier) — Watermark on output. 2 tasks per day. Server upload required. Sign-up pushed throughout. No offline use. Free, or $12/month for Pro.&lt;br&gt;
Adobe Acrobat online — Very limited free functionality. Account required. Server upload required. No offline use. $12.99–$23.99/month for meaningful access.&lt;br&gt;
Sejda (free tier) — No watermark. 3 tasks per hour. Server upload required. No account required. No offline use. Free, or $7.50/month for Pro.&lt;/p&gt;

&lt;p&gt;When to actually use each one&lt;br&gt;
Use ihatepdf.cv when privacy matters, you want genuinely unlimited free use, you might be offline, or you need a tool that works without hitting unexpected limits mid-workflow. Also the right choice for sensitive documents — contracts, CVs, medical records — where you would not be comfortable uploading to a third-party server.&lt;br&gt;
Use iLovePDF when you are processing large files on an older device and speed outweighs privacy concerns, or you are already on iLovePDF Premium and the cloud sync is part of your workflow.&lt;br&gt;
Use Smallpdf Pro when you are paying for it, need team collaboration features, and want the best-designed interface in the category. On the free tier: genuinely do not bother.&lt;br&gt;
Use Adobe Acrobat Pro when you work in legal, publishing, or enterprise contexts requiring certified digital signatures, PDF/A compliance, or advanced accessibility features. It is a professional tool at a professional price — that match is either right for your context or it is not.&lt;br&gt;
Use Sejda free when you need a server-based tool for speed on a constrained device and privacy is acceptable. The most user-respectful server-based free option available.&lt;/p&gt;

&lt;p&gt;The bottom line&lt;br&gt;
The free PDF tool space is mostly a collection of freemium funnels dressed up as utilities. Watermarks, task limits, and forced sign-ups are not limitations of the technology — they are deliberate friction designed to drive conversions to paid tiers.&lt;br&gt;
ihatepdf.cv is genuinely different because the architecture that makes it private is the same architecture that makes it free. No server means no per-task cost to recoup through throttling, no mechanism to watermark, and nothing to log in for. It is free because there are no processing servers to pay for, not because they are running a generous limited-time offer.&lt;br&gt;
If you process PDFs that contain anything you would not casually email to a stranger, the choice is straightforward.&lt;br&gt;
👉 Try it for free at ihatepdf.cv&lt;/p&gt;

&lt;p&gt;Tested all tools in mid-2025. Pricing and free tier terms are accurate as of the time of writing and subject to change.&lt;/p&gt;

&lt;p&gt;Tags: PDF tools, ihatepdf, free PDF editor, iLovePDF, Smallpdf, Adobe Acrobat, online PDF editor, PDF compressor, free PDF tools, privacy, no watermark PDF&lt;/p&gt;

&lt;p&gt;ihatepdf.cv is a free online PDF toolkit — merge, split, compress, edit, convert, and transform PDF files without watermarks, without sign-ups, and without sending your files to a server.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How to Invert PDF Colours Online for Free — The Easiest Tool You Haven't Tried Yet</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Sun, 05 Apr 2026 11:59:59 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/how-to-invert-pdf-colours-online-for-free-the-easiest-tool-you-havent-tried-yet-19d</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/how-to-invert-pdf-colours-online-for-free-the-easiest-tool-you-havent-tried-yet-19d</guid>
      <description>&lt;p&gt;If you've ever opened a PDF late at night and been blasted by a blinding white background, you already know the problem. Or maybe you're printing a document and want to save toner by flipping colours. Whatever the reason, inverting PDF colours sounds like it should be simple — but most tools make it surprisingly complicated.&lt;br&gt;
That's exactly why ihatepdf.cv built its Invert PDF Colours tool. It does one thing, it does it well, and it costs you nothing.&lt;/p&gt;

&lt;p&gt;What Does "Invert PDF Colours" Actually Mean?&lt;br&gt;
When you invert a PDF's colours, every colour in the document flips to its complement: each red, green, and blue value is replaced by its distance from the maximum. White becomes black. Black becomes white. Dark navy turns pale yellow. The result resembles toggling "Dark Mode", but for any PDF file, regardless of how it was created.&lt;br&gt;
This is especially useful for:&lt;/p&gt;

&lt;p&gt;Night reading — A white-background PDF can feel like staring into a flashlight at 11 PM. Inverted colours are dramatically easier on the eyes in low-light environments.&lt;br&gt;
Accessibility — For people with light sensitivity, migraines, or certain visual impairments, dark-background documents significantly reduce eye strain.&lt;br&gt;
Printing efficiency — Inverting a dark-background document to a light one can save a surprising amount of ink and toner, especially for presentations or design-heavy files. Going the other way, from light to dark, uses more.&lt;br&gt;
Design and presentations — Sometimes you need a dark-themed version of a document for a specific brand context or slide deck style.&lt;/p&gt;
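
&lt;p&gt;As a rough illustration of what inversion does to a single colour, the sketch below replaces each RGB channel with 255 minus its value. This is an assumption about how colour inversion is usually done, not a description of ihatepdf.cv's actual code, and the function name invert_rgb is hypothetical.&lt;/p&gt;

```python
def invert_rgb(colour):
    """Return the per-channel complement of an (r, g, b) tuple of 0-255 ints."""
    r, g, b = colour
    return (255 - r, 255 - g, 255 - b)


# White and black swap places; dark navy becomes a pale yellow.
print(invert_rgb((255, 255, 255)))
print(invert_rgb((0, 0, 0)))
print(invert_rgb((0, 0, 128)))
```

&lt;p&gt;Because every channel is flipped, photographs invert too, which is why full-colour imagery can look unusual after inversion.&lt;/p&gt;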

&lt;p&gt;Why Most PDF Tools Make This Harder Than It Needs to Be&lt;br&gt;
Before ihatepdf.cv, inverting PDF colours usually meant:&lt;/p&gt;

&lt;p&gt;Downloading bulky desktop software&lt;br&gt;
Paying for a premium subscription&lt;br&gt;
Jumping through export and re-import hoops in Adobe Acrobat&lt;br&gt;
Receiving a watermarked output because you used a "free" tier&lt;/p&gt;

&lt;p&gt;That's an absurd amount of friction for what should be a two-second job.&lt;/p&gt;

&lt;p&gt;How ihatepdf.cv's Invert PDF Colours Tool Works&lt;br&gt;
Here's the entire process — and yes, it really is this short:&lt;br&gt;
Step 1 — Go to ihatepdf.cv&lt;br&gt;
Open your browser and head to ihatepdf.cv. No account needed, no app to download.&lt;br&gt;
Step 2 — Select the Invert PDF Colours Tool&lt;br&gt;
From the homepage, find the Invert PDF Colours option. It's right there in the tools menu — no hunting through settings or dropdowns.&lt;br&gt;
Step 3 — Upload Your PDF&lt;br&gt;
Drag and drop your file onto the upload area, or click to browse and select it from your device. The tool accepts standard PDF files of all sizes.&lt;br&gt;
Step 4 — Click Invert&lt;br&gt;
Hit the button. The tool processes your document instantly — no waiting, no progress bar spinning for five minutes.&lt;br&gt;
Step 5 — Download Your Inverted PDF&lt;br&gt;
Your colour-inverted PDF is ready. Download it directly. No watermark. No email required. No forced sign-up.&lt;br&gt;
That's it. Five steps, under thirty seconds.&lt;/p&gt;

&lt;p&gt;What Makes ihatepdf.cv Different from Other Free PDF Tools?&lt;br&gt;
There are dozens of online PDF tools, so why should you choose ihatepdf.cv? The Google Search Console (GSC) data tells the story clearly: thousands of users search for it every month, and a significant portion of them search specifically for the .cv domain, meaning they remember the brand. Here's why:&lt;br&gt;
✅ Completely Free, No Watermarks&lt;br&gt;
The number one complaint with free PDF tools is the watermark problem. You process a clean professional document, download it, and find a giant "Processed by XYZ" stamp across every page. ihatepdf.cv doesn't do that.&lt;br&gt;
✅ No Sign-Up Required&lt;br&gt;
You don't need to create an account, hand over your email address, or start a free trial that auto-charges you in 14 days. Open the site, use the tool, leave. Simple.&lt;br&gt;
✅ Works on Any Device&lt;br&gt;
Whether you're on a Windows desktop, a MacBook, an Android phone, or an iPad — ihatepdf.cv is browser-based and fully responsive. No compatibility issues.&lt;br&gt;
✅ Fast and Lightweight&lt;br&gt;
The tool is built to process PDFs quickly. There's no server queue or slow upload bottleneck. You get your file back fast.&lt;br&gt;
✅ Privacy-Conscious&lt;br&gt;
Your files are processed inside your browser tab and are never sent to a server. You're not contributing your sensitive documents to a permanent database somewhere.&lt;/p&gt;

&lt;p&gt;Real-World Use Cases for Inverting PDF Colours&lt;br&gt;
Students Studying at Night&lt;br&gt;
If you're reading lecture notes or research papers after dark, a white PDF on full brightness is brutal. Inverting the colours creates a comfortable light-on-dark reading experience that won't tire your eyes out before your exam tomorrow.&lt;br&gt;
Remote Workers and Freelancers&lt;br&gt;
Working across different time zones often means late-night document reviews. An inverted PDF is much more comfortable during those 2 AM catch-up sessions.&lt;br&gt;
People with Visual Sensitivities&lt;br&gt;
Many people experience headaches or eye discomfort with bright white screens. Inverted PDFs can serve as a quick accessibility workaround when an application doesn't offer a native dark mode.&lt;br&gt;
Architects, Designers, and Engineers&lt;br&gt;
Technical drawings and CAD exports are often presented as dark lines on a white background. Inverting them creates a dramatic dark-background look that can be striking in presentations.&lt;br&gt;
Teachers and Educators&lt;br&gt;
Creating handouts or worksheets? An inverted colour scheme can add visual variety to printed materials and make certain documents stand out in a packet.&lt;/p&gt;

&lt;p&gt;ihatepdf.cv — More Than Just One Tool&lt;br&gt;
While the Invert PDF Colours tool is excellent, it's just one part of what ihatepdf.cv offers. The platform is built around the idea that every common PDF task should be fast, free, and frustration-free. Other popular tools on the site include:&lt;/p&gt;

&lt;p&gt;Merge PDF — Combine multiple PDFs into a single document&lt;br&gt;
Split PDF — Extract specific pages from a large file&lt;br&gt;
Compress PDF — Reduce file size without losing quality&lt;br&gt;
Edit PDF — Make changes directly to text and images&lt;br&gt;
Convert PDF — Transform PDFs to Word, Excel, JPG, and more&lt;br&gt;
Rotate PDF — Fix the orientation of scanned or incorrectly saved pages&lt;/p&gt;

&lt;p&gt;All of these follow the same zero-friction philosophy: no watermarks, no sign-up, no nonsense.&lt;/p&gt;

&lt;p&gt;How to Get the Best Results When Inverting PDF Colours&lt;br&gt;
A few quick tips to make the most of the tool:&lt;br&gt;
Check your document type first. Text-heavy PDFs (like research papers or contracts) invert beautifully. Colour photographs embedded in PDFs will also invert, which can look unusual — keep that in mind if your document has full-colour imagery.&lt;br&gt;
Use it for reading, not final publishing. For personal reading and study use, the inverted file is perfect. For professionally shared documents, you may want to stick with the original colour scheme unless the dark version is intentional.&lt;br&gt;
Combine with Compress PDF if needed. After inverting, if your file size has increased, run it through the Compress PDF tool on ihatepdf.cv to bring it back down.&lt;/p&gt;

&lt;p&gt;The Bottom Line&lt;br&gt;
If you need to invert the colours of a PDF — for eye comfort, accessibility, printing, or design — ihatepdf.cv is the fastest, cleanest way to do it online. No watermarks. No account. No cost. Just upload, invert, and download.&lt;br&gt;
It's the kind of tool that makes you wonder why you ever struggled with anything else.&lt;br&gt;
👉 Try it now at ihatepdf.cv&lt;/p&gt;

&lt;p&gt;Found this useful? Share it with someone who's still squinting at a blinding white PDF at midnight.&lt;/p&gt;

&lt;p&gt;Tags: PDF tools, Invert PDF, Dark mode PDF, Free PDF editor, ihatepdf.cv, Online PDF tools, PDF colour invert, Edit PDF online free, PDF accessibility&lt;/p&gt;

&lt;p&gt;ihatepdf.cv is a free online PDF toolkit that helps users merge, split, compress, edit, convert, and transform PDF files — without watermarks, without sign-ups, and without the usual headaches.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How to Compress a PDF for Free — Up to 70% Smaller, No Upload, No Watermark</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Sat, 04 Apr 2026 18:25:32 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/how-to-compress-a-pdf-for-free-up-to-70-smaller-no-upload-no-watermark-5h5b</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/how-to-compress-a-pdf-for-free-up-to-70-smaller-no-upload-no-watermark-5h5b</guid>
      <description>&lt;p&gt;If you've ever hit "Send" on an email only to get a bounce-back saying your attachment is too large — you already know the pain. Gmail blocks files over 25MB. Outlook caps at 20MB. Government portals, HR systems, and university submission forms often set limits as low as 2MB.&lt;br&gt;
And yet, somehow, your PDF is 47MB.&lt;br&gt;
The good news: you can compress a PDF down by up to 70% for free, right in your browser, in under 30 seconds — no account, no watermark, and critically, no uploading your file to someone else's server.&lt;br&gt;
Here's everything you need to know.&lt;/p&gt;

&lt;p&gt;Why Most Online PDF Compressors Are a Privacy Risk (And What to Use Instead)&lt;br&gt;
Most popular PDF tools (iLovePDF, Smallpdf, Adobe Acrobat Online) work by uploading your file to their servers, compressing it there, and sending it back. That means your CV, your contract, your medical records, or your client documents sit on a third-party server you have no control over.&lt;br&gt;
ihatepdf.cv/compress-pdf takes a different approach entirely.&lt;br&gt;
It uses Ghostscript compiled to WebAssembly, which means the entire compression pipeline runs inside your browser tab. Your file bytes never travel anywhere. Not to ihatepdf.cv's servers. Not anywhere. Once the page loads, it even works fully offline.&lt;br&gt;
This is the only truly private way to compress a PDF online.&lt;/p&gt;

&lt;p&gt;How to Compress a PDF Free — Step by Step&lt;/p&gt;

&lt;p&gt;Go to ihatepdf.cv/compress-pdf — no sign-up, no email required&lt;br&gt;
Click "Select PDF to Compress" or drag and drop your file onto the page&lt;br&gt;
Choose your compression level (more on this below)&lt;br&gt;
Click "Compress PDF Now" and wait 5–30 seconds&lt;br&gt;
Download your smaller PDF — no watermark added&lt;/p&gt;

&lt;p&gt;That's it. Works on Windows, Mac, iPhone, and Android — any modern browser, any device.&lt;/p&gt;

&lt;p&gt;Which Compression Level Should You Choose?&lt;br&gt;
This is the question most guides skip, and it's the most important one.&lt;br&gt;
🟢 Light — 20–30% reduction&lt;br&gt;
Images are resampled to 300 DPI. Best for documents where print quality matters: design portfolios, high-resolution presentations, professional photography. The output is visually indistinguishable from the original.&lt;br&gt;
🟡 Medium — 40–50% reduction&lt;br&gt;
Images at 150 DPI. This is the sweet spot for 99% of use cases: CVs, business reports, contracts, slide decks, university submissions. Barely any visible quality difference on screen.&lt;br&gt;
🔴 Heavy — 60–70% reduction&lt;br&gt;
Images at 72 DPI. Maximum compression for archiving, internal records, or any situation where file size is the only priority. Text remains perfectly sharp at every level — only raster images are affected.&lt;br&gt;
Quick rule of thumb: Start with Medium. If the file is still too large, switch to Heavy.&lt;/p&gt;
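
&lt;p&gt;For readers who want to reproduce these levels with a desktop copy of Ghostscript, here is a minimal sketch. It assumes the three levels correspond to Ghostscript's standard -dPDFSETTINGS presets (/printer at about 300 DPI, /ebook at about 150 DPI, /screen at about 72 DPI). That mapping is a guess based on the DPI figures above, not the site's confirmed configuration, and the names LEVEL_TO_PRESET and ghostscript_args are hypothetical.&lt;/p&gt;

```python
# Hypothetical mapping from the article's levels to Ghostscript's
# built-in -dPDFSETTINGS presets (an assumption, not the site's confirmed config).
LEVEL_TO_PRESET = {
    "light": "/printer",   # images downsampled to about 300 DPI
    "medium": "/ebook",    # about 150 DPI
    "heavy": "/screen",    # about 72 DPI
}


def ghostscript_args(level, input_path, output_path):
    """Build a Ghostscript argument list for the given compression level."""
    preset = LEVEL_TO_PRESET[level]
    return [
        "gs",
        "-sDEVICE=pdfwrite",            # write a new PDF
        f"-dPDFSETTINGS={preset}",      # choose the image downsampling preset
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_path}",
        input_path,
    ]


print(" ".join(ghostscript_args("medium", "input.pdf", "output.pdf")))
```

&lt;p&gt;With Ghostscript installed locally, the returned list could be passed to subprocess.run to compress a file outside the browser.&lt;/p&gt;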

&lt;p&gt;Will Compressing a PDF Make It Blurry?&lt;br&gt;
Short answer: real text is not affected, not even slightly.&lt;br&gt;
Text in PDFs is stored as character codes drawn with vector fonts, not pixels, so image downsampling doesn't touch it. What changes is embedded raster images: photos, scanned pages, illustrations. Text inside a scanned page is part of the image, so it softens along with the scan. At Light and Medium, the difference is invisible on screen. At Heavy, photos may look slightly softer, but every word of real text stays razor-sharp.&lt;br&gt;
If your PDF is mostly text (a CV, a report, a contract), even Heavy compression will look identical to the original on screen.&lt;/p&gt;

&lt;p&gt;How Much Can Your PDF Actually Compress?&lt;br&gt;
Results vary significantly by document type:&lt;br&gt;
Document type: expected reduction&lt;br&gt;
Scanned documents: 50–70%&lt;br&gt;
Presentations with stock photos: 40–60%&lt;br&gt;
CVs and text-heavy reports: 10–30%&lt;br&gt;
PDFs exported from Word/Excel: 15–40%&lt;br&gt;
At a 50–70% reduction, a 15MB scanned report comes out at roughly 4.5–7.5MB on Heavy. A photo-heavy slide deck at 30MB drops to roughly 15–18MB at Medium's 40–50% reduction.&lt;/p&gt;

&lt;p&gt;Compressing a PDF for Email: The Fastest Approach&lt;br&gt;
Gmail: 25MB limit&lt;br&gt;
Outlook: 20MB limit&lt;br&gt;
Most HR/university portals: 2–5MB limit&lt;br&gt;
Here's the decision tree:&lt;/p&gt;

&lt;p&gt;Try Medium compression first — most PDFs under 50MB will clear the 25MB limit&lt;br&gt;
Still too large? Switch to Heavy compression&lt;br&gt;
Still too large after Heavy? Use ihatepdf.cv/split-pdf to divide the document into sections, send them separately, then merge them back afterwards if needed&lt;/p&gt;
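
&lt;p&gt;The decision tree above can be sketched as a small helper. It uses the lower end of the article's stated ranges (40% for Medium, 60% for Heavy) as conservative estimates. Real results depend on the document, so treat the output as a starting guess. The function name pick_level and the constants are hypothetical.&lt;/p&gt;

```python
# Conservative (lower-bound) reductions taken from the article's ranges.
MEDIUM_REDUCTION = 0.40
HEAVY_REDUCTION = 0.60


def pick_level(size_mb, limit_mb):
    """Suggest the first step of the decision tree that should fit the limit."""
    if limit_mb >= size_mb:
        return "none"      # already small enough to send as-is
    if limit_mb >= size_mb * (1 - MEDIUM_REDUCTION):
        return "medium"
    if limit_mb >= size_mb * (1 - HEAVY_REDUCTION):
        return "heavy"
    return "split"         # compress on Heavy, then split into sections


# The 47MB file from the introduction against Gmail's 25MB limit.
print(pick_level(47, 25))
```

&lt;p&gt;Using lower-bound reductions means the helper may suggest Heavy when Medium would have been enough, which is the safer error when a deadline is involved.&lt;/p&gt;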

&lt;p&gt;How to Compress a PDF to Under 1MB&lt;br&gt;
A 1MB target is achievable for most PDFs with Heavy compression. For image-heavy files that are still over 1MB after compression:&lt;/p&gt;

&lt;p&gt;Apply Heavy compression at ihatepdf.cv/compress-pdf&lt;br&gt;
Split the result into smaller sections using ihatepdf.cv/split-pdf&lt;br&gt;
Compress each section individually&lt;br&gt;
Submit sections separately, or merge them back using ihatepdf.cv/merge-pdf&lt;/p&gt;
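
&lt;p&gt;To plan the split step, the sketch below estimates how many sections are needed and which page ranges each should cover. It assumes every page contributes roughly equally to file size, which is rarely exact, so an extra section may be needed in practice. It also assumes the document has at least one page. The function names are hypothetical.&lt;/p&gt;

```python
import math


def sections_needed(size_mb, limit_mb):
    """Smallest number of equal-sized sections that each fit under the limit."""
    return math.ceil(size_mb / limit_mb)


def split_ranges(total_pages, parts):
    """Divide pages 1..total_pages into near-equal (start, end) ranges."""
    parts = min(parts, total_pages)  # never create empty sections
    base, extra = divmod(total_pages, parts)
    ranges = []
    start = 1
    for i in range(parts):
        # The first `extra` sections take one additional page each.
        length = base + (1 if extra > i else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# A 3.5MB result with a 1MB limit, spread over a 10-page document.
parts = sections_needed(3.5, 1.0)
print(parts, split_ranges(10, parts))
```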

&lt;p&gt;One More Trick: Flatten Before You Compress&lt;br&gt;
PDFs with interactive form fields, annotations, and embedded JavaScript carry significant invisible overhead. If your PDF was originally a fillable form or has tracked changes and comments, flatten it first.&lt;br&gt;
Go to ihatepdf.cv/flatten-pdf → flatten → then compress.&lt;br&gt;
This two-step process often produces smaller files than compression alone, especially for PDFs exported from form-building tools.&lt;/p&gt;

&lt;p&gt;What About Password-Protected PDFs?&lt;br&gt;
Password-protected PDFs can't be compressed directly. You'll need to:&lt;/p&gt;

&lt;p&gt;Decrypt the file at ihatepdf.cv/remove-password&lt;br&gt;
Compress at ihatepdf.cv/compress-pdf&lt;br&gt;
Re-encrypt if needed at ihatepdf.cv/encrypt-pdf&lt;/p&gt;

&lt;p&gt;FAQ&lt;br&gt;
Is there a file size limit?&lt;br&gt;
No artificial limit. The practical ceiling is set by your device's available RAM; on a desktop browser that typically means files of around 100–150MB. Close other tabs if you're working with very large files.&lt;br&gt;
Can I compress multiple PDFs at once?&lt;br&gt;
The tool processes one file at a time. For a batch, use ihatepdf.cv/merge-pdf to combine them first, then compress the merged file in a single pass.&lt;br&gt;
Why is my compressed file still large?&lt;br&gt;
If Heavy compression only shrank your file by 10–15%, the PDF is mostly text and vector graphics — not images. There's limited room to compress further. Try splitting the document instead.&lt;br&gt;
Does compression affect links, form fields, or bookmarks?&lt;br&gt;
No. Hyperlinks, form fields, bookmarks, and annotations are all preserved after compression.&lt;/p&gt;

&lt;p&gt;The Bottom Line&lt;br&gt;
The fastest, most private way to compress a PDF:&lt;br&gt;
ihatepdf.cv/compress-pdf&lt;br&gt;
No upload. No watermark. No account. Works on every device. Runs entirely in your browser.&lt;br&gt;
Pick Medium compression, click compress, and your file will be ready in under 30 seconds.&lt;/p&gt;

&lt;p&gt;Have a PDF that's fighting back? Drop a question in the comments — happy to help figure out the best approach for your specific file.&lt;/p&gt;

&lt;p&gt;Also on ihatepdf.cv:&lt;/p&gt;

&lt;p&gt;Merge PDFs — Combine files before or after compressing&lt;br&gt;
Split PDF — Divide large files to hit upload limits&lt;br&gt;
Flatten PDF — Lock form fields to reduce size further&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How to Merge PDF Files Free Online — No Watermark, No Sign-Up, Works Offline</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Thu, 02 Apr 2026 17:21:37 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/how-to-merge-pdf-files-free-online-no-watermark-no-sign-up-works-offline-kem</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/how-to-merge-pdf-files-free-online-no-watermark-no-sign-up-works-offline-kem</guid>
      <description>&lt;p&gt;You've got three PDFs. A contract, an attachment, and a cover letter. You need them as one file before the deadline. So you search for a PDF merger, land on something that looks free, upload your files — and then it tells you the download requires an account, or worse, the output has a watermark stamped across every page.&lt;br&gt;
Sound familiar?&lt;br&gt;
Here's how to actually merge PDFs for free, with no watermark, no sign-up, and no files leaving your device — using ihatepdf.cv.&lt;/p&gt;

&lt;p&gt;What You Actually Need to Merge PDFs&lt;br&gt;
Just a browser. That's it.&lt;br&gt;
ihatepdf.cv's PDF merger runs entirely in your browser — no software installation, no account, no watermark on the output. Your files never leave your device. It also works offline once the page has loaded.&lt;/p&gt;

&lt;p&gt;How to Merge PDFs in Under 2 Minutes&lt;/p&gt;

&lt;p&gt;Go to ihatepdf.cv/merge-pdf — no sign-up required&lt;br&gt;
Upload your PDF files — click Select PDFs or drag and drop. Add as many files as you need, no limit&lt;br&gt;
Drag to reorder — drag the thumbnails into the final page sequence you want&lt;br&gt;
Click Merge PDFs — processing takes a few seconds, locally in your browser&lt;br&gt;
Download — the combined PDF downloads instantly with no watermark&lt;/p&gt;

&lt;p&gt;The whole thing takes about 90 seconds for most people.&lt;/p&gt;

&lt;p&gt;How to Merge PDFs on Mac&lt;br&gt;
On Mac you can also use Preview: open the first PDF, go to View → Thumbnails, then drag the other PDFs into the sidebar. This works for basic merging, but all the files must already be on your Mac and you have to save the result manually.&lt;br&gt;
For a faster solution that works on any OS without installing anything, ihatepdf.cv's merger is the quickest option, and it needs nothing open except your browser.&lt;/p&gt;

&lt;p&gt;How to Merge PDFs on Windows&lt;br&gt;
Windows has no built-in PDF merger. The fastest approach is a browser-based tool — open ihatepdf.cv/merge-pdf in Chrome, Edge, or Firefox, upload your files, and download the merged result. No software installation needed, no account required.&lt;/p&gt;

&lt;p&gt;How to Merge PDFs on iPhone or Android&lt;br&gt;
ihatepdf.cv works perfectly on mobile browsers. Open the tool in Safari on iPhone or Chrome on Android, tap to upload from your Files app or Google Drive, reorder the page thumbnails, and tap Download. The merged file saves directly to your phone's Downloads folder. No app installation needed.&lt;/p&gt;

&lt;p&gt;What to Do If the Merged PDF Is Too Large to Email&lt;br&gt;
After merging, if the file exceeds your email provider's attachment limit — Gmail caps at 25MB, Outlook at 20MB — run the merged file through ihatepdf.cv's PDF compressor. Medium compression typically reduces a merged PDF by 40–50% with no visible quality loss on screen.&lt;br&gt;
If you still need it smaller, try splitting the merged PDF into logical sections and sending them separately.&lt;/p&gt;

&lt;p&gt;Reorder Pages Before Merging for Perfect Results&lt;br&gt;
If individual PDFs have internal page issues — wrong page order, sideways scans, or blank pages to remove — use Organize Pages on each file first to fix the internal order, then merge them all together. This two-step approach gives you full control over the final page sequence.&lt;/p&gt;

&lt;p&gt;Why "No Upload" Matters More Than You Think&lt;br&gt;
Most PDF merging services upload your documents to their servers before combining them. For contracts, CVs, financial records, or any confidential document, this is an unnecessary privacy risk.&lt;br&gt;
ihatepdf.cv uses pdf-lib, a JavaScript PDF library, so the entire merge operation runs inside your browser tab. The bytes of your documents never travel over the network.&lt;br&gt;
Want to verify it? Open DevTools → Network tab, add your files, and hit Merge. No request carrying your file data should appear.&lt;/p&gt;

&lt;p&gt;Frequently Asked Questions&lt;br&gt;
Is there a limit on how many PDFs I can merge?&lt;br&gt;
No limit. Merge as many files as your device memory allows — typically 20–50 files on a desktop browser.&lt;br&gt;
Will the merged PDF have a watermark?&lt;br&gt;
Never. ihatepdf.cv does not add watermarks to any output file under any circumstances.&lt;br&gt;
Does merging PDFs reduce quality?&lt;br&gt;
No. The merge operation is structural only — it preserves the original quality of every page exactly as-is. No re-encoding occurs.&lt;br&gt;
Can I merge password-protected PDFs?&lt;br&gt;
You need to remove the password first, then merge, then re-encrypt with Encrypt PDF if needed.&lt;br&gt;
Can I merge PDFs with different page sizes?&lt;br&gt;
Yes. PDFs with different page dimensions — A4 mixed with US Letter, portrait mixed with landscape — can be merged. Each page retains its original dimensions in the output. Use Crop &amp;amp; Resize first if you need all pages to be the same size.&lt;/p&gt;

&lt;p&gt;Merging PDFs shouldn't require an account, a subscription, or a watermark on your work. ihatepdf.cv just does it — free, private, no nonsense.&lt;br&gt;
Merge your PDFs now →&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Scan Any Document to a Searchable PDF For Free, Right in Your Browser</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Wed, 01 Apr 2026 19:08:53 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/scan-any-document-to-a-searchable-pdf-for-free-right-in-your-browser-4nna</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/scan-any-document-to-a-searchable-pdf-for-free-right-in-your-browser-4nna</guid>
      <description>&lt;p&gt;If you've ever had a printed contract, a handwritten note, or a physical receipt that you desperately needed as a PDF — and found yourself wrestling with apps that either upload your files to some random server, slap a watermark on the result, or charge you after three free scans — you know the frustration.&lt;br&gt;
There's a better way. It's called ihatepdf.cv, and it runs entirely in your browser.&lt;/p&gt;

&lt;p&gt;What ihatepdf.cv Actually Does&lt;br&gt;
ihatepdf.cv is a free browser-based document scanner. You open it, point your phone or laptop camera at a piece of paper, and it spits out a clean, flat, searchable PDF — with no account required, no file ever leaving your device, and absolutely no watermark.&lt;br&gt;
It's the kind of tool that should have always existed. Here's what happens under the hood when you hit that shutter button.&lt;/p&gt;

&lt;p&gt;Step 1: It Detects the Document Edges Automatically&lt;br&gt;
The moment you capture a frame, ihatepdf.cv runs an edge-detection algorithm directly in JavaScript — no computer vision library, no WebAssembly blob to download. It applies a Gaussian blur to reduce noise, then runs a Sobel operator across the image to find where brightness changes sharply (i.e., where your document meets the desk or background).&lt;br&gt;
The four corners of your document are estimated from those edge points and shown as draggable handles on screen. Most of the time the auto-detect is spot on. When it isn't, you just drag the corners yourself — and a little magnifying loupe pops up under your finger so you can place them with pixel-level precision.&lt;/p&gt;
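
&lt;p&gt;To make the Sobel step concrete, here is a minimal sketch, assuming the frame has already been converted to a greyscale array stored row by row. The function name and data layout are illustrative assumptions, not taken from the tool's source.&lt;/p&gt;

```javascript
// Sobel gradient magnitude for a greyscale image stored row by row.
// gray: brightness values (0-255), one per pixel. Returns a Float32Array.
function sobelMagnitude(gray, width, height) {
  const out = new Float32Array(width * height); // border pixels stay at 0
  const at = (x, y) => gray[y * width + x];
  for (let y = 1; height - 1 > y; y++) {
    for (let x = 1; width - 1 > x; x++) {
      // Horizontal gradient: right column minus left column, centre row weighted 2.
      const gx =
        at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1) -
        at(x - 1, y - 1) - 2 * at(x - 1, y) - at(x - 1, y + 1);
      // Vertical gradient: bottom row minus top row, centre column weighted 2.
      const gy =
        at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1) -
        at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1);
      out[y * width + x] = Math.hypot(gx, gy);
    }
  }
  return out;
}
```

&lt;p&gt;Large values in the result mark sharp brightness changes, which is where the paper is likely to meet the desk. Flat regions come out as zero.&lt;/p&gt;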

&lt;p&gt;Step 2: Perspective Correction (The Magic Part)&lt;br&gt;
This is what separates a real document scanner from just "taking a photo." Even if your paper is at an angle — tilted 20 degrees, shot from the side — ihatepdf.cv mathematically flattens it.&lt;br&gt;
It computes a homography matrix: a transformation that maps your four selected corners to a perfect rectangle. Then it walks every pixel of the output image backwards through that transformation to find where it came from in the original photo. The result is a flat, de-skewed document that looks like it came off a flatbed scanner.&lt;br&gt;
No OpenCV. No server. Eight equations, eight unknowns, solved in JavaScript.&lt;/p&gt;
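
&lt;p&gt;The "eight equations, eight unknowns" can be sketched roughly as follows. Each of the four corner pairs contributes two linear equations, and plain Gauss-Jordan elimination solves for the eight coefficients. The names solveHomography and applyHomography are hypothetical, and the tool's actual solver may differ.&lt;/p&gt;

```javascript
// Solve for the 8 homography coefficients that map four "from" points onto four "to" points.
// Points are [x, y] pairs. The ninth coefficient is fixed at 1.
function solveHomography(from, to) {
  let rows = [];
  from.forEach(([x, y], i) => {
    const [u, v] = to[i];
    rows.push([x, y, 1, 0, 0, 0, -u * x, -u * y, u]);
    rows.push([0, 0, 0, x, y, 1, -v * x, -v * y, v]);
  });
  // Gauss-Jordan elimination with partial pivoting on the 8x9 augmented matrix.
  for (let col = 0; 8 > col; col++) {
    let pivot = col;
    for (let r = col + 1; 8 > r; r++) {
      if (Math.abs(rows[r][col]) > Math.abs(rows[pivot][col])) pivot = r;
    }
    const swapped = rows[col];
    rows[col] = rows[pivot];
    rows[pivot] = swapped;
    const p = rows[col][col];
    if (1e-12 > Math.abs(p)) throw new Error('Corners are degenerate');
    const pivotRow = rows[col].map((value) => value / p);
    rows = rows.map((row, r) =>
      r === col ? pivotRow : row.map((value, c) => value - row[col] * pivotRow[c])
    );
  }
  return rows.map((row) => row[8]);
}

// Push one point through the transformation, including the perspective divide.
function applyHomography(h, [x, y]) {
  const w = h[6] * x + h[7] * y + 1;
  return [(h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w];
}
```

&lt;p&gt;For the backwards walk described above, you would solve from the output rectangle's corners to the photo's corners, then call applyHomography for every output pixel to find its source location.&lt;/p&gt;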

&lt;p&gt;Step 3: Enhancement — Make It Actually Readable&lt;br&gt;
Raw camera images are often too soft or too grey for OCR to work well. ihatepdf.cv gives you five one-tap presets:&lt;/p&gt;

&lt;p&gt;Original — untouched, exactly what the camera saw&lt;br&gt;
Document — boosted contrast and brightness, great for printed text&lt;br&gt;
B&amp;amp;W — full greyscale with heavy contrast, sharp black text on white&lt;br&gt;
Whiteboard — aggressive brightness lift for whiteboard photos&lt;br&gt;
Photo — gentle enhancement for mixed-content pages&lt;/p&gt;

&lt;p&gt;You can also fine-tune brightness and contrast manually with sliders. Any preset can be applied to all pages in one tap — useful when you're scanning a multi-page document in the same lighting conditions.&lt;/p&gt;

&lt;p&gt;Step 4: OCR — Turn Your Scan Into a Searchable PDF&lt;br&gt;
This is the feature that makes the output genuinely useful rather than just a pretty image. After you've captured and adjusted your pages, hitting "Scan &amp;amp; Extract Text" fires up Tesseract.js — an open-source OCR engine that runs locally in your browser.&lt;br&gt;
It supports 17 languages including English, Hindi, French, German, Arabic, Chinese, Japanese, Korean, and more. Set it to Auto and it detects the script automatically from your first page.&lt;br&gt;
Before OCR runs, ihatepdf.cv preprocesses each image:&lt;/p&gt;

&lt;p&gt;Normalises the contrast range so faint text becomes visible&lt;br&gt;
Sharpens edges with a Laplacian filter&lt;br&gt;
Upscales small images to the sweet spot for Tesseract accuracy (~1800px)&lt;br&gt;
If confidence is low, automatically retries on a binarised (black and white) version&lt;/p&gt;
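
&lt;p&gt;A rough sketch of three of those preprocessing steps, working on a greyscale array. The function names, the default threshold of 128, and the choice to measure the longest side against 1800px are assumptions made for illustration.&lt;/p&gt;

```javascript
// Stretch brightness so the darkest pixel becomes 0 and the brightest 255.
function normaliseContrast(gray) {
  const lo = gray.reduce((a, b) => Math.min(a, b), 255);
  const hi = gray.reduce((a, b) => Math.max(a, b), 0);
  if (hi === lo) return gray.slice(); // flat image: nothing to stretch
  return gray.map((v) => Math.round(((v - lo) * 255) / (hi - lo)));
}

// How much to enlarge a small image; images already at or above the target are left alone.
function upscaleFactor(longestSide, target = 1800) {
  return Math.max(1, target / longestSide);
}

// Fallback for low-confidence pages: force every pixel to pure black or pure white.
function binarise(gray, threshold = 128) {
  return gray.map((v) => (v >= threshold ? 255 : 0));
}
```

&lt;p&gt;A fixed threshold is the simplest option. An adaptive threshold would cope better with uneven lighting, at the cost of more code.&lt;/p&gt;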

&lt;p&gt;The extracted words, along with their exact positions on the page, are embedded as an invisible text layer in the final PDF — white text sitting just underneath the image. The document looks exactly like your scan, but you can select text, search with Ctrl+F, and have it indexed by any search engine or document manager.&lt;/p&gt;

&lt;p&gt;Step 5: Export Options That Don't Hold You Hostage&lt;br&gt;
Once processed, you get:&lt;/p&gt;

&lt;p&gt;Download as PDF — a proper searchable PDF, no watermark, no size limit other than your device's memory&lt;br&gt;
Download as .txt — the raw extracted text, useful for pasting into other documents&lt;br&gt;
Copy text — one tap to clipboard&lt;br&gt;
Share — on mobile, triggers the native share sheet so you can AirDrop, send via WhatsApp, email it, whatever&lt;/p&gt;

&lt;p&gt;There's also a merge feature: if you already have an existing PDF, you can load it and choose whether your new scan goes before or after it. Useful when you're adding pages to a report or combining a handwritten note with a typed document.&lt;/p&gt;

&lt;p&gt;Multi-Page and Batch Scanning&lt;br&gt;
Single-page scanning is the easy case. ihatepdf.cv handles multi-page documents too.&lt;br&gt;
After each capture, you land back in a review screen where your pages appear as a scrollable thumbnail strip. Tap any thumbnail to preview it full-screen. Drag thumbnails to reorder pages. Long-press for a context menu that lets you rotate, replace, move, or delete individual pages. Swipe left and right on the preview to navigate between pages.&lt;br&gt;
For situations where you need to quickly capture a stack of documents, Batch mode lets you keep the camera open and shoot as many frames as you want. You then review all captures at once, edit corners on any that need it, and add everything to your document in a single step.&lt;br&gt;
A small coloured dot on each thumbnail gives you a blur quality score — green means sharp, amber is borderline, red means you should retake it. It's a small thing, but it saves you from building a 20-page PDF only to find page 7 is unreadable.&lt;/p&gt;
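
&lt;p&gt;One common way to score sharpness is the variance of the Laplacian: blurry pages have few strong local changes, so the variance is low. The sketch below uses that technique as an assumption, since the article does not say which measure the tool uses, and the two thresholds are placeholders you would tune on real scans.&lt;/p&gt;

```javascript
// Variance of the 4-neighbour Laplacian over the interior of a greyscale image.
function laplacianVariance(gray, width, height) {
  const responses = [];
  for (let y = 1; height - 1 > y; y++) {
    for (let x = 1; width - 1 > x; x++) {
      const i = y * width + x;
      responses.push(
        gray[i - width] + gray[i + width] + gray[i - 1] + gray[i + 1] - 4 * gray[i]
      );
    }
  }
  if (responses.length === 0) return 0;
  const mean = responses.reduce((a, b) => a + b, 0) / responses.length;
  return responses.reduce((a, b) => a + (b - mean) ** 2, 0) / responses.length;
}

// Map a score to the dot colour. Both thresholds are hypothetical.
function blurRating(score, sharp = 100, borderline = 40) {
  if (score >= sharp) return 'green';
  if (score >= borderline) return 'amber';
  return 'red';
}
```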

&lt;p&gt;Why "No Upload" Actually Matters&lt;br&gt;
Most online PDF tools — even the free ones — route your files through a server. That's fine for a grocery receipt. It's less fine for:&lt;/p&gt;

&lt;p&gt;Medical records&lt;br&gt;
Legal contracts&lt;br&gt;
Financial statements&lt;br&gt;
ID documents&lt;br&gt;
Confidential work files&lt;/p&gt;

&lt;p&gt;ihatepdf.cv processes everything on your device. The image never leaves. There's no server log of what you scanned, no storage bucket somewhere holding your documents, no terms of service clause about training AI on your uploads.&lt;br&gt;
This isn't a privacy marketing line. It's just what happens when you do image processing in JavaScript on a canvas element. The data literally cannot go anywhere because there's nowhere to send it.&lt;/p&gt;

&lt;p&gt;Works Without an Internet Connection&lt;br&gt;
Once the page loads, ihatepdf.cv works offline. The OCR language models are fetched on first use and cached by the browser. After that, you could put your phone in airplane mode and scan a full document to a searchable PDF without any network activity.&lt;br&gt;
This matters more than it sounds. You're scanning in a basement. On a plane. In a client meeting where pulling out your phone to use some cloud service looks unprofessional. ihatepdf.cv doesn't care.&lt;/p&gt;

&lt;p&gt;Import From Your Gallery Too&lt;br&gt;
Don't have something in front of you to scan right now? You can import existing photos from your camera roll. ihatepdf.cv runs the same edge detection and perspective correction on imported images, automatically applies the Document filter, and adds them to your page stack just like a live capture.&lt;br&gt;
This is useful for: photos you already took of a whiteboard, photos of receipts sent to you by someone else, or images of documents you photographed earlier without a scanning app.&lt;/p&gt;

&lt;p&gt;The Short Version&lt;br&gt;
If you need to turn a physical document into a PDF and you want it to be:&lt;/p&gt;

&lt;p&gt;Free&lt;br&gt;
Searchable&lt;br&gt;
Watermark-free&lt;br&gt;
Private (nothing uploaded)&lt;br&gt;
Available offline&lt;br&gt;
Multi-language&lt;br&gt;
Multi-page&lt;/p&gt;

&lt;p&gt;— ihatepdf.cv does all of that. In your browser. Right now.&lt;br&gt;
No sign-up. No install. Try ihatepdf.cv now →&lt;/p&gt;

</description>
      <category>privacy</category>
      <category>productivity</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Scan Any Document to a Searchable PDF — For Free, Right in Your Browser</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Wed, 01 Apr 2026 19:05:23 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/scan-any-document-to-a-searchable-pdf-for-free-right-in-your-browser-2ea3</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/scan-any-document-to-a-searchable-pdf-for-free-right-in-your-browser-2ea3</guid>
      <description>&lt;p&gt;If you've ever had a printed contract, a handwritten note, or a physical receipt that you desperately needed as a PDF — and found yourself wrestling with apps that either upload your files to some random server, slap a watermark on the result, or charge you after three free scans — you know the frustration.&lt;br&gt;
There's a better way. And it runs entirely in your browser.&lt;/p&gt;

&lt;p&gt;What DocScan Actually Does&lt;br&gt;
DocScan is a browser-based document scanner. You open it, point your phone or laptop camera at a piece of paper, and it spits out a clean, flat, searchable PDF — with no account required, no file ever leaving your device, and absolutely no watermark.&lt;br&gt;
It's the kind of tool that should have always existed. Here's what happens under the hood when you hit that shutter button.&lt;/p&gt;

&lt;p&gt;Step 1: It Detects the Document Edges Automatically&lt;br&gt;
The moment you capture a frame, DocScan runs an edge-detection algorithm directly in JavaScript — no computer vision library, no WebAssembly blob to download. It applies a Gaussian blur to reduce noise, then runs a Sobel operator across the image to find where brightness changes sharply (i.e., where your document meets the desk or background).&lt;br&gt;
The four corners of your document are estimated from those edge points and shown as draggable handles on screen. Most of the time the auto-detect is spot on. When it isn't, you just drag the corners yourself — and a little magnifying loupe pops up under your finger so you can place them with pixel-level precision.&lt;/p&gt;
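
&lt;p&gt;One simple heuristic for turning a cloud of edge points into four corners is to pick the extremes of x + y and x - y. This is an assumption for illustration, not necessarily how DocScan does it, and the function name is hypothetical.&lt;/p&gt;

```javascript
// points: array of [x, y] edge locations in image coordinates (y grows downwards).
function estimateCorners(points) {
  // Return the point with the highest score under the given scoring function.
  const pick = (score) =>
    points.reduce((best, p) => (score(p) > score(best) ? p : best), points[0]);
  return {
    topLeft: pick(([x, y]) => -(x + y)),    // smallest x + y
    topRight: pick(([x, y]) => x - y),      // far right and high up
    bottomRight: pick(([x, y]) => x + y),   // largest x + y
    bottomLeft: pick(([x, y]) => y - x),    // far left and low down
  };
}
```

&lt;p&gt;The heuristic works when the page is roughly upright in the frame. A heavily rotated page would confuse it, which is one reason draggable handles are a useful fallback.&lt;/p&gt;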

&lt;p&gt;Step 2: Perspective Correction (The Magic Part)&lt;br&gt;
This is what separates a real document scanner from just "taking a photo." Even if your paper is at an angle — tilted 20 degrees, shot from the side — DocScan mathematically flattens it.&lt;br&gt;
It computes a homography matrix: a transformation that maps your four selected corners to a perfect rectangle. Then it walks every pixel of the output image backwards through that transformation to find where it came from in the original photo. The result is a flat, de-skewed document that looks like it came off a flatbed scanner.&lt;br&gt;
No OpenCV. No server. Eight equations, eight unknowns, solved in JavaScript.&lt;/p&gt;
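
&lt;p&gt;When each output pixel is traced back into the photo, it usually lands between source pixels. The sketch below shows bilinear sampling as one way to handle that. Whether DocScan uses bilinear or nearest-neighbour lookup is not stated, so treat this as an assumption.&lt;/p&gt;

```javascript
// Read a greyscale image at a fractional position by blending the four nearest pixels.
function sampleBilinear(gray, width, height, x, y) {
  // Clamp so lookups just outside the photo reuse the nearest edge pixel.
  const cx = Math.min(Math.max(x, 0), width - 1);
  const cy = Math.min(Math.max(y, 0), height - 1);
  const x0 = Math.floor(cx);
  const y0 = Math.floor(cy);
  const x1 = Math.min(x0 + 1, width - 1);
  const y1 = Math.min(y0 + 1, height - 1);
  const fx = cx - x0; // horizontal blend weight
  const fy = cy - y0; // vertical blend weight
  const top = gray[y0 * width + x0] * (1 - fx) + gray[y0 * width + x1] * fx;
  const bottom = gray[y1 * width + x0] * (1 - fx) + gray[y1 * width + x1] * fx;
  return top * (1 - fy) + bottom * fy;
}
```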

&lt;p&gt;Step 3: Enhancement — Make It Actually Readable&lt;br&gt;
Raw camera images are often too soft or too grey for OCR to work well. DocScan gives you five one-tap presets:&lt;/p&gt;

&lt;p&gt;Original — untouched, exactly what the camera saw&lt;br&gt;
Document — boosted contrast and brightness, great for printed text&lt;br&gt;
B&amp;amp;W — full greyscale with heavy contrast, sharp black text on white&lt;br&gt;
Whiteboard — aggressive brightness lift for whiteboard photos&lt;br&gt;
Photo — gentle enhancement for mixed-content pages&lt;/p&gt;

&lt;p&gt;You can also fine-tune brightness and contrast manually with sliders. Any preset can be applied to all pages in one tap — useful when you're scanning a multi-page document in the same lighting conditions.&lt;/p&gt;

&lt;p&gt;Step 4: OCR — Turn Your Scan Into a Searchable PDF&lt;br&gt;
This is the feature that makes the output genuinely useful rather than just a pretty image. After you've captured and adjusted your pages, hitting "Scan &amp;amp; Extract Text" fires up Tesseract.js — an open-source OCR engine that runs locally in your browser.&lt;br&gt;
It supports 17 languages including English, Hindi, French, German, Arabic, Chinese, Japanese, Korean, and more. Set it to Auto and it detects the script automatically from your first page.&lt;br&gt;
Before OCR runs, DocScan preprocesses each image:&lt;/p&gt;

&lt;p&gt;Normalises the contrast range so faint text becomes visible&lt;br&gt;
Sharpens edges with a Laplacian filter&lt;br&gt;
Upscales small images to the sweet spot for Tesseract accuracy (~1800px)&lt;br&gt;
If confidence is low, automatically retries on a binarised (black and white) version&lt;/p&gt;
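
&lt;p&gt;Laplacian sharpening can be sketched in a few lines: compute how much each pixel differs from its four neighbours, then push it further in that direction. The function name and the 4-neighbour kernel are assumptions chosen for brevity.&lt;/p&gt;

```javascript
// Sharpen a greyscale image by subtracting the 4-neighbour Laplacian from each pixel.
function sharpen(gray, width, height) {
  const out = Array.from(gray); // border pixels are copied unchanged
  for (let y = 1; height - 1 > y; y++) {
    for (let x = 1; width - 1 > x; x++) {
      const i = y * width + x;
      const laplacian =
        gray[i - width] + gray[i + width] + gray[i - 1] + gray[i + 1] - 4 * gray[i];
      // Subtracting the Laplacian exaggerates differences from the neighbours,
      // then the result is clamped back into the 0-255 range.
      out[i] = Math.min(255, Math.max(0, gray[i] - laplacian));
    }
  }
  return out;
}
```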

&lt;p&gt;The extracted words, along with their exact positions on the page, are embedded as an invisible text layer in the final PDF — white text sitting just underneath the image. The document looks exactly like your scan, but you can select text, search with Ctrl+F, and have it indexed by any search engine or document manager.&lt;/p&gt;

&lt;p&gt;Step 5: Export Options That Don't Hold You Hostage&lt;br&gt;
Once processed, you get:&lt;/p&gt;

&lt;p&gt;Download as PDF — a proper searchable PDF, no watermark, no size limit other than your device's memory&lt;br&gt;
Download as .txt — the raw extracted text, useful for pasting into other documents&lt;br&gt;
Copy text — one tap to clipboard&lt;br&gt;
Share — on mobile, triggers the native share sheet so you can AirDrop, send via WhatsApp, email it, whatever&lt;/p&gt;

&lt;p&gt;There's also a merge feature: if you already have an existing PDF, you can load it and choose whether your new scan goes before or after it. Useful when you're adding pages to a report or combining a handwritten note with a typed document.&lt;/p&gt;

&lt;p&gt;Multi-Page and Batch Scanning&lt;br&gt;
Single-page scanning is the easy case. DocScan handles multi-page documents too.&lt;br&gt;
After each capture, you land back in a review screen where your pages appear as a scrollable thumbnail strip. Tap any thumbnail to preview it full-screen. Drag thumbnails to reorder pages. Long-press for a context menu that lets you rotate, replace, move, or delete individual pages. Swipe left and right on the preview to navigate between pages.&lt;br&gt;
For situations where you need to quickly capture a stack of documents, Batch mode lets you keep the camera open and shoot as many frames as you want. You then review all captures at once, edit corners on any that need it, and add everything to your document in a single step.&lt;br&gt;
A small coloured dot on each thumbnail gives you a blur quality score — green means sharp, amber is borderline, red means you should retake it. It's a small thing, but it saves you from building a 20-page PDF only to find page 7 is unreadable.&lt;/p&gt;

&lt;p&gt;Why "No Upload" Actually Matters&lt;br&gt;
Most online PDF tools — even the free ones — route your files through a server. That's fine for a grocery receipt. It's less fine for:&lt;/p&gt;

&lt;p&gt;Medical records&lt;br&gt;
Legal contracts&lt;br&gt;
Financial statements&lt;br&gt;
ID documents&lt;br&gt;
Confidential work files&lt;/p&gt;

&lt;p&gt;DocScan processes everything on your device. The image never leaves. There's no server log of what you scanned, no storage bucket somewhere holding your documents, no terms of service clause about training AI on your uploads.&lt;br&gt;
This isn't a privacy marketing line. It's just what happens when you do image processing in JavaScript on a canvas element. The data literally cannot go anywhere because there's nowhere to send it.&lt;/p&gt;

&lt;p&gt;Works Without an Internet Connection&lt;br&gt;
Once the page loads, DocScan works offline. The OCR language models are fetched on first use and cached by the browser. After that, you could put your phone in airplane mode and scan a full document to a searchable PDF without any network activity.&lt;br&gt;
This matters more than it sounds. You're scanning in a basement. On a plane. In a client meeting where pulling out your phone to use some cloud service looks unprofessional. DocScan doesn't care.&lt;/p&gt;

&lt;p&gt;Import From Your Gallery Too&lt;br&gt;
Don't have something in front of you to scan right now? You can import existing photos from your camera roll. DocScan runs the same edge detection and perspective correction on imported images, automatically applies the Document filter, and adds them to your page stack just like a live capture.&lt;br&gt;
This is useful for: photos you already took of a whiteboard, photos of receipts sent to you by someone else, or images of documents you photographed earlier without a scanning app.&lt;/p&gt;

&lt;p&gt;The Short Version&lt;br&gt;
If you need to turn a physical document into a PDF and you want it to be:&lt;/p&gt;

&lt;p&gt;Free&lt;br&gt;
Searchable&lt;br&gt;
Watermark-free&lt;br&gt;
Private (nothing uploaded)&lt;br&gt;
Available offline&lt;br&gt;
Multi-language&lt;br&gt;
Multi-page&lt;/p&gt;

&lt;p&gt;— DocScan does all of that. In your browser. Right now.&lt;br&gt;
No sign-up. No install. Just open it and scan.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How to Add Page Numbers to a PDF for Free — No Sign Up, No Watermark</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Sat, 28 Mar 2026 08:17:15 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/how-to-add-page-numbers-to-a-pdf-for-free-no-sign-up-no-watermark-366c</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/how-to-add-page-numbers-to-a-pdf-for-free-no-sign-up-no-watermark-366c</guid>
      <description>&lt;p&gt;If you've ever had to submit a report, share a proposal, or send a multi-page document to someone, you already know the quiet embarrassment of handing over an unnumbered PDF. Readers flip back and forth with no reference point. "Go to page 7" means nothing when there's no page 7 visible. Page numbers aren't decorative — they're functional.&lt;br&gt;
The problem? Most tools that let you add page numbers to a PDF online are either bloated with upsells, shove a watermark on every page, or demand you create an account before you can do anything useful.&lt;br&gt;
That's exactly the problem iHatePDF.cv was built to solve.&lt;/p&gt;

&lt;p&gt;Why Adding Page Numbers to a PDF Is Harder Than It Should Be&lt;br&gt;
Let's be honest about the PDF ecosystem for a second. It's dominated by a handful of giants — Adobe Acrobat at the premium end, and a pile of "free" tools in the middle that aren't really free. You go to edit a PDF and you're either paying $20/month or dealing with watermarks, file size limits, forced account registrations, and files being uploaded to servers you've never heard of.&lt;br&gt;
For something as simple as adding a page number to a document, none of that should be necessary.&lt;br&gt;
Searches like "free pdf editor no sign up", "add page numbers to pdf free", and "free online pdf editor without watermark" get tens of thousands of queries every month — and most of the results lead to tools that quietly disappoint. The free tier runs out. The watermark appears. The sign-up wall goes up.&lt;/p&gt;

&lt;p&gt;What iHatePDF.cv's Page Numbers Tool Actually Does&lt;br&gt;
The Add Page Numbers to PDF tool on iHatePDF.cv is refreshingly straightforward. You drop in your PDF, configure how you want the numbers to look, and download. That's it. No account. No watermark. No upload to a remote server.&lt;br&gt;
Here's what you can actually configure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Number Format
Not everyone wants plain "1, 2, 3." The tool gives you five formats to choose from:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;1, 2, 3 — the standard numeric format&lt;br&gt;
i, ii, iii — Roman numerals, great for front matter in academic documents or legal filings&lt;br&gt;
A, B, C — alphabetical, useful for appendices or exhibit labelling&lt;br&gt;
Page 1 — written-out format, common in formal reports&lt;br&gt;
1 / 10 — shows progress (current page out of total), good for presentations and manuals&lt;/p&gt;
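
&lt;p&gt;The five formats boil down to a small formatting function. Here is a minimal sketch. The format keys ('roman', 'alpha', 'page', 'fraction') are made-up names, and continuing past Z with AA, AB and so on is an assumption about how the alphabetical format behaves on long documents.&lt;/p&gt;

```javascript
// Lowercase Roman numerals, matching the i, ii, iii style.
function toRoman(n) {
  const table = [
    [1000, 'm'], [900, 'cm'], [500, 'd'], [400, 'cd'], [100, 'c'], [90, 'xc'],
    [50, 'l'], [40, 'xl'], [10, 'x'], [9, 'ix'], [5, 'v'], [4, 'iv'], [1, 'i'],
  ];
  let rest = n;
  let out = '';
  table.forEach(([value, glyph]) => {
    while (rest >= value) {
      out += glyph;
      rest -= value;
    }
  });
  return out;
}

// 1 is A, 26 is Z, 27 is AA, like spreadsheet column names.
function toLetters(n) {
  let rest = n;
  let out = '';
  while (rest > 0) {
    rest -= 1;
    out = String.fromCharCode(65 + (rest % 26)) + out;
    rest = Math.floor(rest / 26);
  }
  return out;
}

// Build the label for page n out of total pages in the chosen format.
function formatPageNumber(n, total, format) {
  switch (format) {
    case 'roman': return toRoman(n);
    case 'alpha': return toLetters(n);
    case 'page': return 'Page ' + n;
    case 'fraction': return n + ' / ' + total;
    default: return String(n);
  }
}
```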

&lt;ol start="2"&gt;
&lt;li&gt;Position — Six Placements
You can place your page number in any of six spots on the page:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Top left, top center, top right&lt;br&gt;
Bottom left, bottom center, bottom right&lt;/p&gt;

&lt;p&gt;An interactive position picker in the UI makes this visual and intuitive — you click the spot on a mini page diagram, not a dropdown.&lt;/p&gt;
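
&lt;p&gt;Behind a picker like this, each of the six spots has to become an x/y coordinate on the page. A minimal sketch, assuming position keys like 'bottom-center', sizes in PDF points, and a 36pt margin (all hypothetical choices):&lt;/p&gt;

```javascript
// Work out where the number's baseline starts for one of the six placements.
function numberPosition(position, pageWidth, pageHeight, textWidth, fontSize, margin = 36) {
  const [vertical, horizontal] = position.split('-'); // e.g. 'bottom-center'
  const xByAlign = {
    left: margin,
    center: (pageWidth - textWidth) / 2,
    right: pageWidth - margin - textWidth,
  };
  // PDF coordinates start at the bottom-left corner, so "top" means a large y.
  const y = vertical === 'top' ? pageHeight - margin - fontSize : margin;
  return { x: xByAlign[horizontal], y };
}
```

&lt;p&gt;For a US Letter page (612 by 792 points), 'bottom-center' with a 12pt-wide label gives x = 300 and y = 36.&lt;/p&gt;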

&lt;ol start="3"&gt;
&lt;li&gt;Color Customization
Seven presets (dark grey, black, red, blue, green, purple, orange) plus a full color picker if you want to match your brand or document style exactly.&lt;/li&gt;
&lt;li&gt;Font Size and Weight
Slider control from 6pt to 48pt, plus a toggle between regular and bold. Small touches, but they matter when the numbers need to blend into a polished document or stand out on a technical one.&lt;/li&gt;
&lt;li&gt;Start Number and Skip First Page
You can start numbering from any number — useful when your PDF is a chapter in a larger document that starts at page 34, for example. There's also a single checkbox to skip the first page, which is the right behavior for most cover-page documents.&lt;/li&gt;
&lt;/ol&gt;
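
&lt;p&gt;The start number and skip-first-page options interact, so a small sketch helps. It assumes that when the cover is skipped, the second page receives the start number. That is a guess about the tool's behaviour, and the function name is hypothetical.&lt;/p&gt;

```javascript
// Return the number to print on each page, or null for a page left unnumbered.
function pageLabels(pageCount, startNumber = 1, skipFirstPage = false) {
  return Array.from({ length: pageCount }, (_, index) => {
    if (skipFirstPage) {
      // Assumption: the cover gets no label and the next page takes startNumber.
      return index === 0 ? null : startNumber + index - 1;
    }
    return startNumber + index;
  });
}
```

&lt;p&gt;A three-page chapter starting at 34 yields 34, 35, 36. With the cover skipped and the default start, the same file yields no label, then 1, then 2.&lt;/p&gt;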

&lt;p&gt;The Privacy Angle: Files Never Leave Your Browser&lt;br&gt;
This is worth spelling out clearly because it's non-obvious and genuinely important.&lt;br&gt;
iHatePDF.cv processes everything locally in your browser using PDF-lib and PDF.js. Your file is never uploaded to any server. The page numbers are added entirely client-side, in memory, and the resulting file is downloaded directly to your device.&lt;br&gt;
For anyone working with confidential documents — legal contracts, financial reports, HR materials — this isn't a minor feature. It's the whole point. Tools that upload your files to process them are making a choice you may not be aware of. iHatePDF.cv made the opposite choice.&lt;/p&gt;

&lt;p&gt;Live Preview Before You Commit&lt;br&gt;
One thing that separates this tool from the more barebones free options is the live preview panel. As you adjust the format, position, color, and font size, the first five pages of your PDF update in real time to show you exactly where the numbers will appear and what they'll look like.&lt;br&gt;
This catches problems before they become problems. You can see immediately if "bottom-center" looks awkward on your particular layout, or if the font size you chose is too small to read comfortably.&lt;/p&gt;

&lt;p&gt;Who This Is Actually For&lt;br&gt;
The short answer: anyone who works with PDFs regularly and doesn't want to pay Adobe's subscription fee for a one-off formatting task.&lt;br&gt;
More specifically:&lt;br&gt;
Students and researchers: Academic papers, thesis documents, and reports almost universally require page numbers. Roman numerals for the front matter and Arabic numerals for the body is a classic format; you can get it by numbering each section as its own PDF, with its own format and start number.&lt;br&gt;
Legal and compliance professionals: Legal documents need precise pagination. The ability to choose "Page 1" format or "1 / 10" totals, and to place numbers in a specific corner per organizational style guides, is practical here.&lt;br&gt;
Business users: Proposals, decks exported as PDFs, business reports handed off to clients. Page numbers make these documents look finished and professional.&lt;br&gt;
Developers and technical writers: Documentation and manuals are obvious candidates. The fact that this tool is entirely browser-based also makes it a clean fit for privacy-conscious technical environments.&lt;/p&gt;

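&lt;p&gt;How It Compares to the Usual Alternatives&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;Free?&lt;/th&gt;&lt;th&gt;Watermark?&lt;/th&gt;&lt;th&gt;Upload Required?&lt;/th&gt;&lt;th&gt;No Sign-Up?&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Adobe Acrobat&lt;/td&gt;&lt;td&gt;Paid&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;iLovePDF&lt;/td&gt;&lt;td&gt;Limited&lt;/td&gt;&lt;td&gt;Sometimes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Partial&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Smallpdf&lt;/td&gt;&lt;td&gt;Limited&lt;/td&gt;&lt;td&gt;Sometimes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Partial&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;iHatePDF&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The positioning is deliberate. iHatePDF.cv isn't trying to compete on feature breadth with Acrobat. It's competing on the things that actually matter for most common PDF tasks: free, fast, private, no friction.&lt;/p&gt;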

&lt;p&gt;Step-by-Step: Adding Page Numbers in About 30 Seconds&lt;/p&gt;

&lt;p&gt;Go to ihatepdf.cv/page-numbers&lt;br&gt;
Click "Choose PDF" or drag and drop your file&lt;br&gt;
Select your number format (1,2,3 / i,ii,iii / A,B,C / Page 1 / 1/10)&lt;br&gt;
Click your preferred position on the page diagram&lt;br&gt;
Adjust color, font size, and weight if needed&lt;br&gt;
Set your start number, toggle "skip first page" if you have a cover&lt;br&gt;
Click Add Numbers&lt;br&gt;
Your numbered PDF downloads instantly&lt;/p&gt;

&lt;p&gt;The live preview updates as you go, so by the time you hit the button, you already know exactly what you're getting.&lt;/p&gt;

&lt;p&gt;Final Thought&lt;br&gt;
The best tools get out of your way. They do the one thing they promise, they do it well, and they don't try to upsell you at every step or hold your file hostage behind a paywall.&lt;br&gt;
The iHatePDF.cv page numbers tool is exactly that. It's free, it's private, it works in your browser without an account, and it gives you more formatting control than most tools you'd actually pay for.&lt;br&gt;
If you regularly deal with PDFs — and in 2025, who doesn't — it's worth bookmarking.&lt;/p&gt;

&lt;p&gt;Try it at ihatepdf.cv/page-numbers — free, no sign up, no watermark.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I Built a PDF Splitter That Never Touches Your Files — Here's How</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Fri, 27 Mar 2026 16:20:50 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/i-built-a-pdf-splitter-that-never-touches-your-files-heres-how-96o</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/i-built-a-pdf-splitter-that-never-touches-your-files-heres-how-96o</guid>
      <description>&lt;p&gt;Why another PDF tool? Because privacy shouldn't be a premium feature.&lt;/p&gt;

&lt;p&gt;We've all been there. You have a 40-page PDF report, your client needs pages 12–18, and the only tools you can find either slap a watermark on the output, cap your file size, or — most troublingly — upload your document to some server you've never heard of before "processing" it.&lt;br&gt;
That last part should bother you more than it does.&lt;br&gt;
I built iHatePDF's Split PDF tool to solve exactly this problem. It's fast, it's free, it has no file size limit, and — most importantly — your files never leave your device. Not a single byte.&lt;br&gt;
Here's the full story.&lt;/p&gt;

&lt;p&gt;The Problem with Most Online PDF Tools&lt;br&gt;
Online PDF utilities are everywhere. But most of them share an uncomfortable architecture: you upload your file, it travels to a remote server, gets processed, and then gets sent back to you. Along the way, your document — which might contain contracts, medical records, financial data, or intellectual property — passes through infrastructure you don't control and likely don't understand.&lt;br&gt;
Some tools are upfront about this. Many are not. And even the honest ones that promise to "delete your file after 1 hour" are asking you to take their word for it.&lt;br&gt;
There's a better way.&lt;/p&gt;

&lt;p&gt;The Core Idea: Keep Everything in the Browser&lt;br&gt;
Modern browsers are incredibly capable. Thanks to APIs like the File API, Web Workers, and libraries like pdf-lib, you can do serious PDF manipulation entirely on the client side — no server required.&lt;br&gt;
This is exactly what the Split PDF tool does. When you upload a PDF:&lt;/p&gt;

&lt;p&gt;It's read into memory using the browser's FileReader API as an ArrayBuffer.&lt;br&gt;
pdf-lib parses the document and counts the pages.&lt;br&gt;
When you hit "Split PDF," the library creates new PDF documents, copies the relevant pages over, and generates downloadable byte arrays.&lt;br&gt;
Those byte arrays are handed directly to your browser's download mechanism.&lt;/p&gt;

&lt;p&gt;At no point does the file travel over a network. The "server" is your own CPU.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const buffer = await readFileAsArrayBuffer(file);
const { PDFDocument } = window.PDFLib;
const sourcePdf = await PDFDocument.load(buffer);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Three lines. That's the entire "upload" step, and the file never left your machine.&lt;/p&gt;

&lt;p&gt;Two Ways to Split&lt;br&gt;
The tool offers two splitting modes, designed to cover the most common real-world scenarios.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;By Page Ranges
This is the power-user mode. You type something like 1-5, 10-15, 20 and the tool produces three separate PDF files — one for each range. It's perfect for:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Breaking a book into chapters&lt;br&gt;
Extracting a specific section of a report to share with stakeholders&lt;br&gt;
Separating a combined invoice document into individual invoices&lt;/p&gt;

&lt;p&gt;The range parser handles both ranges (1-5) and single pages (20) in the same input string, separated by commas. Clean and flexible.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const parseRanges = (rangeStr) =&amp;gt; {
  const result = [];
  for (const part of rangeStr.split(",").map((s) =&amp;gt; s.trim())) {
    if (part.includes("-")) {
      // "1-5" style range; always parse as base 10
      const [start, end] = part.split("-").map((n) =&amp;gt; parseInt(n.trim(), 10));
      if (!isNaN(start) &amp;amp;&amp;amp; !isNaN(end) &amp;amp;&amp;amp; start &amp;lt;= end)
        result.push({ start: start - 1, end: end - 1 });
    } else {
      // single page such as "20"
      const num = parseInt(part, 10);
      if (!isNaN(num)) result.push({ start: num - 1, end: num - 1 });
    }
  }
  return result;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Notice the - 1 offsets: pdf-lib's page indices are zero-based, but humans count from 1. Small detail, big UX impact.&lt;/p&gt;
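&lt;p&gt;One thing the parser above does not do is check the ranges against the real page count. Here is a minimal sketch of that step, assuming the zero-based &lt;code&gt;{ start, end }&lt;/code&gt; objects it returns and a page count taken from the loaded document. &lt;code&gt;clampRanges&lt;/code&gt; is a hypothetical helper name used for illustration, not code from the tool itself:&lt;/p&gt;

```javascript
// Hypothetical helper: keeps only the part of each range that exists in the
// document and expands it into explicit zero-based page indices.
function clampRanges(ranges, pageCount) {
  const lastIndex = pageCount - 1;
  const result = [];
  for (const { start, end } of ranges) {
    const from = Math.max(start, 0);      // guards against input like "0-3"
    const to = Math.min(end, lastIndex);  // trims ranges that run past the end
    if (from > to) continue;              // range lies entirely outside the document
    result.push(Array.from({ length: to - from + 1 }, (_, i) => from + i));
  }
  return result;
}

// "1-5, 10-15, 20" after parsing, applied to a 12-page document
const parsed = [
  { start: 0, end: 4 },
  { start: 9, end: 14 },
  { start: 19, end: 19 },
];
console.log(clampRanges(parsed, 12));
// logs two groups: indices 0..4 and 9..11; page 20 is dropped
```

&lt;p&gt;Each inner array is already in the shape that a page-copying call expects, so an out-of-range request is trimmed or skipped up front instead of failing halfway through a split.&lt;/p&gt;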

&lt;ol start="2"&gt;
&lt;li&gt;Individual Pages
One click, and every page becomes its own PDF file. A 20-page document turns into 20 separate downloads. This is ideal for distributing slides, archiving scanned documents page by page, or just giving someone exactly the one page they asked for.
To avoid overwhelming the browser's download manager, individual page downloads are staggered with a 100ms delay between each one — a tiny but important UX consideration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Handling Downloads Across Every Browser&lt;br&gt;
Cross-browser file downloads sound trivial until you try to support Safari on iOS. The tool uses a layered download helper that tries progressively more compatible approaches:&lt;/p&gt;

&lt;p&gt;A custom window.download function if available&lt;br&gt;
The legacy msSaveOrOpenBlob for old Edge/IE&lt;br&gt;
A programmatically clicked &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; tag (the standard modern approach)&lt;br&gt;
As a last resort, window.open() with a blob URL, with a prompt to allow pop-ups&lt;/p&gt;
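&lt;p&gt;The fallback order above can be sketched as a small feature-detection function. This is an illustration under assumptions rather than the tool's actual helper: &lt;code&gt;pickDownloadStrategy&lt;/code&gt; and the strategy names it returns are made up for the example, and it takes a window-like object as an argument so it can be tried outside a browser:&lt;/p&gt;

```javascript
// Hypothetical sketch: returns the name of the first download strategy the
// given window-like object supports, in the order listed above.
function pickDownloadStrategy(win) {
  if (typeof win.download === "function") return "custom";

  const nav = win.navigator || {};
  if (typeof nav.msSaveOrOpenBlob === "function") return "msSaveOrOpenBlob";

  // The anchor approach relies on the `download` attribute being supported.
  const doc = win.document;
  const canClickAnchor = doc ? "download" in doc.createElement("a") : false;
  if (canClickAnchor) return "anchor";

  return "window.open";
}

// A stand-in for an old Edge window, just to exercise the function.
const legacyEdge = { navigator: { msSaveOrOpenBlob() {} } };
console.log(pickDownloadStrategy(legacyEdge)); // "msSaveOrOpenBlob"
```

&lt;p&gt;In a real page the call would be &lt;code&gt;pickDownloadStrategy(window)&lt;/code&gt;. Keeping detection separate from the code that performs the download makes each branch easy to test on its own.&lt;/p&gt;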

&lt;p&gt;iOS Safari gets a special callout UI element that reminds users to check their Files app → Downloads folder, since the native Safari download behavior differs from desktop browsers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const isIOSSafari = () =&amp;gt; {
  const ua = window.navigator.userAgent;
  return (
    (!!ua.match(/iPad/i) || !!ua.match(/iPhone/i)) &amp;amp;&amp;amp;
    !!ua.match(/WebKit/i) &amp;amp;&amp;amp;
    !ua.match(/CriOS/i)
  );
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Small touches like this are what separate a tool people actually use from a tool people abandon in frustration.&lt;/p&gt;

&lt;p&gt;Local History Without a Backend&lt;br&gt;
The tool tracks your recent splits using a hybrid storage strategy:&lt;/p&gt;

&lt;p&gt;Metadata (filename, tool used, timestamp, size) goes into localStorage — lightweight and fast.&lt;br&gt;
The actual PDF bytes go into IndexedDB — the browser's structured data store, capable of handling large binary blobs.&lt;/p&gt;

&lt;p&gt;This means you can revisit recent files without re-uploading anything, and it all works offline. No accounts. No sync. No data leaving your device.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const addToHistory = async (pdfData) =&amp;gt; {
  const history = JSON.parse(localStorage.getItem(HISTORY_KEY) || "[]");
  const newEntry = { id: Date.now(), name: pdfData.name, tool: pdfData.tool, ... };
  if (pdfData.bytes) {
    await dbSet(`pdf_${newEntry.id}`, pdfData.bytes); // IndexedDB
  }
  history.unshift(newEntry);
  localStorage.setItem(HISTORY_KEY, JSON.stringify(history.slice(0, 50)));
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The history is capped at 50 entries to keep storage usage reasonable. Pragmatic, not precious.&lt;/p&gt;
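&lt;p&gt;Capping the metadata list has a side effect worth handling: the bytes stored in IndexedDB for entries that fall off the end would otherwise linger. Below is a minimal sketch of how the cap could also report which blobs to delete, assuming the &lt;code&gt;pdf_${id}&lt;/code&gt; key scheme shown above. &lt;code&gt;capHistory&lt;/code&gt; is a hypothetical helper, and the actual delete call depends on whichever IndexedDB wrapper is in use:&lt;/p&gt;

```javascript
// Hypothetical helper: prepends an entry, keeps the newest `max` entries, and
// lists the IndexedDB keys belonging to the entries that were dropped.
function capHistory(history, newEntry, max = 50) {
  const all = [newEntry, ...history];
  const kept = all.slice(0, max);
  const evictedKeys = all.slice(max).map((entry) => `pdf_${entry.id}`);
  return { kept, evictedKeys };
}

// Tiny cap of 2 so the eviction is visible
const existing = [{ id: 2 }, { id: 1 }];
const { kept, evictedKeys } = capHistory(existing, { id: 3 }, 2);
console.log(kept.map((e) => e.id), evictedKeys);
// kept ids are 3 and 2; the evicted key is "pdf_1"
```

&lt;p&gt;The caller would write &lt;code&gt;kept&lt;/code&gt; back to localStorage and remove each key in &lt;code&gt;evictedKeys&lt;/code&gt; from IndexedDB, so the two stores stay in step.&lt;/p&gt;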

&lt;p&gt;Quality Is Never Compromised&lt;br&gt;
One concern people often have with PDF tools is quality degradation. This tool sidesteps the issue entirely — because there is no re-rendering happening.&lt;br&gt;
pdf-lib doesn't rasterize your pages into images and repack them. It copies the raw PDF page objects from the source document into the new document. The text stays as text. The vector graphics stay as vectors. The embedded fonts stay embedded. You get a lossless copy of each page, not a screenshot of it.&lt;/p&gt;

&lt;p&gt;The Bigger Picture&lt;br&gt;
This tool is part of a broader philosophy: your files are yours. The browser has become powerful enough that the reflexive "upload to server → process → download" pipeline is often unnecessary, and increasingly, it's irresponsible.&lt;br&gt;
Client-side PDF processing is just one example. The same principle applies to image compression, document conversion, data parsing, and dozens of other utility tasks that millions of people outsource to servers every day when they don't have to.&lt;br&gt;
As developers, we have a responsibility to ask: does this actually need to leave the user's device? More often than the current ecosystem suggests, the answer is no.&lt;/p&gt;

&lt;p&gt;Try It&lt;br&gt;
If you've ever needed to split a PDF — and who hasn't — give it a try. Upload a file, split it, and notice what doesn't happen: no spinner while a server "processes" it, no email confirmation, no account wall, no watermark on the output.&lt;br&gt;
Just your PDF, split the way you want it, in seconds, entirely on your machine.&lt;br&gt;
That's what PDF tools should have been all along.&lt;/p&gt;

&lt;p&gt;Built with React, pdf-lib, and a deep distrust of unnecessary server round-trips. All processing happens locally in your browser — always.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I Built a PDF to Word Converter That Places Every Word at Its Exact Coordinates</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Thu, 26 Mar 2026 16:06:24 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/i-built-a-pdf-to-word-converter-that-places-every-word-at-its-exact-coordinates-2oia</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/i-built-a-pdf-to-word-converter-that-places-every-word-at-its-exact-coordinates-2oia</guid>
      <description>&lt;p&gt;PDF to Word conversion is a solved problem if you don't care about accuracy. Export the text, dump it into paragraphs, call it a docx. The result opens in Word and technically contains the words from the original document, arranged in ways that bear no resemblance to the original layout.&lt;/p&gt;

&lt;p&gt;Doing it accurately is a different problem entirely.&lt;/p&gt;

&lt;p&gt;The PDF to Word converter inside &lt;a href="https://www.ihatepdf.cv/pdf-to-word" rel="noopener noreferrer"&gt;ihatepdf.cv&lt;/a&gt; has two modes. &lt;strong&gt;Pixel Perfect&lt;/strong&gt; — every page becomes a high-resolution JPEG embedded in the docx, visually identical to the original, text not editable. &lt;strong&gt;Ultra-Accurate Editable&lt;/strong&gt; — every word extracted from the PDF's internal structure with its exact X/Y coordinates, font size, font family, bold, and italic state, then placed as an absolutely-positioned text box in the docx at its precise location. Both run entirely in your browser. No upload.&lt;/p&gt;

&lt;p&gt;Here's how the accurate mode works.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a PDF actually stores
&lt;/h2&gt;

&lt;p&gt;A PDF is not a document format in the way Word is. It is closer to a list of drawing instructions. When a PDF renderer encounters text, it executes a series of PostScript-derived commands that describe exactly where to place each glyph on the page.&lt;/p&gt;

&lt;p&gt;The key data structure is the &lt;strong&gt;text transform matrix&lt;/strong&gt;. Every text item in a PDF has a 6-element array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[a, b, c, d, tx, ty]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is an affine transformation matrix. For horizontal text (which is most text), &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;d&lt;/code&gt; are the horizontal and vertical scale factors, and &lt;code&gt;tx&lt;/code&gt;, &lt;code&gt;ty&lt;/code&gt; are the translation — the absolute position of the character on the page in PDF points.&lt;/p&gt;

&lt;p&gt;The font size can be extracted directly from the matrix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ty&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fontSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// actual rendered size in pts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Math.sqrt(a² + b²)&lt;/code&gt; gives the magnitude of the horizontal basis vector, which matches the rendered font size as long as the text is scaled uniformly. It stays correct when the text is rotated, where &lt;code&gt;a&lt;/code&gt; alone would under-report the size.&lt;/p&gt;
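&lt;p&gt;As a quick check of that formula with made-up numbers: 12 pt text rotated by 30 degrees has &lt;code&gt;a = 12·cos 30°&lt;/code&gt; and &lt;code&gt;b = 12·sin 30°&lt;/code&gt;, and the magnitude comes back as 12. The function name and the sample matrices are only for illustration:&lt;/p&gt;

```javascript
// Illustrative helper: font size as the length of the matrix's first column.
function fontSizeFromTransform(transform) {
  const [a, b] = transform;
  return Math.hypot(a, b);
}

// A 12 pt text matrix rotated by 30 degrees (sample values, not from a real PDF)
const theta = (30 * Math.PI) / 180;
const size = 12;
const rotated = [
  size * Math.cos(theta), size * Math.sin(theta),
  -size * Math.sin(theta), size * Math.cos(theta),
  100, 700,
];

console.log(fontSizeFromTransform(rotated).toFixed(2));      // "12.00"
console.log(fontSizeFromTransform([12, 0, 0, 12, 72, 720])); // 12
```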

&lt;p&gt;The Y coordinate needs coordinate system conversion. PDFs use a bottom-left origin; HTML, CSS, and Word use top-left. The conversion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;viewport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;ty&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// flip Y axis&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



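&lt;p&gt;This gives us the text origin measured from the top of the page, still in PDF points. Strictly, that origin sits on the baseline of the text rather than its top edge, so the top of the box is roughly the item height above it. This is the position we need to place the text in Word.&lt;/p&gt;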
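&lt;p&gt;Placing the text in a docx also needs a unit change, because Word does not store positions in points. This is a minimal sketch using the standard unit definitions (1 pt = 20 twips = 12,700 EMU). &lt;code&gt;toDocxPosition&lt;/code&gt; is a hypothetical helper, and which of the two units applies depends on how the text box is written into the docx:&lt;/p&gt;

```javascript
const TWIPS_PER_POINT = 20;   // 1440 twips per inch / 72 points per inch
const EMU_PER_POINT = 12700;  // 914400 EMU per inch / 72 points per inch

// Hypothetical helper: converts a top-left-origin position in points
// into both units Word documents commonly use.
function toDocxPosition(xPt, yPt) {
  return {
    twips: {
      x: Math.round(xPt * TWIPS_PER_POINT),
      y: Math.round(yPt * TWIPS_PER_POINT),
    },
    emu: {
      x: Math.round(xPt * EMU_PER_POINT),
      y: Math.round(yPt * EMU_PER_POINT),
    },
  };
}

// Text at tx = 72, ty = 720 on a US Letter page (792 pt tall):
// one inch from the left edge and one inch from the top after the Y flip.
const pageHeight = 792;
const pos = toDocxPosition(72, pageHeight - 720);
console.log(pos.twips, pos.emu);
// twips: x 1440, y 1440; emu: x 914400, y 914400
```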




&lt;h2&gt;
  
  
  Extracting the full text layer
&lt;/h2&gt;

&lt;p&gt;The extraction function reads every text item from PDF.js's &lt;code&gt;getTextContent()&lt;/code&gt; API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractNativeTextItems&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfPage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pdfPage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTextContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;includeMarkedContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pdfPage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getViewport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ty&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fontSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;ty&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;// flip Y for top-left origin&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;fontSize&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;fontSize&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;fontName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fontName&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;hasEOL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasEOL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;pageW&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;pageH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;item.fontName&lt;/code&gt; from PDF.js is the PDF's internal font identifier — typically something like &lt;code&gt;BCDEAA+Arial-BoldItalic&lt;/code&gt; or &lt;code&gt;g_d0_f1&lt;/code&gt;. The six-character prefix before the &lt;code&gt;+&lt;/code&gt; is a PDF subset tag that can be ignored; everything after it is the actual font information.&lt;/p&gt;




&lt;h2&gt;
  
  
  Font detection from internal names
&lt;/h2&gt;

&lt;p&gt;PDF font names encode style information in their naming patterns. Bold and italic are detected with regex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;detectFontStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fontName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fontName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isBold&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/bold|heavy|black|semibold|demi/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isItalic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/italic|oblique|slanted/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;isBold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;isItalic&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The font family name is normalised through a lookup table that maps PDF internal names to Word-compatible font families:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FONT_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/times|tmnr|nimbus rom/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Times New Roman&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/arial|helvetica|swiss/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Arial&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/courier/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Courier New&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/georgia/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Georgia&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/verdana/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Verdana&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/calibri/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Calibri&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/cambria/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cambria&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/palatino|palladio/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Palatino Linotype&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normaliseFont&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fallbackFont&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;rawName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;fallbackFont&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;re&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mapped&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;FONT_MAP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawName&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;mapped&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Strip subset prefix, clean up separators&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rawName&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;A-Z&lt;/span&gt;&lt;span class="se"&gt;]{6}\+&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;-_,&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;fallbackFont&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



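&lt;p&gt;If no pattern in &lt;code&gt;FONT_MAP&lt;/code&gt; matches, the subset prefix is stripped, the separators are replaced with spaces, and the cleaned name is used as is. The user's chosen fallback font (Calibri by default) is used only when nothing is left after cleaning, for example when the internal name is just a subset prefix.&lt;/p&gt;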
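&lt;p&gt;As a minimal sketch, the cleanup step at the end of that function can be exercised on its own. The helper name &lt;code&gt;cleanFontName&lt;/code&gt;, the extra &lt;code&gt;trim()&lt;/code&gt;, and the sample font names are assumptions made for illustration, not part of the tool:&lt;/p&gt;

```javascript
// Hypothetical helper mirroring the cleanup step shown above.
function cleanFontName(rawName, fallbackFont = 'Calibri') {
  const clean = rawName
    .replace(/^[A-Z]{6}\+/, '') // drop a subset prefix such as "ABCDEF+"
    .replace(/[-_,]/g, ' ')     // turn separators into spaces
    .trim();                    // assumption: also drop stray outer spaces
  return clean || fallbackFont; // an empty result falls back to the default
}

console.log(cleanFontName('ABCDEF+Times-Roman')); // "Times Roman"
console.log(cleanFontName('ABCDEF+'));            // "Calibri"
```

&lt;p&gt;The second call shows the only situation in which this step reaches the fallback: the name is empty once the prefix is removed.&lt;/p&gt;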




&lt;h2&gt;
  
  
  Line clustering — grouping spans that belong together
&lt;/h2&gt;

&lt;p&gt;PDF.js returns text as individual spans, not lines. A single visual line of text might be 20 separate spans if the PDF uses different fonts, sizes, or colors mid-line. To place them sensibly in Word, they need to be grouped into lines first.&lt;/p&gt;

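&lt;p&gt;The clustering algorithm compares the vertical midpoints and the vertical overlap of consecutive spans. A span joins the current line when the midpoints differ by less than 65% of the smaller span height, or when the overlap exceeds 40% of it:&lt;br&gt;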
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;groupIntoLines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
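  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// no spans on the page, so no lines&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]];&lt;/span&gt;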

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prevMid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;curMid&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt;  &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;minH&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prevMid&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;curMid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;minH&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.65&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;minH&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
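&lt;p&gt;To see how the two thresholds behave, the same-line test can be pulled out into a small predicate and tried on sample spans. This is a sketch only: the helper name &lt;code&gt;sameLine&lt;/code&gt; and the sample coordinates are hypothetical, and &lt;code&gt;y&lt;/code&gt; is assumed to be one vertical edge of the span with &lt;code&gt;h&lt;/code&gt; its height.&lt;/p&gt;

```javascript
// Hypothetical helper: the same-line test from groupIntoLines, isolated.
function sameLine(prev, cur) {
  const prevMid = prev.y + prev.h / 2;
  const curMid = cur.y + cur.h / 2;
  const overlap = Math.min(prev.y + prev.h, cur.y + cur.h) - Math.max(prev.y, cur.y);
  const minH = Math.min(prev.h, cur.h);
  // Midpoints closer than 65% of the smaller height count as the same line...
  const midsClose = minH * 0.65 > Math.abs(prevMid - curMid);
  // ...and so does a vertical overlap of more than 40% of the smaller height.
  return midsClose || overlap > minH * 0.4;
}

const body = { y: 100, h: 12 };
const smaller = { y: 103, h: 9 };   // smaller span set inside the same line
const nextLine = { y: 114, h: 12 }; // begins just past the first span

console.log(sameLine(body, smaller));  // true
console.log(sameLine(body, nextLine)); // false
```

&lt;p&gt;Using the smaller of the two heights as the yardstick keeps a tall span from absorbing a short one on a neighbouring line.&lt;/p&gt;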



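&lt;p&gt;After grouping, adjacent spans on the same line are merged when they share font size, bold, and italic settings and the horizontal gap between them is at most &lt;code&gt;mergeGapThreshold&lt;/code&gt; (2 by default). A space is inserted when that gap is larger than 0.5:&lt;br&gt;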
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;mergeAdjacentSpans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mergeGapThreshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}];&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gap&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sameFontSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fontSize&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sameBold&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isBold&lt;/span&gt;   &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isBold&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sameItalic&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isItalic&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isItalic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gap&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;mergeGapThreshold&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;sameFontSize&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;sameBold&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;sameItalic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gap&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;cur&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
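&lt;p&gt;The gap rule in that function comes down to three outcomes, sketched here as a standalone helper. The name &lt;code&gt;separatorFor&lt;/code&gt; and the use of &lt;code&gt;null&lt;/code&gt; for "do not merge" are assumptions for illustration; the font checks are left out.&lt;/p&gt;

```javascript
// Hypothetical helper: what to put between two spans, based only on the gap.
function separatorFor(gap, mergeGapThreshold = 2) {
  if (gap > mergeGapThreshold) return null; // too far apart: keep the spans separate
  return gap > 0.5 ? ' ' : '';              // word gap gets a space, a tight fit does not
}

console.log(separatorFor(1.5)); // " "   e.g. "Hello" + "world" becomes "Hello world"
console.log(separatorFor(0.1)); // ""    e.g. "Hel" + "lo" becomes "Hello"
console.log(separatorFor(5));   // null  e.g. text in a neighbouring column
```

&lt;p&gt;A negative gap, where spans overlap slightly, also yields an empty separator, which matches the behaviour of the merge code above.&lt;/p&gt;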






&lt;h2&gt;
  
  
  Converting coordinates to EMU
&lt;/h2&gt;

&lt;p&gt;Word positions floating elements in &lt;strong&gt;EMU (English Metric Units)&lt;/strong&gt;. One point equals 12,700 EMU and one inch equals 914,400 EMU. This is the coordinate space used for the absolute position and size of anchored drawing objects such as text boxes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_EMU&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12700&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// For each text item:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xEmu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_EMU&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;yEmu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_EMU&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;wEmu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_EMU&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hEmu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_EMU&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
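&lt;p&gt;A quick sanity check of the conversion, as a sketch: the helper name &lt;code&gt;rectToEmu&lt;/code&gt; and the sample rectangle are hypothetical. A value of 72 pt should come out as exactly one inch, 914,400 EMU.&lt;/p&gt;

```javascript
const PT_TO_EMU = 12700;

// Hypothetical helper: convert a rectangle in points to rounded EMU values.
function rectToEmu({ x, y, w, h }) {
  const toEmu = (pt) => Math.round(pt * PT_TO_EMU);
  return { xEmu: toEmu(x), yEmu: toEmu(y), wEmu: toEmu(w), hEmu: toEmu(h) };
}

const box = rectToEmu({ x: 72, y: 36, w: 144.5, h: 12 });
console.log(box);
// { xEmu: 914400, yEmu: 457200, wEmu: 1835150, hEmu: 152400 }
```

&lt;p&gt;Rounding matters because OOXML stores these offsets and extents as integers, while PDF coordinates are usually fractional.&lt;/p&gt;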



&lt;p&gt;The page dimensions also need to be converted for the DOCX section properties. PDF uses points for page dimensions; Word uses twips (twentieths of a point):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_TWIPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pgWTwips&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pageWPt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_TWIPS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pgHTwips&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pageHPt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;PT_TO_TWIPS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the output DOCX the same page dimensions as the original PDF, to within rounding to the nearest twip (1/20 pt), so the absolutely positioned content is not shifted.&lt;/p&gt;
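&lt;p&gt;For two common page sizes the arithmetic works out as follows. This is a sketch: the helper name &lt;code&gt;pageSizeToTwips&lt;/code&gt; is hypothetical, and the A4 values are the point sizes commonly found in PDF files.&lt;/p&gt;

```javascript
const PT_TO_TWIPS = 20;

// Hypothetical helper: page size in points to the twips used for the page size.
function pageSizeToTwips(pageWPt, pageHPt) {
  return {
    w: Math.round(pageWPt * PT_TO_TWIPS),
    h: Math.round(pageHPt * PT_TO_TWIPS),
  };
}

console.log(pageSizeToTwips(612, 792));       // US Letter: { w: 12240, h: 15840 }
console.log(pageSizeToTwips(595.28, 841.89)); // A4:        { w: 11906, h: 16838 }
```

&lt;p&gt;US Letter converts without any rounding; A4 is rounded by a fraction of a twip, far below anything visible on the page.&lt;/p&gt;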




&lt;h2&gt;
  
  
  Building the DOCX anchored text boxes
&lt;/h2&gt;

&lt;p&gt;Each text item becomes a &lt;code&gt;&amp;lt;wp:anchor&amp;gt;&lt;/code&gt; — an absolutely-positioned floating text box in OOXML (the XML format underlying docx files). The anchor is positioned relative to the page at the exact EMU coordinates extracted from the PDF:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildAnchoredTextBox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;xEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;yEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;wEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;hEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageWEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageHEmu&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Clamp to page bounds — prevents out-of-range crashes in Word&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageWEmu&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;91440&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// 91440 EMU = 0.1 in (7.2 pt) minimum&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;yEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageHEmu&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;91440&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;91440&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;wEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageWEmu&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;safeX&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;91440&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hEmu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageHEmu&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;safeY&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;wp:anchor relativeFrom="page"&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;wp:positionH relativeFrom="page"&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`  &amp;lt;wp:posOffset&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;safeX&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/wp:posOffset&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;/wp:positionH&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;wp:positionV relativeFrom="page"&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`  &amp;lt;wp:posOffset&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;safeY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/wp:posOffset&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;/wp:positionV&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;wp:extent cx="&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;safeW&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;" cy="&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;safeH&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"/&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;wp:wrapNone/&amp;gt;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="c1"&gt;// ... text box content with runs ...&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;/wp:anchor&amp;gt;`&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;&amp;lt;wp:wrapNone/&amp;gt;&lt;/code&gt; means the text box doesn't affect text flow — it floats freely at its absolute position, exactly like the original PDF content. &lt;code&gt;relativeFrom="page"&lt;/code&gt; anchors the coordinates to the page origin rather than the text area, which is essential because PDFs use page-relative coordinates.&lt;/p&gt;

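&lt;p&gt;The bounds clamping uses 91,440 EMU, which is 0.1 inch or 7.2 points. It keeps every text box at least that large and its origin at least that far inside the right and bottom page edges. This prevents a common failure in Word, where a text box positioned outside the page area causes the file to be reported as corrupted.&lt;/p&gt;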
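&lt;p&gt;The effect of the clamping is easiest to see on a box that lies partly off the page. The sketch below repeats the four clamping expressions in a standalone helper; the name &lt;code&gt;clampBox&lt;/code&gt;, the constant &lt;code&gt;MIN_EMU&lt;/code&gt;, and the sample values are assumptions for illustration. Since one inch is 914,400 EMU, 91,440 EMU is one tenth of an inch.&lt;/p&gt;

```javascript
const MIN_EMU = 91440; // 914400 EMU per inch / 10

// Hypothetical helper: the clamping from buildAnchoredTextBox as a plain function.
function clampBox({ xEmu, yEmu, wEmu, hEmu }, pageWEmu, pageHEmu) {
  const x = Math.max(0, Math.min(xEmu, pageWEmu - MIN_EMU));
  const y = Math.max(0, Math.min(yEmu, pageHEmu - MIN_EMU));
  const w = Math.max(MIN_EMU, Math.min(wEmu, pageWEmu - x));
  const h = Math.max(MIN_EMU, Math.min(hEmu, pageHEmu - y));
  return { x, y, w, h };
}

// US Letter page, 612 x 792 pt, expressed in EMU
const pageW = 612 * 12700; // 7772400
const pageH = 792 * 12700; // 10058400

// A box that starts left of the page, runs past the bottom, and is too wide
const stray = { xEmu: -12700, yEmu: 10100000, wEmu: 8000000, hEmu: 50000 };
console.log(clampBox(stray, pageW, pageH));
// { x: 0, y: 9966960, w: 7772400, h: 91440 }
```

&lt;p&gt;The origin is pulled back onto the page first, and the width and height are then limited to the space that remains, so the box never extends past the page edges.&lt;/p&gt;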




&lt;h2&gt;
  
  
  The OCR fallback for scanned PDFs
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;extractNativeTextItems()&lt;/code&gt; returns five or fewer text items, which happens with scanned documents, image-only PDFs, or PDFs where text was saved as outlines, the tool switches to Tesseract.js OCR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nativeResult&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;nativeResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// use native extraction&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// render page at 3× scale for OCR&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vpScaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getViewport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RENDER_SCALE&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// RENDER_SCALE = 3&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;canvas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vpScaled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
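  &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vpScaled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2d&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 2D context that page.render() draws into&lt;/span&gt;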

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;canvasContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;viewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vpScaled&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tessData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ocrPageCanvas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ocrLang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onProgress&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// release GPU memory&lt;/span&gt;

  &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tessWordsToItems&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tessData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RENDER_SCALE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageWPt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pageHPt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rendering at 3× (216 DPI equivalent) before OCR significantly improves Tesseract accuracy compared to rendering at 1× or 2×. The pixel coordinates from the 3× render are divided by the render scale to get PDF point coordinates, which are then converted to EMU (English Metric Units, 12,700 per point) for placement in the docx.&lt;/p&gt;

&lt;p&gt;Tesseract provides word-level bounding boxes (&lt;code&gt;word.bbox&lt;/code&gt;) with confidence scores. Words with confidence below 15% are discarded — they're almost certainly noise or artifacts rather than real text.&lt;/p&gt;
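
&lt;p&gt;The filtering and unit conversion described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual &lt;code&gt;tessWordsToItems()&lt;/code&gt;: the function name &lt;code&gt;ocrWordsToItems&lt;/code&gt;, the output field names, and the top-left origin are assumptions. It relies only on the &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;confidence&lt;/code&gt;, and &lt;code&gt;bbox&lt;/code&gt; fields that Tesseract.js reports for each word.&lt;/p&gt;

```javascript
// Sketch only: helper name and output field names are assumptions for illustration.
const EMU_PER_POINT = 12700; // 914,400 EMU per inch / 72 points per inch
const MIN_CONFIDENCE = 15;   // Tesseract.js reports word confidence on a 0-100 scale

function ocrWordsToItems(words, renderScale) {
  return words
    // Discard words below the confidence threshold (likely noise or artifacts).
    .filter((word) => word.confidence >= MIN_CONFIDENCE)
    .map((word) => {
      const { x0, y0, x1, y1 } = word.bbox; // pixel coordinates on the rendered canvas
      // Pixels to PDF points: undo the render scale.
      const xPt = x0 / renderScale;
      const yPt = y0 / renderScale;
      const widthPt = (x1 - x0) / renderScale;
      const heightPt = (y1 - y0) / renderScale;
      // PDF points to EMU, rounded because OOXML stores EMU as integers.
      return {
        text: word.text,
        xEmu: Math.round(xPt * EMU_PER_POINT),
        yEmu: Math.round(yPt * EMU_PER_POINT),
        widthEmu: Math.round(widthPt * EMU_PER_POINT),
        heightEmu: Math.round(heightPt * EMU_PER_POINT),
      };
    });
}
```

&lt;p&gt;With a render scale of 3, a word whose box starts at pixel (300, 150) lands at (100 pt, 50 pt), which is (1,270,000, 635,000) in EMU.&lt;/p&gt;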




&lt;h2&gt;
  
  
  The Pixel Perfect mode — for when visual fidelity matters more than editability
&lt;/h2&gt;

&lt;p&gt;For PDFs where layout accuracy matters more than editability — forms, certificates, complex multi-column layouts — the pixel pipeline renders each page as a JPEG and embeds it inline in the docx:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;canvasToJpeg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.93&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image/jpeg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;quality&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;blobToUint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// GPU memory release&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The JPEG quality options map to &lt;code&gt;0.82&lt;/code&gt; (balanced), &lt;code&gt;0.93&lt;/code&gt; (high, default), and &lt;code&gt;0.97&lt;/code&gt; (maximum). High quality at 2× render scale produces JPEG images that are visually indistinguishable from the original PDF when viewed at normal sizes.&lt;/p&gt;

&lt;p&gt;Each page image is embedded using &lt;code&gt;&amp;lt;wp:inline&amp;gt;&lt;/code&gt; rather than &lt;code&gt;&amp;lt;wp:anchor&amp;gt;&lt;/code&gt; — inline drawing elements in OOXML flow with the document and don't overlap, which is what you want when each page is the full-width content of a section.&lt;/p&gt;




&lt;h2&gt;
  
  
  The privacy architecture
&lt;/h2&gt;

&lt;p&gt;The entire pipeline — PDF parsing, text extraction, OCR, DOCX assembly — runs locally in the browser. No bytes of your document touch any server.&lt;/p&gt;

&lt;p&gt;For sensitive PDFs — contracts, financial statements, medical records, legal filings — this matters. The document you upload to a conversion service goes somewhere. It sits on a server. It may be retained. With &lt;a href="https://www.ihatepdf.cv/pdf-to-word" rel="noopener noreferrer"&gt;ihatepdf.cv&lt;/a&gt;, the conversion happens in your browser tab and the result goes directly to your Downloads folder. The file never leaves your device.&lt;/p&gt;

&lt;p&gt;Open DevTools → Network tab → convert a PDF. You'll see PDF.js and Tesseract.js loading once and being cached by the service worker. You'll see zero upload requests for your document.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to use which mode
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ultra-Accurate Editable&lt;/strong&gt; is the right choice when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to edit the converted document — update names, fix typos, change dates&lt;/li&gt;
&lt;li&gt;The PDF was generated digitally (not scanned) — Word, InDesign, Google Docs exports&lt;/li&gt;
&lt;li&gt;You want selectable, copyable, searchable text in the output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pixel Perfect&lt;/strong&gt; is the right choice when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Layout accuracy matters more than editability — forms, certificates, designed documents&lt;/li&gt;
&lt;li&gt;The PDF has complex layouts that are difficult to reconstruct with text boxes&lt;/li&gt;
&lt;li&gt;You want guaranteed visual fidelity and don't need to edit the content&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.ihatepdf.cv/pdf-to-word" rel="noopener noreferrer"&gt;ihatepdf.cv/pdf-to-word&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free. No account. No upload. No watermark. Both conversion modes available for every file, no paywalls.&lt;/p&gt;

&lt;p&gt;If you work with PDFs professionally and have conversion cases that break — unusual layouts, complex tables, right-to-left scripts, mathematical content — I read comments. Edge cases are how the tool improves.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of an ongoing series on building a privacy-first PDF toolkit in the browser. The architecture overview is at &lt;a href="https://www.ihatepdf.cv/technical-blog" rel="noopener noreferrer"&gt;ihatepdf.cv/technical-blog&lt;/a&gt;. Previous posts: &lt;a href="https://www.ihatepdf.cv/compress-pdf" rel="noopener noreferrer"&gt;PDF compression with Ghostscript-WASM&lt;/a&gt; · &lt;a href="https://www.ihatepdf.cv/pdf-to-jpg" rel="noopener noreferrer"&gt;PDF to JPG at 600 DPI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How I Built a PDF to JPG Converter That Renders at 600 DPI Inside a Browser Tab</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Wed, 25 Mar 2026 16:14:08 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/how-i-built-a-pdf-to-jpg-converter-that-renders-at-600-dpi-inside-a-browser-tab-111e</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/how-i-built-a-pdf-to-jpg-converter-that-renders-at-600-dpi-inside-a-browser-tab-111e</guid>
      <description>&lt;p&gt;The complete engineering story behind high-resolution PDF-to-image conversion using PDF.js, canvas memory management, and device-adaptive processing&lt;/p&gt;

&lt;p&gt;Most online PDF to JPG converters cap output at 150 DPI. Some go to 300 DPI if you pay. Very few reach 600 DPI, and those that do require uploading your file to their servers.&lt;br&gt;
The PDF to JPG converter inside ihatepdf.cv supports up to 600 DPI output — completely free, with zero server upload. Every pixel is rendered locally in your browser. Here's exactly how it works, why 600 DPI matters, and what engineering problems had to be solved to make it work on devices ranging from an iPhone to a 32GB workstation.&lt;/p&gt;

&lt;h2&gt;
  Why DPI matters for PDF to image conversion
&lt;/h2&gt;

&lt;p&gt;DPI stands for dots per inch. For rendered output, it describes how many pixels represent each inch of the original document.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;72 DPI&lt;/strong&gt; is the PDF baseline. A PDF point is 1/72 of an inch, and PDF.js at scale 1.0 maps one point to one canvas pixel. This is what you get from a naive render without any scaling. Fine for a thumbnail. Terrible for anything else.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;150 DPI&lt;/strong&gt; is adequate for screen viewing and social media. Text is sharp enough to read. Images look acceptable. File sizes are reasonable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;300 DPI&lt;/strong&gt; is the standard for professional printing. Business cards, brochures, and office documents are typically printed at 300 DPI. This is what most professional tools default to.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;600 DPI&lt;/strong&gt; is for archival purposes, large-format printing, and situations where you need to zoom into the output image and still see crisp detail: scanning workflows, medical records, engineering drawings, high-resolution reproductions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ihatepdf.cv reaches these DPI targets by treating DPI as a scale multiplier on that 72 DPI base:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const dpiToScale = (dpi) =&amp;gt; dpi / 72;

// 150 DPI → 2.08x scale
// 300 DPI → 4.17x scale
// 600 DPI → 8.33x scale
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That 8.33× scale at 600 DPI is where the engineering gets interesting.&lt;/p&gt;

&lt;h2&gt;
  The canvas size problem
&lt;/h2&gt;

&lt;p&gt;Browsers impose hard limits on canvas dimensions. The tool assumes 16,384 pixels per side, a common ceiling on modern browsers; exact limits vary by browser and version, and some also cap total canvas area. At 8.33× scale, a standard A4 PDF page (595 × 842 points) becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;595 × 8.33 = 4,956 px wide&lt;/li&gt;
&lt;li&gt;842 × 8.33 = 7,014 px tall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's within the 16,384 limit for a standard page. Large-format pages are a different story: at 600 DPI, any page longer than about 27 inches (16,384 / 600) on one side, such as a wide-format architectural drawing, exceeds it.&lt;/p&gt;

&lt;p&gt;The solution is &lt;code&gt;getOptimalScale()&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// viewport is the page's scale-1 viewport, so its dimensions are in PDF points
const getOptimalScale = (viewport, requestedScale) =&amp;gt; {
  const maxDimension = 16384;
  const testWidth  = viewport.width  * requestedScale;
  const testHeight = viewport.height * requestedScale;

  if (testWidth &amp;gt; maxDimension || testHeight &amp;gt; maxDimension) {
    const scaleFactor = Math.min(
      maxDimension / viewport.width,
      maxDimension / viewport.height
    );
    return scaleFactor * 0.95; // 5% safety margin
  }
  return requestedScale;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Before rendering any page, the tool calculates whether the requested scale would exceed the canvas limit. If it would, it automatically reduces the scale to the maximum safe value for that specific page's dimensions. The 5% safety margin leaves headroom for browsers that enforce a slightly lower limit, such as 16,383 rather than 16,384.&lt;/p&gt;
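
&lt;p&gt;As a rough worked example of that clamp, the sketch below applies the same logic to two page sizes. The &lt;code&gt;clampScale&lt;/code&gt; helper and the 36 × 48 inch poster are assumptions chosen for illustration; only the 16,384 limit and the 5% margin come from the text above.&lt;/p&gt;

```javascript
// Illustrative only: clampScale mirrors the clamp described above; page sizes are sample values.
const MAX_DIMENSION = 16384; // assumed per-side canvas limit, in pixels
const SAFETY_MARGIN = 0.95;  // 5% headroom

const dpiToScale = (dpi) => dpi / 72;

// widthPt and heightPt are page dimensions in PDF points (1/72 inch).
const clampScale = (widthPt, heightPt, requestedScale) => {
  const longestSidePt = Math.max(widthPt, heightPt);
  if (longestSidePt * requestedScale > MAX_DIMENSION) {
    // Largest scale at which the longest side still fits, minus the safety margin.
    return (MAX_DIMENSION / longestSidePt) * SAFETY_MARGIN;
  }
  return requestedScale;
};

const requested = dpiToScale(600);

// A4 (595 x 842 pt): the longest side is about 7,000 px, so the scale is unchanged.
const a4Scale = clampScale(595, 842, requested);

// 36 x 48 inch poster (2592 x 3456 pt): 28,800 px at 600 DPI, so the scale is clamped.
const posterScale = clampScale(2592, 3456, requested);
const posterDpi = posterScale * 72; // effective output DPI after clamping

console.log(a4Scale.toFixed(2), posterScale.toFixed(2), Math.round(posterDpi));
```

&lt;p&gt;The poster ends up rendered at roughly 324 DPI instead of 600, which is the trade-off the clamp makes to stay inside the canvas limit.&lt;/p&gt;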

&lt;h2&gt;
  Device pixel ratio: the hidden multiplier
&lt;/h2&gt;

&lt;p&gt;Modern screens have device pixel ratios above 1×. A MacBook Pro Retina display is 2×. Some Android phones are 3×. ihatepdf.cv accounts for this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const renderPageToCanvas = async (page, targetScale) =&amp;gt; {
  const viewport   = page.getViewport({ scale: targetScale });
  const pixelRatio = Math.min(window.devicePixelRatio || 1, 2); // cap at 2×

  const canvas  = document.createElement('canvas');
  canvas.width  = Math.floor(viewport.width  * pixelRatio);
  canvas.height = Math.floor(viewport.height * pixelRatio);

  const ctx = canvas.getContext('2d', {
    alpha: false,            // opaque canvas: JPEG has no alpha channel
    willReadFrequently: false
  });

  ctx.fillStyle = 'white';   // white page background
  ctx.fillRect(0, 0, canvas.width, canvas.height);
  ctx.scale(pixelRatio, pixelRatio);
  ctx.imageSmoothingEnabled = true;
  ctx.imageSmoothingQuality = 'high';

  await page.render({
    canvasContext: ctx,
    viewport,
    intent: 'print',           // not 'display': higher quality rendering
    enableWebGL: false,
    renderInteractiveForms: false,
  }).promise;

  return canvas;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Three things worth noting here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;alpha: false&lt;/strong&gt;: the page is drawn on an opaque white background, so the canvas never needs transparency. Setting alpha to false tells the browser it can skip the alpha channel, which can save memory.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;intent: 'print'&lt;/strong&gt;: PDF.js distinguishes between display and print rendering intents. Print intent uses higher-quality glyph rendering and anti-aliasing, which produces noticeably sharper text, especially at high DPI.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pixelRatio capped at 2×&lt;/strong&gt;: canvas memory grows with the square of the ratio, so going from 2× to 3× on a high-DPI phone would need 2.25× the memory for a visual improvement that's imperceptible at normal viewing sizes. The cap prevents memory exhaustion on mobile.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  Memory management: the real challenge
&lt;/h2&gt;

&lt;p&gt;This is what separates a tool that actually works from one that crashes your browser tab.&lt;/p&gt;

&lt;p&gt;At 600 DPI, each A4 page canvas uses approximately:&lt;/p&gt;

&lt;p&gt;4,956 × 7,014 px × 4 bytes (RGBA) = ~139 MB of RAM&lt;/p&gt;

&lt;p&gt;Plus an equivalent amount of GPU texture memory for the canvas. Plus the PDF.js rendering buffers. For a 50-page document at 600 DPI, the naive approach of keeping every canvas alive allocates ~7 GB, which is enough to crash the tab on most devices.&lt;/p&gt;

&lt;p&gt;The solution is explicit canvas disposal after each page:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const canvas = await renderPageToCanvas(page, optimizedScale);

// ... create blob, trigger download ...

canvas.width  = 0;  // ← releases GPU texture memory immediately
canvas.height = 0;
// canvas goes out of scope → GC collects RAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Setting canvas dimensions to zero is not obvious. Simply dropping the canvas reference doesn't immediately release GPU memory in most browsers: the GPU texture allocation persists until the browser's garbage collector runs, which can be seconds later. Setting width and height to zero forces immediate GPU memory deallocation.&lt;/p&gt;

&lt;p&gt;Between batches, the tool adds a deliberate 2-second pause:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;if (batchIndex &amp;lt; batches.length - 1) {
  await new Promise(resolve =&amp;gt; setTimeout(resolve, 2000));
  // window.gc only exists when GC is explicitly exposed
  // (e.g. Chrome started with --js-flags=--expose-gc); otherwise this is skipped
  if (window.gc) window.gc();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Chrome's garbage collector typically triggers after ~1–1.5 seconds of idle time. The 2-second pause gives it time to run and reclaim memory before the next batch begins.&lt;/p&gt;
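
&lt;p&gt;The arithmetic behind those memory figures can be checked with a few lines. This is a back-of-the-envelope sketch: &lt;code&gt;canvasBytes&lt;/code&gt; is a hypothetical helper, it counts only the RGBA backing store (4 bytes per pixel), and it ignores GPU copies, PDF.js buffers, and any device pixel ratio multiplier.&lt;/p&gt;

```javascript
// Back-of-the-envelope estimate; canvasBytes is a hypothetical helper, not part of the tool.
const BYTES_PER_PIXEL = 4; // RGBA, 8 bits per channel

// Page size is given in PDF points (1/72 inch); dpi is the target output resolution.
const canvasBytes = (widthPt, heightPt, dpi) => {
  const scale = dpi / 72;
  const widthPx = Math.round(widthPt * scale);
  const heightPx = Math.round(heightPt * scale);
  return widthPx * heightPx * BYTES_PER_PIXEL;
};

const toMB = (bytes) => bytes / 1e6;

const perPage600 = canvasBytes(595, 842, 600); // one A4 page at 600 DPI
const perPage300 = canvasBytes(595, 842, 300); // the same page at 300 DPI
const naiveTotal = perPage600 * 50;            // 50 canvases kept alive at once

console.log(`${toMB(perPage600).toFixed(0)} MB per page at 600 DPI`);
console.log(`${toMB(perPage300).toFixed(0)} MB per page at 300 DPI`);
console.log(`${(naiveTotal / 1e9).toFixed(1)} GB if all 50 pages stay in memory`);
```

&lt;p&gt;Halving the DPI cuts the per-page cost to about a quarter, which is why per-device DPI caps are such an effective lever.&lt;/p&gt;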

&lt;h2&gt;
  Device-adaptive limits
&lt;/h2&gt;

&lt;p&gt;The same code runs on a 2GB RAM phone and a 32GB workstation. Rather than applying one-size-fits-all limits, ihatepdf.cv detects device capabilities and adjusts automatically:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const getDeviceCapabilities = () =&amp;gt; {
  const isMobile  = /Android|iPhone/i.test(navigator.userAgent);
  const isTablet  = /(tablet|ipad)/i.test(navigator.userAgent);
  const deviceMem = navigator.deviceMemory || 4; // not available in Safari

  if (isMobile &amp;amp;&amp;amp; screen.width &amp;lt; 768) {
    return { maxFileSize: 50  * 1024 * 1024, maxDPI: 300, maxPagesPerBatch: 10 };
  }
  if (isTablet) {
    return { maxFileSize: 75  * 1024 * 1024, maxDPI: 450, maxPagesPerBatch: 25 };
  }
  if (deviceMem &amp;lt; 4) {
    return { maxFileSize: 100 * 1024 * 1024, maxDPI: 450, maxPagesPerBatch: 30 };
  }
  return { maxFileSize: 150 * 1024 * 1024, maxDPI: 600, maxPagesPerBatch: 50 };
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A phone user still gets PDF to JPG conversion, just capped at 300 DPI and 10 pages per batch instead of 600 DPI and 50 pages. They get the tool, scaled to what their device can handle.&lt;/p&gt;

&lt;p&gt;Before any large conversion, memory usage is estimated:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const estimateMemoryUsage = (fileSize, pageCount, scale, format) =&amp;gt; {
  const baseMemoryPerPage = 5 * 1024 * 1024;          // 5 MB at scale 1.0
  const scaleFactor       = Math.pow(scale, 2);        // quadratic: 2× scale = 4× memory
  const formatMultiplier  = format === 'png' ? 1.5 : 1.0;
  const estimated         = pageCount * baseMemoryPerPage * scaleFactor * formatMultiplier;
  return { estimated, withSafety: estimated * 1.5 };
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Memory scales quadratically with DPI: doubling DPI quadruples memory usage because both dimensions double. A user trying to convert 50 pages at 600 DPI sees a warning before the browser runs out of memory rather than a silent crash.&lt;/p&gt;
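
&lt;p&gt;Splitting a document into batches of &lt;code&gt;maxPagesPerBatch&lt;/code&gt; pages could look something like the sketch below. The &lt;code&gt;splitIntoBatches&lt;/code&gt; helper is hypothetical (the article does not show how the tool builds its &lt;code&gt;batches&lt;/code&gt; array); it simply chunks 1-based page numbers.&lt;/p&gt;

```javascript
// Hypothetical helper: chunks 1-based page numbers into batches of at most batchSize pages.
const splitIntoBatches = (pageCount, batchSize) => {
  const batchCount = Math.ceil(pageCount / batchSize);
  return Array.from({ length: batchCount }, (_, batchIndex) => {
    const firstPage = batchIndex * batchSize + 1;
    const lastPage = Math.min(firstPage + batchSize - 1, pageCount);
    // Page numbers firstPage..lastPage, inclusive.
    return Array.from({ length: lastPage - firstPage + 1 }, (_, i) => firstPage + i);
  });
};

// 23 pages with a phone-style limit of 10 pages per batch.
const batches = splitIntoBatches(23, 10);
console.log(batches.map((batch) => batch.length)); // [ 10, 10, 3 ]
```

&lt;p&gt;Each batch can then be rendered page by page, with the canvas disposed after every page and a pause between batches.&lt;/p&gt;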

&lt;h2&gt;
  JPEG vs PNG: when to use each
&lt;/h2&gt;

&lt;p&gt;ihatepdf.cv supports both output formats. The choice matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;JPEG&lt;/strong&gt;: lossy compression. Adjustable quality from 60% to 100%. 300 DPI JPEG at 85% quality is typically 500KB–2MB per page. The right choice for photos, scanned documents, presentations, and anything else where some quality loss is imperceptible in practice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PNG&lt;/strong&gt;: lossless compression. No quality setting. 300 DPI PNG is typically 3–8MB per page. The right choice for documents with sharp lines, text-heavy pages, technical diagrams, or any situation where pixel-perfect reproduction is required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For archival purposes at 600 DPI, PNG is almost always the right answer: the files are large, but nothing is lost to compression.&lt;/p&gt;

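&lt;h2&gt;
  The four DPI presets and what they're for
&lt;/h2&gt;

&lt;p&gt;Rather than exposing a raw DPI slider and leaving users to guess, ihatepdf.cv maps to four practical use cases:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Preset&lt;/th&gt;&lt;th&gt;DPI&lt;/th&gt;&lt;th&gt;JPEG Quality&lt;/th&gt;&lt;th&gt;Best for&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Web&lt;/td&gt;&lt;td&gt;150&lt;/td&gt;&lt;td&gt;85%&lt;/td&gt;&lt;td&gt;Social media, email, web embedding&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Print&lt;/td&gt;&lt;td&gt;300&lt;/td&gt;&lt;td&gt;95%&lt;/td&gt;&lt;td&gt;Office printing, CVs, brochures&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Professional&lt;/td&gt;&lt;td&gt;500&lt;/td&gt;&lt;td&gt;98%&lt;/td&gt;&lt;td&gt;High-end printing, detailed documents&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Archival&lt;/td&gt;&lt;td&gt;600&lt;/td&gt;&lt;td&gt;100%&lt;/td&gt;&lt;td&gt;Maximum quality, large format, archiving&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The Archival preset is only shown on devices that can handle it. It is hidden on phones and tablets, where it would cause memory issues.&lt;/p&gt;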

&lt;h2&gt;
  Privacy: verifiable, not just claimed
&lt;/h2&gt;

&lt;p&gt;The entire conversion pipeline runs locally:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// The complete data lifecycle (schematic, not runnable code): no network
FileReader.readAsArrayBuffer(file)     // → browser memory
  → pdfjsLib.getDocument({ data })    // → PDF.js processing (local)
  → page.render({ canvasContext })    // → Canvas API (local)
  → canvas.toBlob('image/jpeg')       // → Blob in memory
  → URL.createObjectURL(blob)         // → local object URL
  → anchor.click()                    // → device storage
// Zero network requests for file data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Open DevTools → Network tab → convert a PDF to JPG. You'll see no request that carries your document. Not a policy. Not a claim. A verifiable architectural fact.&lt;/p&gt;

&lt;h2&gt;
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.ihatepdf.cv/pdf-to-jpg" rel="noopener noreferrer"&gt;ihatepdf.cv/pdf-to-jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free. No account. No upload. No watermark on output. Supports JPEG and lossless PNG up to 600 DPI, the same output quality as tools that charge $20/month, running entirely in your browser tab.&lt;/p&gt;

&lt;p&gt;If you process high-resolution documents professionally and have questions about the implementation, or if you find edge cases I haven't handled yet, I read comments.&lt;/p&gt;

&lt;p&gt;Part of an ongoing series on building a privacy-first PDF toolkit entirely in the browser. The full architecture overview is at ihatepdf.cv/technical-blog. The compression deep-dive is at ihatepdf.cv/compress-pdf.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Ran Ghostscript Inside a Browser Tab to Build a Free PDF Compressor</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Tue, 24 Mar 2026 15:27:55 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/i-ran-ghostscript-inside-a-browser-tab-to-build-a-free-pdf-compressor-2hd9</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/i-ran-ghostscript-inside-a-browser-tab-to-build-a-free-pdf-compressor-2hd9</guid>
      <description>&lt;p&gt;How WebAssembly turned a nearly 40-year-old PDF compression engine into a privacy-first browser tool&lt;/p&gt;

&lt;p&gt;A few weeks ago a friend sent me a scanned contract. It was 18MB — too large to attach to a reply email, too sensitive to upload to a random compression website.&lt;br&gt;
The "too sensitive" part is the one most people skip past. They just upload it. Most free PDF compressors work by sending your file to their servers, running compression remotely, and sending the result back. That's a reasonable architecture. It's also a data flow that includes your document passing through infrastructure you don't control, being stored temporarily on someone else's disk, and processed by systems whose security posture you can't verify.&lt;br&gt;
For a contract, a tax return, an NDA — that matters.&lt;br&gt;
So I built the compression tool inside ihatepdf.cv differently. Everything runs in your browser tab. Your file never leaves your device. Here's exactly how.&lt;/p&gt;

&lt;p&gt;The engine: Ghostscript compiled to WebAssembly&lt;br&gt;
Ghostscript is not new software. It has been the gold standard for PostScript and PDF processing since 1988. Adobe Acrobat uses it internally. Professional print shops use it. It is written in C and has been battle-tested on hundreds of millions of documents.&lt;br&gt;
The key insight: Ghostscript can be compiled to WebAssembly. That means the same engine that runs on servers can run inside a browser tab, at near-native speed, with no server required.&lt;br&gt;
The compression pipeline uses a Web Worker so the main thread stays responsive while Ghostscript processes the PDF:&lt;br&gt;
javascriptconst worker = new Worker('/background-worker.js');&lt;/p&gt;

&lt;p&gt;worker.postMessage({&lt;br&gt;
  data: {&lt;br&gt;
    psDataURL: blobUrl,&lt;br&gt;
    config: config,&lt;br&gt;
  },&lt;br&gt;
  target: 'wasm',&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;worker.onmessage = async (e) =&amp;gt; {&lt;br&gt;
  const response = await fetch(e.data);&lt;br&gt;
  const compressedBlob = await response.blob();&lt;br&gt;
  // download the result&lt;br&gt;
};&lt;br&gt;
The user selects a file, it becomes a Blob URL, the Blob URL is handed to the worker, Ghostscript processes it inside WebAssembly, and the result comes back as another Blob. Nothing goes over the network. Nothing is stored on any server. The whole thing happens inside the browser's sandboxed environment.&lt;/p&gt;
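&lt;p&gt;The round trip described above can be wrapped in a Promise that also releases the input Blob URL when the worker finishes. This is a minimal sketch, not the tool's actual source: the function name compressInWorker and the injectable urlApi parameter are assumptions added for illustration, while the message shape is copied from the snippet above.&lt;/p&gt;

```javascript
// Sketch only: wraps the worker round trip in a Promise and cleans up the Blob URL.
// `worker` is anything with postMessage/onmessage; `urlApi` defaults to the global URL.
function compressInWorker(worker, file, config, urlApi = URL) {
  const inputUrl = urlApi.createObjectURL(file); // local blob: URL, no network involved
  return new Promise((resolve, reject) => {
    worker.onmessage = (e) => {
      urlApi.revokeObjectURL(inputUrl); // release the input once the worker is done
      resolve(e.data);                  // the worker replies with a URL for the result
    };
    worker.onerror = (err) => {
      urlApi.revokeObjectURL(inputUrl); // release the input on failure too
      reject(err);
    };
    // Same message shape as the snippet above.
    worker.postMessage({ data: { psDataURL: inputUrl, config }, target: 'wasm' });
  });
}
```

&lt;p&gt;Revoking the URL matters for large files: until it is revoked, the browser keeps the underlying Blob alive in memory.&lt;/p&gt;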

&lt;p&gt;The five optimizations Ghostscript applies simultaneously&lt;br&gt;
This is what separates proper PDF compression from naive approaches that just re-export the file.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Image downsampling
Photos and raster graphics embedded in PDFs are often stored at their original resolution, such as 600 DPI from a scanner or 300 DPI from a camera. For screen viewing, 150 DPI is usually visually indistinguishable. Ghostscript can resample images using bicubic interpolation, the highest-quality of the downsampling methods it offers. The result is smaller with no perceptible visual difference at normal viewing sizes.&lt;/li&gt;
&lt;li&gt;JPEG recompression
After downsampling, images are re-encoded at a quality level matched to the compression preset:
const qualityToJpegQuality = {
'/screen':   40,   // 72 DPI  — maximum compression
'/ebook':    60,   // 150 DPI — balanced
'/printer':  80,   // 300 DPI — print quality
'/prepress': 92,   // 300 DPI — professional print
};&lt;/li&gt;
&lt;li&gt;Font subsetting
This one surprises people. Embedded fonts in PDFs often contain the entire font: every character, every glyph, including ones your document never uses. A single embedded font can be 200–400KB. Font subsetting trims the embedded data to only the characters actually present in your document. For text-heavy documents with fully embedded fonts, this alone can reduce file size by 20–30%.&lt;/li&gt;
&lt;li&gt;Metadata stripping
Most PDFs created by Word, Acrobat, Google Docs, or any other tool embed metadata: author name, creation software, revision history, thumbnail previews. This data adds size and can expose information you did not intend to share. Because Ghostscript rewrites the file from scratch, leftovers such as old revisions and thumbnail previews are dropped during compression.&lt;/li&gt;
&lt;li&gt;Stream recompression
The content streams that encode page content are recompressed with lossless Flate (deflate) compression. Text and vector graphics are otherwise unaffected: they are never converted to pixels, which is why text stays perfectly sharp at every compression level.&lt;/li&gt;
&lt;/ol&gt;
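&lt;p&gt;For readers who know Ghostscript from the command line, several of the optimizations above correspond to standard pdfwrite switches. The sketch below builds such an argument list. The function name buildGhostscriptArgs is hypothetical, the article does not show how the WASM build is actually invoked, and JPEG quality is left out because it is configured through separate distiller parameters.&lt;/p&gt;

```javascript
// Sketch only: builds a Ghostscript argument list for one preset.
// The switches are standard pdfwrite options; the function name is hypothetical.
function buildGhostscriptArgs(preset, dpi, inputPath, outputPath) {
  return [
    '-sDEVICE=pdfwrite',                   // write a new PDF instead of rendering pages
    `-dPDFSETTINGS=${preset}`,             // /screen, /ebook, /printer or /prepress
    '-dDownsampleColorImages=true',        // optimization 1: image downsampling
    `-dColorImageResolution=${dpi}`,       // target resolution for color images
    '-dColorImageDownsampleType=/Bicubic', // bicubic resampling
    '-dSubsetFonts=true',                  // optimization 3: font subsetting
    '-dNOPAUSE',
    '-dBATCH',
    '-dQUIET',
    `-sOutputFile=${outputPath}`,
    inputPath,
  ];
}

console.log(buildGhostscriptArgs('/ebook', 150, 'input.pdf', 'output.pdf').join(' '));
```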

&lt;p&gt;Why text never gets blurry&lt;br&gt;
This is the most common concern and the most important thing to understand about PDF compression.&lt;br&gt;
Text in a PDF is not stored as an image. It is stored as vector instructions: draw this glyph, at these coordinates, with this font, at this size. Vector data has no resolution. It is mathematically perfect at every zoom level.&lt;br&gt;
Compression only affects raster content — photos, scanned pages, embedded images. A document that is entirely text and vector graphics will compress almost nothing using image downsampling, because there are no raster images to downsample. But it will still benefit from font subsetting and stream recompression.&lt;/p&gt;
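&lt;p&gt;A rough way to see why raster content dominates the savings: pixel count scales with the square of the resolution, so halving the DPI quarters the pixels. The helper below is only an arithmetic illustration with a made-up name, and actual file size also depends on JPEG quality and image content.&lt;/p&gt;

```javascript
// Pixel count scales with the square of the DPI ratio.
function pixelReductionFactor(sourceDpi, targetDpi) {
  const ratio = sourceDpi / targetDpi;
  return ratio * ratio;
}

console.log(pixelReductionFactor(600, 150)); // 16: a 600 DPI scan keeps 1/16 of its pixels
console.log(pixelReductionFactor(300, 150)); // 4
```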

&lt;p&gt;The three presets and when to use each&lt;br&gt;
Rather than exposing Ghostscript's raw configuration options, the tool maps to three practical use cases:&lt;br&gt;
Light (20–30% reduction): Uses /printer quality: 300 DPI images, 80% JPEG quality. Output is visually indistinguishable from the original in most cases. Use this for design portfolios, documents you will print, or any case where preserving quality matters most.&lt;br&gt;
Medium (40–50% reduction): Uses /ebook quality — 150 DPI images, 60% JPEG quality. This is the sweet spot for CVs, contracts, reports, and email attachments. Looks identical on screen. Most people cannot tell the difference between a Medium-compressed document and the original when reading normally.&lt;br&gt;
Heavy (60–70% reduction): Uses /screen quality — 72 DPI images, 40% JPEG quality. Maximum compression. Text stays perfectly sharp; photos become noticeably softer at high zoom. Use this for archiving, upload portals with strict size limits, or anywhere you need to get under 1–2MB.&lt;/p&gt;
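&lt;p&gt;The mapping from the three user-facing levels to Ghostscript presets can be expressed as a small lookup table. The values come from the descriptions above. The object shape and the resolvePreset helper are assumptions for illustration, not the tool's actual code.&lt;/p&gt;

```javascript
// Values copied from the preset descriptions above; the object shape is an assumption.
const PRESETS = {
  light:  { pdfSettings: '/printer', dpi: 300, jpegQuality: 80 },
  medium: { pdfSettings: '/ebook',   dpi: 150, jpegQuality: 60 },
  heavy:  { pdfSettings: '/screen',  dpi: 72,  jpegQuality: 40 },
};

// Falls back to the balanced preset when the name is unknown.
function resolvePreset(name) {
  const known = Object.prototype.hasOwnProperty.call(PRESETS, name);
  return known ? PRESETS[name] : PRESETS.medium;
}

console.log(resolvePreset('heavy').pdfSettings); // /screen
```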

&lt;p&gt;The memory management problem&lt;br&gt;
Processing large PDFs in a browser is harder than it sounds. A 50MB PDF with high-resolution images can consume 150–250MB of RAM during processing, because 3–5× overhead is normal. On mobile devices with 2–4GB total RAM, this matters.&lt;br&gt;
Before starting, the tool picks a maximum file size based on the device:&lt;br&gt;
const getDeviceCapabilities = () =&amp;gt; {&lt;br&gt;
  const isMobile = /Android|iPhone/i.test(navigator.userAgent);&lt;br&gt;
  const deviceMem = navigator.deviceMemory || 4;&lt;/p&gt;

&lt;p&gt;if (isMobile &amp;amp;&amp;amp; screen.width &amp;lt; 768) {&lt;br&gt;
    return { maxFileSize: 50 * 1024 * 1024 };  // 50MB on phones&lt;br&gt;
  }&lt;br&gt;
  if (deviceMem &amp;lt; 4) {&lt;br&gt;
    return { maxFileSize: 100 * 1024 * 1024 }; // 100MB on low memory&lt;br&gt;
  }&lt;br&gt;
  return { maxFileSize: 150 * 1024 * 1024 };   // 150MB on desktop&lt;br&gt;
};&lt;br&gt;
navigator.deviceMemory is not available in Safari or Firefox, so the fallback assumes 4GB, a reasonable guess that handles most cases without crashing.&lt;/p&gt;
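&lt;p&gt;The overhead figure and the size cap can be combined into a simple pre-flight check. This is a sketch under stated assumptions: the 3 to 5 times multiplier is the range quoted above, and both function names are hypothetical.&lt;/p&gt;

```javascript
const MB = 1024 * 1024;

// Estimates peak RAM for a file, using the 3 to 5 times overhead range quoted above.
function estimatePeakMemory(fileSizeBytes) {
  return { low: fileSizeBytes * 3, high: fileSizeBytes * 5 };
}

// Hypothetical guard: accept a file only if it fits under the device cap.
function canProcess(fileSizeBytes, maxFileSize) {
  return maxFileSize >= fileSizeBytes;
}

const estimate = estimatePeakMemory(50 * MB);
console.log(estimate.low / MB, estimate.high / MB); // 150 250
console.log(canProcess(60 * MB, 50 * MB));          // false
```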

&lt;p&gt;The privacy architecture — verifiable, not just claimed&lt;br&gt;
Most privacy claims are policies. "We delete files after 2 hours." "We don't share data." These are statements you have to trust.&lt;br&gt;
The architecture here makes the claim verifiable. Open DevTools → Network tab → compress a PDF. Watch the network requests. You will see the Ghostscript WASM file load once and get cached. You will see zero upload requests for your document. No bytes of your PDF travel over the network.&lt;br&gt;
After first load, the tool also works fully offline. The service worker caches the WebAssembly libraries:&lt;br&gt;
self.addEventListener('install', (e) =&amp;gt; {&lt;br&gt;
  e.waitUntil(&lt;br&gt;
    caches.open('ihatepdf-v1').then((cache) =&amp;gt; cache.addAll([&lt;br&gt;
      '/',&lt;br&gt;
      'https://unpkg.com/pdf-lib@1.17.1/dist/pdf-lib.min.js',&lt;br&gt;
      // ghostscript wasm + other libraries&lt;br&gt;
    ]))&lt;br&gt;
  );&lt;br&gt;
});&lt;br&gt;
Disconnect from WiFi. Reload the page. Compress a PDF. It works. There is no server to connect to because no server is involved in the processing.&lt;/p&gt;
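&lt;p&gt;The install handler above only fills the cache. Serving from it offline also needs a fetch handler, which the article does not show. A common pattern is cache-first lookup, sketched here with the cache and the network call passed in as parameters (an assumption made so the logic can be tested outside a service worker).&lt;/p&gt;

```javascript
// Sketch only: cache-first lookup. `cache` needs a match() method like the Cache API;
// `fetchFn` stands in for the global fetch.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached; // served locally, works offline
  return fetchFn(request);   // only reached when the asset is not cached yet
}

// Inside a service worker this could be wired up roughly like so:
// self.addEventListener('fetch', (e) => {
//   e.respondWith(caches.open('ihatepdf-v1').then((c) => cacheFirst(e.request, c, fetch)));
// });
```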

&lt;p&gt;The honest trade-off&lt;br&gt;
A cloud server with a dedicated CPU will compress a 150MB PDF faster than Ghostscript-WASM running in a browser tab on a four-year-old laptop.&lt;br&gt;
If you are compressing large volumes of very large files and speed is the priority, a server-based tool is genuinely better for that use case.&lt;br&gt;
For everything else (privacy, offline use, no account, no size limits beyond what your device can handle, no watermarks on the output) the local approach wins. The Ghostscript engine is identical either way. The compression quality is the same. The only difference is where the processing happens.&lt;/p&gt;

&lt;p&gt;Try it&lt;br&gt;
ihatepdf.cv/compress-pdf&lt;br&gt;
No account. No upload. No watermark. The source of the compression quality is the same Ghostscript engine that professional tools use — it just runs on your device instead of theirs.&lt;br&gt;
If you process sensitive documents and have questions about how the architecture works, or if something is broken, I read comments.&lt;/p&gt;

&lt;p&gt;This is part of an ongoing series on building a privacy-first PDF toolkit entirely in the browser using WebAssembly. The technical deep-dive on the full architecture is at ihatepdf.cv/technical-blog&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>privacy</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Stopped Emailing Myself Photos — Here’s the Tool That Fixed Everything</title>
      <dc:creator>Pranav Mailarpawar</dc:creator>
      <pubDate>Mon, 23 Mar 2026 20:08:03 +0000</pubDate>
      <link>https://forem.com/pranav_mailarpawar_7039f2/i-stopped-emailing-myself-photos-heres-the-tool-that-fixed-everything-4b4i</link>
      <guid>https://forem.com/pranav_mailarpawar_7039f2/i-stopped-emailing-myself-photos-heres-the-tool-that-fixed-everything-4b4i</guid>
      <description>&lt;p&gt;How a free, privacy-first browser tool quietly became my go-to for converting images to PDF&lt;/p&gt;

&lt;p&gt;There’s a ritual most of us have performed at least once: taking a dozen photos of something — receipts, a whiteboard, handwritten notes, a product — and then spending the next ten minutes trying to figure out how to send them as one clean file.&lt;/p&gt;

&lt;p&gt;You zip them. The recipient can’t open the zip. You try Google Drive. They don’t have a Google account. You send them one by one. They get confused about the order.&lt;/p&gt;

&lt;p&gt;Eventually, someone says: “Can you just send it as a PDF?”&lt;/p&gt;

&lt;p&gt;Yes. Obviously. But converting images to PDF has, somehow, always been more painful than it should be. Until I found ihatepdf.cv.&lt;/p&gt;

&lt;p&gt;What Is ihatepdf.cv?&lt;br&gt;
ihatepdf.cv is a free, browser-based PDF toolkit — and the name tells you everything about the spirit of the product. It was clearly built by someone who has dealt with one too many bloated, subscription-gated, ad-riddled PDF tools and decided to do something about it.&lt;/p&gt;

&lt;p&gt;The platform offers a suite of PDF utilities, but the one that’s earned a permanent tab in my browser is the JPG to PDF converter — or more precisely, the Images to PDF tool at ihatepdf.cv/images-to-pdf.&lt;/p&gt;

&lt;p&gt;The Part That Actually Surprised Me: Your Files Never Leave Your Device&lt;br&gt;
Most online file converters have a quiet business model: you upload your files, they process them on their servers, and somewhere in the fine print it says they might retain your data for “service improvement.”&lt;/p&gt;

&lt;p&gt;ihatepdf.cv takes a fundamentally different stance — and they’re upfront about it.&lt;/p&gt;

&lt;p&gt;Everything happens in the browser. Your files, your control.&lt;/p&gt;

&lt;p&gt;The image-to-PDF conversion runs entirely inside your browser tab using PDF-lib, a JavaScript library that executes locally on your machine. Nothing is uploaded. Nothing is transmitted. No file ever touches an external server. What looks like a web service is actually a local application that just happens to live in your browser.&lt;/p&gt;
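&lt;p&gt;For the curious, a conversion like this takes only a few calls with pdf-lib. The sketch below is not ihatepdf.cv's actual source. It assumes the PDFDocument class from pdf-lib is passed in by the caller, and the shape of the images array (bytes plus a MIME type) is hypothetical.&lt;/p&gt;

```javascript
// Sketch only: `PDFDocument` is the class exported by pdf-lib, passed in by the caller.
// `images` is an array of { bytes, type } objects, a hypothetical shape.
async function imagesToPdf(PDFDocument, images) {
  const pdf = await PDFDocument.create();
  for (const { bytes, type } of images) {
    const image = type === 'image/png'
      ? await pdf.embedPng(bytes)
      : await pdf.embedJpg(bytes);
    // One page per image, sized to the image so nothing is scaled.
    const page = pdf.addPage([image.width, image.height]);
    page.drawImage(image, { x: 0, y: 0, width: image.width, height: image.height });
  }
  return pdf.save(); // resolves to a Uint8Array of PDF bytes
}
```

&lt;p&gt;In a browser, the returned bytes can be wrapped in a Blob and offered as a download, all without a network request.&lt;/p&gt;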

&lt;p&gt;This isn’t a small footnote in a privacy policy — it’s the core design decision behind the entire platform. ihatepdf.cv was built on the principle that your files belong to you, and respecting your privacy means making it technically impossible for your documents to be stored, analyzed, or shared without your knowledge.&lt;/p&gt;

&lt;p&gt;When you convert an image on ihatepdf.cv, here’s what doesn’t happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your file is not uploaded to any server&lt;/li&gt;
&lt;li&gt;No third party ever receives or processes your document&lt;/li&gt;
&lt;li&gt;Nothing is retained after you close the tab&lt;/li&gt;
&lt;li&gt;There are no accounts, no session tokens tied to your files, no tracking of what you converted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a tool you can use with complete confidence, whether you’re converting a personal photo, a financial document, a medical record, or sensitive client work. The conversion happens in your browser, your files stay on your device, and that’s where it ends.&lt;/p&gt;

&lt;p&gt;This is what genuine privacy looks like in a web tool. Not a promise. An architectural guarantee.&lt;/p&gt;

&lt;p&gt;Using It Is Embarrassingly Simple&lt;br&gt;
Here’s what the workflow actually looks like:&lt;/p&gt;

&lt;p&gt;Step 1: Go to ihatepdf.cv/images-to-pdf and click Choose Files — or just drag and drop images directly into the upload zone.&lt;/p&gt;

&lt;p&gt;Step 2: Your images appear as numbered thumbnails. You can remove any you don’t want. The numbers show you the order they’ll appear in the final PDF.&lt;/p&gt;

&lt;p&gt;Step 3: Click Convert to PDF. A spinner appears for a second or two.&lt;/p&gt;

&lt;p&gt;Step 4: Your PDF downloads automatically.&lt;/p&gt;

&lt;p&gt;That’s it. No account creation. No email verification. No watermarks. No “upgrade to Pro to download.” The whole thing takes less time than finding the right app on your phone.&lt;/p&gt;

&lt;p&gt;What It Handles Well&lt;br&gt;
JPG and PNG support — the two formats that cover about 95% of real-world image use cases. Whether you’re dealing with camera photos (almost always JPG) or screenshots and graphics (usually PNG), ihatepdf.cv handles both natively.&lt;/p&gt;

&lt;p&gt;Batch conversion — you can upload as many images as you want in a single session. They all get stitched into one PDF in the order you uploaded them.&lt;/p&gt;

&lt;p&gt;Original quality preservation — images are embedded at their source resolution. There’s no recompression, no quality loss. What you put in is what comes out, pixel for pixel.&lt;/p&gt;

&lt;p&gt;Works offline — after the initial page load, the converter works without an internet connection. The processing is all local.&lt;/p&gt;

&lt;p&gt;Mobile-friendly — the interface is responsive and includes explicit support for iOS Safari, which is historically one of the harder browsers to get file downloads working correctly on.&lt;/p&gt;

&lt;p&gt;The Use Cases That Make This Indispensable&lt;br&gt;
Once you have a fast, frictionless image-to-PDF tool, you start reaching for it constantly. A few situations where I’ve found it genuinely useful:&lt;/p&gt;

&lt;p&gt;Expense reports. Snap photos of receipts throughout the week, then dump them all into ihatepdf.cv on Friday afternoon. One PDF, one attachment, done.&lt;/p&gt;

&lt;p&gt;Scanned documents. If you’re using your phone camera as a scanner — and most people are at this point — the output is a folder full of JPGs. ihatepdf.cv turns that folder into a proper document.&lt;/p&gt;

&lt;p&gt;Client deliverables. Designers and photographers often need to send proofs or samples. A PDF looks more intentional than a ZIP file of images and opens cleanly on every device.&lt;/p&gt;

&lt;p&gt;Portfolios. Compile a selection of work images into a single, shareable document without needing InDesign or Canva.&lt;/p&gt;

&lt;p&gt;Archiving paperwork. Utility bills, insurance documents, anything you’ve photographed for record-keeping purposes: a single PDF is far easier to file and organize than a pile of JPGs.&lt;/p&gt;

&lt;p&gt;A Few Practical Tips&lt;br&gt;
Name your files before uploading. ihatepdf.cv doesn’t currently support drag-to-reorder within the interface. If you need a specific page sequence, naming your files 001_, 002_, 003_ before uploading ensures they come in the right order.&lt;/p&gt;

&lt;p&gt;Very large images slow things down. Because everything is processed in-browser, images above 20MB can noticeably slow the conversion. A quick resize beforehand keeps things snappy.&lt;/p&gt;

&lt;p&gt;Combine with other ihatepdf.cv tools. If the resulting PDF is too large to email, the ihatepdf.cv compress tool can reduce the file size without a noticeable quality difference. If you need to add page numbers to a long document, that’s another tool on the same platform. The whole suite is designed to be used together.&lt;/p&gt;

&lt;p&gt;Why This Approach to Software Matters&lt;br&gt;
There’s a broader design philosophy at work here that I think is worth naming.&lt;/p&gt;

&lt;p&gt;Most consumer software has drifted toward extracting as much value from users as possible: subscriptions for basic features, dark patterns that obscure free tiers, data collection baked into every interaction.&lt;/p&gt;

&lt;p&gt;ihatepdf.cv is a quiet counterexample. It’s free. It’s private by design. It doesn’t require an account. It doesn’t ask for anything in return. The tool exists to solve a specific problem as efficiently as possible, and then it gets out of your way.&lt;/p&gt;

&lt;p&gt;That’s rare. It’s worth pointing at when you find it.&lt;/p&gt;

&lt;p&gt;The Bottom Line&lt;br&gt;
If you regularly deal with images that need to become documents — for work, for admin, for clients, for record-keeping — ihatepdf.cv/images-to-pdf is worth bookmarking right now.&lt;/p&gt;

&lt;p&gt;It’s fast, it’s free, it works without uploading your files anywhere, and it produces clean, full-quality PDFs every time. There’s genuinely no catch.&lt;/p&gt;

&lt;p&gt;The next time someone asks you to “send it as a PDF,” you’ll know exactly where to go.&lt;/p&gt;

&lt;p&gt;Try it for yourself at ihatepdf.cv — no account needed.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>softwaredevelopment</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
