Forem: Bob

[Fixed] How to Solve the 99% Hang in ffmpeg.wasm Apps

Bob — Fri, 08 May 2026 10:29:30 +0000

TL;DR: The hang is caused by memory overlap. Delete your input file before reading the output.

I’ve been building VideoSnap, a tool that processes video entirely in the browser using ffmpeg.wasm. For a long time, I was haunted by a specific, frustrating bug: the "99% Trap."

A user uploads a file, the progress bar climbs smoothly, hits 99%... and then everything just stops.

The UI becomes unresponsive. The fan starts spinning. It doesn't crash with an "Aw, Snap!" error, but it hangs there, sometimes for minutes. Then, suddenly, the download pops up as if nothing happened.

I realized that the 99% mark isn't where FFmpeg is working—it's where the browser is fighting for its life.

The 99% isn't FFmpeg—it's the Handover

In ffmpeg.wasm, the progress bar tracks the FFmpeg execution. When it hits 99%, the heavy lifting of transcoding is actually done.

The "hang" happens during the handover: when you call engine.readFile() to pull the processed video out of MEMFS (Emscripten's in-memory file system, which lives inside WebAssembly memory) and into the JavaScript heap.

The "Memory Overlap" Problem

32-bit WebAssembly can address at most 4GB, and in practice builds and browsers often cap a module's memory at around 2GB. Imagine you are converting a 500MB video:

The Peak: At 99%, the WASM memory is holding your 500MB input file PLUS the newly generated 500MB output file. That’s 1GB of WASM memory occupied.
The Request: You call engine.readFile(). JavaScript now tries to allocate a new 500MB Uint8Array to copy that data.
The GC Storm: Your browser is now trying to manage 1.5GB or more (1GB inside WASM plus the new 500MB copy) in massive, contiguous memory blocks.

This triggers a "Stop-The-World" Garbage Collection (GC) event. The browser's Main Thread locks up completely. It is desperately trying to defragment memory to find a 500MB hole. This intense "GC thrashing" is why the UI freezes before the file finally breaks free.

The "Surgical" Fix: Breaking the Overlap

Once I understood that the stall was caused by the simultaneous existence of the input and output files in MEMFS, the fix became obvious.

I needed to clear the desk before trying to move the big box.

I implemented what I call Surgical Memory Management:

// The optimized handover logic:

// 1. FFmpeg is done. Before we even THINK about reading the output, 
// we must kill the input file to free up hundreds of MBs in WASM.
await engine.deleteFile('input.mp4'); 

// 2. Now that the WASM memory has breathing room, we read the result.
// The browser can allocate the JS buffer without a massive GC fight.
const data = await engine.readFile('output.mp4');

// 3. The millisecond we have the data in JS, we nuke the WASM output copy.
await engine.deleteFile('output.mp4');

// 4. Now WASM is empty, and we only hold the file in the JS heap.
// Passing the Uint8Array itself (not data.buffer) copies exactly its bytes.
const blob = new Blob([data], { type: 'video/mp4' });
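
The four steps above can be wrapped in a small helper so the ordering cannot drift between call sites. This is a minimal sketch, not the VideoSnap code: handOver and createFakeEngine are hypothetical names, and the only assumption about engine is that it exposes async readFile(path) and deleteFile(path) as in the snippet above. The fake engine exists only so the ordering can be checked without loading FFmpeg.

```javascript
// Hypothetical helper. `engine` is assumed to expose async readFile(path)
// and deleteFile(path), as in the snippet above.
async function handOver(engine, inputPath, outputPath, mimeType) {
  // 1. Free the input first so it no longer coexists with the JS copy.
  await engine.deleteFile(inputPath);

  let data;
  try {
    // 2. Copy the result out of MEMFS into a JS-side Uint8Array.
    data = await engine.readFile(outputPath);
  } finally {
    // 3. Drop the WASM-side output even if the read throws.
    await engine.deleteFile(outputPath);
  }

  // 4. Only the JS copy remains; wrap it for download.
  return new Blob([data], { type: mimeType });
}

// In-memory stand-in for the engine, for trying the helper out only.
function createFakeEngine(files) {
  const calls = [];
  return {
    calls,
    async readFile(path) {
      calls.push(`read:${path}`);
      if (!files.has(path)) throw new Error(`missing ${path}`);
      return files.get(path);
    },
    async deleteFile(path) {
      calls.push(`delete:${path}`);
      return files.delete(path);
    },
  };
}
```

The try/finally is a design choice worth keeping: if the read fails, the output file would otherwise stay in MEMFS and make the next conversion start with less headroom.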

By reordering these deletions, I eliminated the massive memory overlap at the exact moment the browser needs memory the most. The 99% hang doesn't magically vanish, because the browser still needs time to allocate a large JS buffer, but this surgical cleanup shaves off crucial seconds of GC thrashing. More importantly, it keeps the browser tab from quietly suffocating under heavy files.
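
To put numbers on the overlap, here is a rough tally of file bytes that are live at the moment readFile() allocates its copy, using the 500MB example from earlier. The function name and its parameters are illustrative assumptions. It counts file bytes only and ignores FFmpeg's own working buffers.

```javascript
// Rough tally of file bytes alive while readFile() allocates the JS copy.
// Sizes follow the 500MB example; real outputs vary with codec settings.
function liveMBDuringRead({ inputMB, outputMB, inputDeletedFirst }) {
  // Files still held in MEMFS at the time of the read.
  const wasmMB = (inputDeletedFirst ? 0 : inputMB) + outputMB;
  // The new Uint8Array that readFile() hands back to JavaScript.
  const jsMB = outputMB;
  return { wasmMB, jsMB, totalMB: wasmMB + jsMB };
}

const before = liveMBDuringRead({ inputMB: 500, outputMB: 500, inputDeletedFirst: false });
const after = liveMBDuringRead({ inputMB: 500, outputMB: 500, inputDeletedFirst: true });

console.log(before.totalMB, after.totalMB); // 1500 1000
```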

Why I didn't use WORKERFS or OPFS

I explored other options, but each came with a trade-off I couldn't accept:

WORKERFS: It mounts files without copying them, which sounds perfect. But it uses a synchronous I/O bridge that made FFmpeg run significantly slower in my tests. I would have been trading the memory savings for a massive speed penalty. Not worth it.
OPFS (Origin Private File System): This is the future. It streams data directly to disk. But it requires a custom-built FFmpeg core with WASMFS support, which is a massive engineering undertaking that the official @ffmpeg/ffmpeg doesn't support out-of-the-box yet.

The Takeaway: Know Your Handovers

If you are building high-performance WebAssembly apps, remember: the most dangerous part of the pipeline is the data handover.

When you move large amounts of data between the WASM "world" and the JS "world," the browser doesn't see a file—it sees a massive, contiguous memory allocation request. If you don't clean up your internal state before you make that request, you're asking for a GC storm.

VideoSnap is now significantly more stable, not because I made the math faster, but because I managed the memory lifecycle with more precision.

I’m the builder of VideoSnap. I write about the messy reality of building high-performance tools in the browser. Follow for more deep dives.

Why Prompt-Only Moderation Failed in My AI Generation App

Bob — Fri, 10 Apr 2026 06:50:29 +0000

When I first added moderation to my AI generation app, I treated it as a text problem.

That seemed reasonable at the time. A user sends a prompt, I check the prompt, and if it looks unsafe, I block the request before it reaches the model.

That approach worked for a very short time.

It stopped working the moment I supported image inputs, reference images, and multiple generation flows. At that point, I realized something important: prompt-only moderation is not really moderation. It is just one partial check inside a much larger pipeline.

This post is about what changed in my backend once I accepted that.

The mistake: treating moderation as a wrapper

A lot of AI products start with moderation as a thin wrapper around generation:

receive a prompt
run a text safety check
call the model provider
return the result

The problem is that real generation workflows are rarely that simple.

Once users can upload source images, provide reference images, or switch between text-to-image and image-to-image generation flows, the prompt becomes just one component of the overall request. A completely harmless prompt can still be paired with problematic input images. If the backend only inspects the text, the system will inevitably have a blind spot.

That was the first issue I had to fix.

Moderation belongs inside the generation pipeline

I ended up moving moderation into the backend generation workflow itself instead of treating it as a separate utility.

Conceptually, the flow became:

validate the request
load the selected provider and model
inspect both prompt text and image inputs
block flagged requests before spending credits
create the generation task only if moderation passes

That decision helped for two reasons.

First, it kept moderation close to the actual business rules. I did not want unsafe requests to consume credits, create external jobs, or leave behind half-failed task records.

Second, it forced me to normalize the input shape. Instead of only thinking in terms of the prompt, I had to define a moderation input that could include prompt text, image URLs, model context, and generation scene.

That made the system much easier to reason about.
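
The flow and the normalized input described above could be sketched like this. Everything here is a hypothetical illustration rather than the actual backend: the field names (sourceImageUrl, referenceImageUrls, scene), the deps object, and the result shapes are all assumptions.

```javascript
// Hypothetical request fields and dependency names, for illustration only.
function buildModerationInput(request) {
  // Gather images from every field that can carry one.
  const imageUrls = [request.sourceImageUrl, ...(request.referenceImageUrls ?? [])]
    .filter((url) => typeof url === 'string' && url.length > 0);
  return {
    prompt: request.prompt ?? '',
    imageUrls,
    model: request.model,
    scene: request.scene, // e.g. 'text-to-image' or 'image-to-image'
  };
}

async function handleGeneration(request, deps) {
  // 1. Validate the request.
  if (!request.model || !request.scene) {
    return { status: 'rejected', reason: 'invalid_request' };
  }
  // 2. Load the selected provider and model.
  const provider = deps.resolveProvider(request.model);
  if (!provider) {
    return { status: 'rejected', reason: 'unknown_model' };
  }
  // 3. Inspect prompt text and image inputs together.
  const verdict = await deps.moderate(buildModerationInput(request));
  // 4. Block before anything is charged or created.
  if (verdict.flagged) {
    return { status: 'blocked', reason: 'moderation' };
  }
  // 5. Only now spend credits and create the task.
  await deps.chargeCredits(request.userId);
  const taskId = await deps.createTask(request, provider);
  return { status: 'created', taskId };
}
```

Because the credit charge and task creation sit after the moderation branch, a blocked request leaves no billing entry and no half-created task record behind.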

Prompt checks are useful, but incomplete

Text moderation is still valuable. It catches a lot of obvious cases early, and it is usually cheaper and faster than processing images.

But text-only checks have two major limitations.

The first is obvious: users can submit problematic visual input even if the prompt itself looks harmless.

The second is less obvious: language coverage is uneven. Depending on the moderation provider, some languages are better supported than others. That means your confidence level should not be the same across all prompts.

In my case, that pushed me toward a more defensive design: if text checks are incomplete, the rest of the safety system has to acknowledge that limitation instead of pretending the problem is solved.
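
One way to make that limitation explicit in code is to let the decision depend on how well the provider covers the prompt's language. This is only a sketch under loud assumptions: the language set, both thresholds, and the decision labels are placeholders, and language detection is assumed to happen elsewhere.

```javascript
// Placeholder values: which languages count as well covered, and both
// thresholds, depend entirely on the moderation provider in use.
const WELL_COVERED_LANGUAGES = new Set(['en', 'es', 'fr']);

function decideOnText(score, language) {
  const wellCovered = WELL_COVERED_LANGUAGES.has(language);
  // Be stricter where the text classifier is less trustworthy.
  const blockAt = wellCovered ? 0.8 : 0.5;
  if (score >= blockAt) return 'block';
  // A low score in a weakly covered language is weak evidence of safety,
  // so signal that later checks should not treat the text as cleared.
  return wellCovered ? 'allow' : 'allow_pending_other_checks';
}
```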

Images changed the design

The biggest improvement came from treating image inputs as first-class moderation targets.

That sounds straightforward, but it changed several implementation details:

the moderation step now had to collect image URLs from different request fields
the backend needed one normalized moderation interface, even if the underlying provider had different APIs for text and image checks
moderation results had to return structured categories and scores, not just a single boolean
failure behavior had to be explicit

That last point matters more than it seems.

If a moderation provider fails, what should happen?

You have to choose between two imperfect options:

fail-open: allow the request and accept some risk
fail-closed: block the request and accept some false positives or degraded UX

There is no universal correct answer. It depends on the kind of product you are building, your abuse tolerance, and how costly a bad generation is for you. But the important part is to make the decision deliberately. Silent fallback logic is where safety systems get weak.
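
A deliberate failure policy can be as simple as a required parameter. The sketch below assumes a hypothetical checkFn that resolves to per-category scores. The function name, the 0.5 threshold, and the result shape are illustrative, not any provider's API.

```javascript
// Hypothetical wrapper. `checkFn` is assumed to resolve to
// { categories: { name: score } }; the threshold is a placeholder.
async function moderateWithPolicy(checkFn, input, onProviderError) {
  // Refuse to run without an explicit choice, so there is no silent default.
  if (onProviderError !== 'fail-open' && onProviderError !== 'fail-closed') {
    throw new Error('onProviderError must be "fail-open" or "fail-closed"');
  }
  try {
    const { categories } = await checkFn(input);
    const flaggedCategories = Object.entries(categories)
      .filter(([, score]) => score >= 0.5)
      .map(([name]) => name);
    return {
      flagged: flaggedCategories.length > 0,
      categories,
      flaggedCategories,
      providerFailed: false,
    };
  } catch (error) {
    // The provider is down or errored: apply the chosen policy and record it.
    return {
      flagged: onProviderError === 'fail-closed',
      categories: {},
      flaggedCategories: [],
      providerFailed: true,
    };
  }
}
```

Returning providerFailed alongside the verdict keeps the two cases distinguishable in logs: a request blocked for its content reads differently from one blocked because the check could not run.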

Provider-specific APIs should not leak everywhere

Another lesson was that moderation providers should be isolated behind a small internal interface.

Not because provider abstraction is fashionable, but because safety logic tends to spread if you let it.

If one route handler knows how text moderation works, another knows how image moderation works, and a third knows how to interpret provider-specific category names, you do not have a moderation layer anymore. You have moderation fragments.

I found it much cleaner to keep a moderation manager in the backend and let the generation route ask one question: “Is this request safe enough to proceed?”

That does not remove complexity. It contains it.
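
As a rough illustration of what containing it can look like, the sketch below keeps provider label names inside adapters and exposes a single question to the route. The class, the adapter shape, the label map, and the threshold are all hypothetical and do not describe any specific provider's API.

```javascript
// Translate provider-specific labels into internal category names.
// `labelMap` is a hypothetical example, e.g. { adult_content: 'sexual' }.
function toInternalCategories(providerScores, labelMap) {
  const internal = {};
  for (const [label, score] of Object.entries(providerScores)) {
    const name = labelMap[label] ?? 'other';
    internal[name] = Math.max(internal[name] ?? 0, score);
  }
  return internal;
}

// Hypothetical adapter shape: { supports(kind), check(value) }, where
// check() resolves to scores keyed by internal category names.
class ModerationManager {
  constructor(adapters, threshold = 0.5) {
    this.adapters = adapters;
    this.threshold = threshold; // placeholder value
  }

  // The one question the generation route asks.
  async isSafeToProceed(input) {
    const items = [
      ...(input.prompt ? [{ kind: 'text', value: input.prompt }] : []),
      ...input.imageUrls.map((url) => ({ kind: 'image', value: url })),
    ];
    for (const item of items) {
      const adapter = this.adapters.find((a) => a.supports(item.kind));
      if (!adapter) throw new Error(`no moderation adapter for ${item.kind}`);
      const scores = await adapter.check(item.value);
      if (Object.values(scores).some((score) => score >= this.threshold)) {
        return false;
      }
    }
    return true;
  }
}
```

A route handler would then call manager.isSafeToProceed(input) and never see a provider category name, so swapping a provider only touches its adapter.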

The practical takeaway

The most useful shift in my thinking was this:

Moderation is not a feature attached to generation. It is part of generation.

Once I started treating it that way, the backend became easier to evolve. I could add checks for both prompt text and image inputs, make blocking decisions before credits were consumed, and keep provider-specific moderation details out of the rest of the app.

I am using this approach while building videoflux.video, where one workflow needs to support AI image and video generation without assuming that a prompt alone tells the full safety story.

Disclosure: I’m the builder of videoflux.video.