Forem: Vitalii Buhaiov

ChunkLoadError on every deploy: the in-place rebuild trap in Next.js standalone

Vitalii Buhaiov — Mon, 18 May 2026 08:07:04 +0000

We run a Next.js 16 site behind nginx on a single VPS. Recently Google Search Console reported a single 500 on one of our locale-prefixed pages. The page was working fine by the time I clicked through. I almost ignored it. I'm glad I didn't. The trail led to a bug that fires on every deploy, and the fix is short.

Here's the story and what the fix cost us.

The single 500

Search Console flagged a locale-prefixed product route. The URL returned a clean 200 when I curled it. So either the indexer hit a transient blip, or something in our deploy flow occasionally leaks a 500 to whichever request happens to be in flight at the wrong second.

The nginx access log made it concrete. One 500 for that URL, single timestamp, never before or after:

[06:58:05]  GET /es/products/details  500

Now the matching journalctl -u frontend for the same second:

06:58:04  Error [ChunkLoadError]: Failed to load chunk
          server/chunks/ssr/messages_es_json_[json]_cjs_xxxxxxxx._.js
          from module 83578
   [cause]: Error: Cannot find module
            '/opt/app/frontend/.next/standalone/.next/server/chunks/ssr/...'
06:58:04  Error [ChunkLoadError]: Failed to load chunk
          server/chunks/ssr/[root-of-the-server]__xxxxxxx._.js ...
06:58:04  ⨯ unhandledRejection: ChunkLoadError ...

Hundreds of these in a five-second window, then silence. That five-second window matched the deploy run from earlier that morning to the second. A later deploy left a bigger spread of 500s across other locale-prefixed routes. Same root cause, same five seconds, more URLs simply because more requests landed in the window.

What the rebuild was doing

Our deploy on master push was:

cd /opt/app/frontend
npm ci --prefer-offline
npm run build              # writes .next/standalone/ + .next/static/ + .next/server/
cp -r .next/static .next/standalone/.next/static
cp -r public        .next/standalone/public
systemctl restart frontend

The WorkingDirectory of the systemd unit was .next/standalone/, and next build overwrites that directory in place. So during a 3-minute rebuild, the running Node process kept in-memory references to chunk filenames (say, server/chunks/ssr/messages_es_json_[json]_cjs_xxxxxxxx._.js) that the new build had just deleted and replaced with files under a different hash. Only after that did systemctl restart kill the old process and start a new one.

Any SSR request that hit the old process during that ~5-second window between "files replaced" and "process restarted" tried to lazy-load a chunk by its old filename. Node went to disk, didn't find it, threw ChunkLoadError. Next.js doesn't handle that in the SSR path. It bubbles up as a 500.

In-memory code that pre-loaded its chunks at boot kept working. Anything that touched a route that lazy-loaded (a different locale, an MDX-rendered page, a dynamic import) was a coin flip.
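The mechanism is easy to reproduce outside Next.js. Below is a minimal Python sketch, assuming a toy "server" that records content-hashed chunk filenames at boot, keeps one chunk preloaded, and opens the rest lazily. The build function, the ToyServer class, and the file names are hypothetical; they only mimic the shape of the problem, not Next.js internals.

```python
import hashlib
import os
import tempfile


def build(out_dir: str, version: str) -> dict:
    """Toy 'next build': wipe out_dir in place, write content-hashed chunks, return the manifest."""
    for name in os.listdir(out_dir):
        os.remove(os.path.join(out_dir, name))
    manifest = {}
    for module in ("root", "messages_es"):
        body = f"{module} built from {version}"
        digest = hashlib.sha1(body.encode()).hexdigest()[:8]
        filename = f"{module}_{digest}.js"
        with open(os.path.join(out_dir, filename), "w") as f:
            f.write(body)
        manifest[module] = filename
    return manifest


class ToyServer:
    """Holds the manifest it booted with; 'root' is preloaded, everything else is lazy."""

    def __init__(self, out_dir: str, manifest: dict):
        self.out_dir = out_dir
        self.manifest = dict(manifest)                # frozen at boot, like the running process
        self.loaded = {"root": self._read("root")}   # preloaded at boot

    def _read(self, module: str) -> str:
        with open(os.path.join(self.out_dir, self.manifest[module])) as f:
            return f.read()

    def handle(self, module: str) -> int:
        try:
            if module not in self.loaded:
                self.loaded[module] = self._read(module)  # lazy chunk load by old filename
            return 200
        except FileNotFoundError:                         # the ChunkLoadError analogue
            return 500


out = tempfile.mkdtemp()
server = ToyServer(out, build(out, "v1"))
build(out, "v2")                     # in-place rebuild while the old "process" still runs
print(server.handle("root"))         # 200: already in memory
print(server.handle("messages_es"))  # 500: the old hashed filename is gone
```

A fresh ToyServer booted from the v2 manifest would serve both modules, which is what systemctl restart eventually does; the 500s live in the gap between the rebuild and the restart.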

This isn't a Next.js bug. It's the cost of in-place rebuild deploys for any Node.js process that uses dynamic imports. We had lots of them: one per locale message bundle, one per MDX route, one per locale-prefixed page.

What we considered

Four options, in increasing order of "actually adequate":

1. Stop, then build, then start. systemctl stop before npm run build. The running process never sees mismatched chunks because it isn't running. Cost: nginx returns 502 Bad Gateway for 30–60 seconds while the build runs. That's an upstream-down error inside a known window, which Google treats as transient, and much friendlier than 500s from a half-replaced app. Users still see an error page for a minute.
2. Atomic directory swap. Build into a sibling directory, then mv .next/standalone .next/standalone-old && mv .next/standalone-new .next/standalone && systemctl restart. The running process keeps reading its old (now-renamed) directory until restart. The window shrinks from 30–60 seconds of 502 to the 3–5 seconds of 502 the restart takes. Still some downtime, but no 500s.
3. proxy_next_upstream with a backup server. Tell nginx to retry on a backup if the primary returns 500. This requires keeping two upstream instances in sync forever, including during deploys. That sync is exactly the problem we were trying to solve, so this just relocates it.
4. Blue-green at the systemd + nginx layer. Two long-running pools on different ports. Build into the idle one. Health-check it. Atomically swap nginx upstream. Drain. Stop the old. Zero failed requests during deploy.

We chose 4. The first three each shave a different chunk off the failure window; 4 closes it entirely. And it costs almost nothing on a 16 GB box (more on this below).

The pieces

Two systemd instances from one template unit

# /etc/systemd/system/frontend@.service
[Unit]
Description=Frontend (Next.js standalone, %i pool)
After=network.target
ConditionPathExists=/opt/app/frontend/pools/%i/server.js

[Service]
Type=simple
User=app
WorkingDirectory=/opt/app/frontend/pools/%i
EnvironmentFile=/etc/frontend-%i.env
Environment=NODE_ENV=production
Environment=HOSTNAME=127.0.0.1
ExecStart=/usr/bin/node server.js
Restart=always
RestartSec=5

%i is the instance name. frontend@blue runs from pools/blue/, frontend@green from pools/green/. The per-color env files supply PORT=3000 and PORT=3001 respectively, kept VPS-local because they don't belong in git.

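ConditionPathExists is doing real work. Without it, an empty pool slot (fresh install, partial deploy) would crash-loop under Restart=always. With it, systemd skips the start while server.js is missing: the unit stays inactive instead of failed, so there is nothing to restart. It won't start by itself when the path appears; the deploy script's systemctl restart does that.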

Nginx upstream as an include file

# /etc/nginx/conf.d/frontend-upstream.conf
upstream frontend {
    include /etc/nginx/frontend-upstream-active.inc;
}

# /etc/nginx/frontend-upstream-active.inc
server 127.0.0.1:3001;

The deploy script never edits frontend-upstream.conf. It writes the new frontend-upstream-active.inc to a temp file and mvs it over the old one (a rename(2), which is atomic within a single filesystem), then runs nginx -s reload. The rename flips the upstream pointer in one step; the reload starts new workers on the new config and lets the old workers finish their in-flight requests before exiting.
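The same temp-file-plus-rename step, sketched in Python for anyone driving the deploy from a script instead of shell. The function name is hypothetical; the only real requirements it encodes are that the temp file lives in the same directory as the target (one filesystem, so the rename is atomic) and that its name doesn't match any *.conf glob.

```python
import os
import tempfile


def write_upstream_atomically(path: str, port: int) -> None:
    """Swap the nginx include file with one rename, so nginx never reads half a file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory keeps the rename on one filesystem (and atomic).
    # Its name ends in .tmp, so no *.conf include glob can pick it up mid-write.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upstream-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"server 127.0.0.1:{port};\n")
            f.flush()
            os.fsync(f.fileno())    # contents on disk before the name points at them
        os.chmod(tmp_path, 0o644)   # mkstemp creates 0600; match ordinary config files
        os.replace(tmp_path, path)  # rename(2): readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

After the call, nginx -t and nginx -s reload still have to run, exactly as in the shell version.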

One trap: name the include file with an extension that isn't .conf, or put it outside /etc/nginx/conf.d/. Otherwise the top-level include /etc/nginx/conf.d/*.conf will try to load it as a standalone config and choke on the bare server directive. We used .inc.

Deploy script flow

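set -euo pipefail
cd /opt/app/frontend
NGINX_UPSTREAM=/etc/nginx/frontend-upstream-active.inc

ACTIVE=$(cat pools/active-color 2>/dev/null || echo blue)
if [ "$ACTIVE" = "blue" ]; then
  IDLE=green; IDLE_PORT=3001; ACTIVE_PORT=3000
else
  IDLE=blue;  IDLE_PORT=3000; ACTIVE_PORT=3001
fi

# Sanity check: abort if the world is inconsistent.
NGINX_PORT=$(grep -oE '127\.0\.0\.1:[0-9]+' "$NGINX_UPSTREAM" | cut -d: -f2)
[ "$NGINX_PORT" = "$ACTIVE_PORT" ] || { echo "FAIL: marker mismatch"; exit 1; }

# Build (only writes inside .next/, not pools/).
rm -rf "pools/$IDLE"
npm ci --prefer-offline
npm run build
# Standalone output omits static assets and public/; copy them in as before.
cp -r .next/static .next/standalone/.next/static
cp -r public       .next/standalone/public

# Stage build into idle pool.
mkdir -p pools
mv .next/standalone "pools/$IDLE"

# Any failure from here until the swap leaves the old pool serving.
abort() { echo "FAIL: $1"; systemctl stop "frontend@$IDLE"; exit 1; }

# Bring idle online and prove it works.
systemctl restart "frontend@$IDLE"
healthy=no
for i in $(seq 1 60); do
  if curl -sf -o /dev/null --max-time 2 \
       -H 'Host: example.com' "http://127.0.0.1:$IDLE_PORT/"; then
    healthy=yes; break
  fi
  sleep 1
done
[ "$healthy" = yes ] || abort "idle pool never became healthy"

# Multi-route smoke (locale-prefixed + MDX + dynamic) before cutover.
for route in / /es/ /products/details /docs/guide /blog; do
  code=$(curl -s -L -o /dev/null --max-time 5 -w '%{http_code}' \
    -H 'Host: example.com' "http://127.0.0.1:$IDLE_PORT$route" || true)
  case "$code" in 2??|3??) ;; *) abort "smoke $route returned $code" ;; esac
done

# Atomic upstream swap + reload; restore the old include if nginx rejects it.
cp "$NGINX_UPSTREAM" "${NGINX_UPSTREAM}.prev"
printf 'server 127.0.0.1:%s;\n' "$IDLE_PORT" > "${NGINX_UPSTREAM}.new"
mv "${NGINX_UPSTREAM}.new" "$NGINX_UPSTREAM"
if ! nginx -t; then
  mv "${NGINX_UPSTREAM}.prev" "$NGINX_UPSTREAM"
  abort "nginx -t rejected the new upstream"
fi
nginx -s reload

# Mark, drain, retire.
echo "$IDLE" > pools/active-color
systemctl enable  "frontend@$IDLE"
systemctl disable "frontend@$ACTIVE"
sleep 30                              # drain in-flight requests on the old pool
systemctl stop "frontend@$ACTIVE"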

The order matters more than it looks. Two specifics.

Write the marker file immediately after the nginx reload, before the drain. If the script crashes during the sleep or the systemctl stop, the marker reflects what nginx is doing right now. The next deploy reads truth, not stale state.

Sanity-check before destructive ops. rm -rf pools/$IDLE is fine if $IDLE really is idle. If the marker file lies (say a previous rollback was incomplete), $IDLE could be the pool that's serving traffic. The pre-flight check compares the marker against nginx's upstream port and refuses to proceed on a mismatch.
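The pre-flight logic is small enough to port and unit-test on its own. A Python sketch, assuming the marker file's text and the include file's text are passed in as strings; plan_deploy is a hypothetical name, and the port map mirrors the per-color env files above.

```python
import re
from typing import Optional, Tuple

# Color -> port, as in the per-color env files (PORT=3000 / PORT=3001).
PORTS = {"blue": 3000, "green": 3001}


def plan_deploy(marker_text: Optional[str], upstream_text: str) -> Tuple[str, str, int]:
    """Return (active, idle, idle_port), or raise before anything destructive runs."""
    active = (marker_text or "").strip() or "blue"  # missing marker defaults to blue
    if active not in PORTS:
        raise ValueError(f"unknown color in marker: {active!r}")
    idle = "green" if active == "blue" else "blue"
    match = re.search(r"127\.0\.0\.1:(\d+)", upstream_text)
    nginx_port = int(match.group(1)) if match else None
    if nginx_port != PORTS[active]:
        # If the marker lies, "idle" might be the live pool; refuse to rm -rf it.
        raise RuntimeError(f"marker says {active}, nginx points at port {nginx_port}")
    return active, idle, PORTS[idle]
```

For example, plan_deploy("green\n", "server 127.0.0.1:3001;\n") returns ("green", "blue", 3000), while a marker of "blue" against the same include raises instead of planning a delete.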

What it costs us

Measured on the live VPS. Your absolute numbers will vary with bundle size and traffic; the ratios should roughly hold:

Metric	Before	After (steady)	After (during 30-s cutover)
Frontend RAM (RSS)	241 MB	241 MB	482 MB (both pools running)
Disk used by pool dirs	173 MB	346 MB (active + previous, kept for rollback)	346 MB
Frontend CPU	~0 % (idle)	~0 %	~0 % (both pools idle during cutover)
Build-phase RAM peak	~1.0–1.5 GB	unchanged	unchanged

Against a box with 16 GB of RAM and 150 GB of disk, where Redis already holds 4–5 GB resident, this is rounding error. The build itself is the expensive part of any deploy, and it didn't change.

What it bought us:

Zero 500s during deploy. The old pool keeps serving its own (unchanged) chunks until it's gracefully stopped. The new pool starts from a complete on-disk build before nginx ever sends it a request.
Zero 502s during deploy. No restart window. nginx -s reload is graceful and doesn't drop in-flight connections.
Cheap rollback. The previous pool's directory is retained until the next deploy. To revert, the rollback script starts the old pool, writes the include file back, reloads nginx. No rebuild needed. About 10 seconds end-to-end.
Honest failure mode. If the build fails, the script aborts before touching nginx; the old pool is still serving. If the new pool fails health-check, the script stops it and exits non-zero; the old pool is still serving. There's no state in which the deploy can take the site down mid-flight.

Gotchas

`next build` cleans `.next/` at start

The first cut of this had pool directories at .next/standalone-blue/ and .next/standalone-green/. They got wiped on every rebuild. next build does a recursive clean of .next/ before running. If you want anything to survive across builds, keep it outside .next/. We moved pools to pools/<color>/ (sibling of .next/).

Not Next.js-specific. Most build tools assume their output dir is theirs to own. Don't squat in it.

`mv` is safe under a running Linux process

While migrating prod to the new layout I had to move pools/blue/ while frontend@blue was actively serving from inside it. Linux inode semantics make this fine: a process holds inode references through its open file descriptors and CWD, not path strings. mv within a single filesystem is just a rename(2); the inodes don't move. The running pool kept serving without noticing. One caveat: this covers files that are already open. A later open by an absolute path that still names the old location will miss, so don't leave a serving pool moved for long.

Same reason tail -f keeps working when you rotate a log file by renaming it. Useful primitive once you remember it.
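A throwaway Python demonstration of the primitive, using a temp directory (the pool paths are illustrative): a descriptor opened before the rename keeps reading the same inode, while a fresh lookup through the old path fails.

```python
import os
import tempfile

base = tempfile.mkdtemp()
pool = os.path.join(base, "pools", "blue")
os.makedirs(pool)
old_path = os.path.join(pool, "server.js")
with open(old_path, "w") as f:
    f.write("console.log('blue');\n")

handle = open(old_path)  # stands in for a descriptor the running process already holds
os.rename(pool, os.path.join(base, "pools", "blue-moved"))  # rename(2): same inodes, new name

content = handle.read()               # still works: the descriptor points at the inode
handle.close()
old_path_exists = os.path.exists(old_path)  # False: a new lookup by the old path misses
print(repr(content), old_path_exists)
```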

Don't put backup files in `sites-enabled/`

I made a backup of /etc/nginx/sites-enabled/default next to the original, then nginx -t started warning about "conflicting server name" entries. The top-level include /etc/nginx/sites-enabled/* was loading my .bak as a config. Move backups elsewhere or rename them so the glob misses.
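Both nginx traps here (the .inc file and the .bak file) come down to which names an include mask matches. A rough Python check, assuming glob(3)-style matching on the basename, where a wildcard never matches a leading dot; the file lists are hypothetical, and the .inc file is placed in conf.d only to show the *.conf mask would skip it even there.

```python
from fnmatch import fnmatchcase
from typing import List


def picked_up_by(pattern: str, names: List[str]) -> List[str]:
    """Approximate which file names an nginx include mask would load.

    Assumes glob(3)-style matching: a wildcard never matches a leading dot.
    """
    return [n for n in names if not n.startswith(".") and fnmatchcase(n, pattern)]


conf_d = ["frontend-upstream.conf", "frontend-upstream-active.inc", "gzip.conf"]
sites_enabled = ["default", "default.bak"]

print(picked_up_by("*.conf", conf_d))    # the .inc file is skipped
print(picked_up_by("*", sites_enabled))  # the .bak file gets loaded as config
```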

systemd templates aren't auto-enabled by `enable --now`

Our CI workflow has a generic loop that auto-enables newly-installed singleton units. Templates (foo@.service) are explicitly skipped because they need an instance name. That's the right behavior for our case: we want exactly one of blue/green enabled at a time, and the deploy script decides which.
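The CI loop itself isn't shown here, so this is only a sketch of its filtering rule in Python, assuming it sees a list of installed unit file names and handles services and timers; the function name, suffix list, and unit names are hypothetical.

```python
from typing import List


def units_to_auto_enable(installed: List[str]) -> List[str]:
    """Singleton units a generic CI loop may enable; templates (foo@.service) are skipped."""
    picked = []
    for name in installed:
        stem, dot, suffix = name.rpartition(".")
        if not dot or suffix not in ("service", "timer"):
            continue  # hypothetical: the loop only handles services and timers
        if stem.endswith("@"):
            continue  # template: needs an instance (frontend@blue); the deploy script picks it
        picked.append(name)
    return picked
```

Each picked unit would then get systemctl enable --now, while frontend@blue or frontend@green is enabled only by the deploy script.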

Health-check should match production conditions

A bare curl http://127.0.0.1:$PORT/ will succeed in a lot of cases where production is broken. Add -H 'Host: example.com' if you're behind a reverse proxy, follow redirects with -L, and probe routes that exercise the middleware / SSR / MDX paths that you care about. We had a Next.js + Cloudflare + nginx interaction bug that only surfaced when the request Host header didn't match 127.0.0.1. A localhost-only health check wouldn't have caught it.
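A Python version of the same health-check idea, for a deploy driven from Python rather than curl. The helper names are hypothetical. It sends the production Host header explicitly and, unlike the curl -L smoke, does not follow redirects, so any 3xx counts as a pass.

```python
import http.client
import time
from typing import List


def probe(port: int, path: str, host: str, timeout: float = 5.0) -> int:
    """GET `path` from the local pool with a production Host header; 0 means no HTTP answer."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host})  # replaces the default 127.0.0.1 Host
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return 0
    finally:
        conn.close()


def wait_until_up(port: int, host: str, attempts: int = 60, delay: float = 1.0) -> bool:
    """Poll / until it answers 2xx/3xx, like the 60 x 1 s curl loop."""
    for _ in range(attempts):
        if 200 <= probe(port, "/", host, timeout=2.0) < 400:
            return True
        time.sleep(delay)
    return False


def failing_routes(port: int, host: str, routes: List[str]) -> List[str]:
    """Routes that did not answer 2xx/3xx (redirects are not followed)."""
    return [r for r in routes if not 200 <= probe(port, r, host) < 400]
```

A non-empty result from failing_routes is the point to stop the idle pool and exit before touching nginx.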

When this isn't worth it

This pattern is small enough to recommend for any single-VPS deploy that uses systemd + a reverse proxy. It scales to multiple boxes the same way. Replace "two systemd instances on one box" with "two server fleets behind a load balancer" and the swap mechanic is identical.

It is not worth it if:

Your app boots in under a second and has no in-flight state that matters across restarts. A plain restart is simpler and the failure window is too short to care.
You already use a real orchestrator. Kubernetes, Nomad, ECS: all of them do this for you as a rolling deploy. If you have it, use it.
You're on a serverless platform where the runtime owns the deploy lifecycle. Same reason.

For a single-VPS Node.js process behind nginx, though, blue-green is the proportionate fix. Half a day of work, no new dependencies.

What changed for us, concretely

The five-second ChunkLoadError window is gone by construction. The old pool's chunks never get touched; the new pool starts from a complete build before nginx ever sends it a request.
Rollback is a 10-second nginx-upstream rewrite, not a rebuild.
The next time next build evicts a critical chunk filename (and it will), nobody outside our journal will know.

Figuring out the original bug took longer than implementing the fix. If you're running anything stateful behind nginx and your deploy is git pull && build && restart, check your access log for 500s in the seconds around each deploy.

Why a single timestamp breaks real-time aggregation

Vitalii Buhaiov — Thu, 14 May 2026 21:14:13 +0000

During volatile moves, the aggregator could show a "consensus" order book that never existed on any exchange at any single instant. The bug: one timestamp field hiding three different "nows", one per venue.

I learned this aggregating live order books. The pattern generalizes to any multi-source pipeline.

The setup

I run a service that joins live order books from Binance, Bybit, and OKX into one view. Each producer is a small daemon (one per exchange/asset pair) that holds a websocket open, applies snapshot+diff updates, and publishes the top 200 levels to Redis. A 10 Hz aggregator reads the three Redis keys, bins prices into per-asset buckets ($1 for BTC, $0.10 for ETH, etc.), and writes one unified snapshot.

There's an obvious question every consumer asks:

What time is this unified snapshot from?

If you're paying attention, this question has two wrong answers before it has a right one.

Wrong answer #1: now()

The aggregator runs every 100 ms. So the snapshot is from… now? Sort of. It's from the moment the aggregator built it. The underlying data is older. Each producer has its own publish cadence, the websocket has its own jitter, and the exchange itself stamped the event some milliseconds before that.

now() is fine as an audit-log field ("the aggregator emitted this snapshot at t=…"). It's wrong if a consumer wants to know how old the data is.

Wrong answer #2: max(producer.ts)

Better: each producer stamps ts = int(time.time() * 1000) on its publish. The aggregator picks the freshest producer and calls that the snapshot's time.

This is what I shipped first. It's wrong for a subtler reason: producer ts is wall time on the producer machine, not the exchange's event time. Two producers can hold data that their exchanges stamped 200 ms apart yet publish it to Redis 5 ms apart by their wall clocks. The snapshot looks synchronized because the producer clocks are close, even though the underlying exchange events are not. Even perfectly synchronized producer clocks can't reconstruct exchange event ordering after the fact.

In quiet markets this is invisible. In high-volatility moments (a Fed print, a liquidation cascade) Binance, Bybit, and OKX can stamp their depth events tens to hundreds of milliseconds apart. Producer ts hides this completely. The consensus snapshot reads as a single instant when in fact it's stitched from three.
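The failure is easy to see with numbers. The values below are made up (ms epoch, one aggregator tick) to mirror the "200 ms apart on the exchanges, 5 ms apart on the producers" case; nothing here comes from real venue data.

```python
# Made-up ms-epoch values for one aggregator tick.
snaps = {
    "binance": {"ts": 1_700_000_000_450, "event_ts": 1_700_000_000_200},
    "bybit":   {"ts": 1_700_000_000_452, "event_ts": 1_700_000_000_400},
    "okx":     {"ts": 1_700_000_000_455, "event_ts": 1_700_000_000_230},
}


def spread(field: str) -> int:
    """Max minus min of one timestamp field across producers."""
    values = [snap[field] for snap in snaps.values()]
    return max(values) - min(values)


snapshot_time = max(snap["ts"] for snap in snaps.values())  # wrong answer #2
print(snapshot_time)       # looks like a single instant...
print(spread("ts"))        # 5: the producers look synchronized
print(spread("event_ts"))  # 200: the market instants they carry are not
```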

The two-field rule

The fix that finally stuck: every payload carries two timestamps.

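import time

def make_payload(top: dict) -> dict:
    # `top`: the producer's current book state, already carrying the exchange's event_ts
    return {
        "ts":        int(time.time() * 1000),  # producer wall time (publish moment)
        "event_ts":  top.get("event_ts"),      # exchange-stamped event time
        # ... (other payload fields)
    }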

exchange emits event  →  producer receives  →  producer publishes  →  aggregator joins
       event_ts                                       ts

ts is the producer's wall clock at publish. It controls Redis TTL ("is this producer alive?") and is the right field for staleness gates. If a producer dies, ts stops moving, which is the signal to exclude it.

event_ts is whatever the exchange called the time of the underlying event. It's the right field for cross-source alignment. Two producers with event_ts 200 ms apart are showing different instants of the market, even if their ts is identical.

These do different jobs. Conflating them, picking one field to do both, leaks one job into the other. Gate staleness on event_ts, and you get false alarms when a venue throttles its push rate. Align on ts, and you get fake consensus during a vol spike.

Two fields. Two jobs. Don't combine them.

Treasure hunt: what each venue actually sends

The two-field rule pushes complexity into the producers. Each one has to know what its exchange calls "event time". The answer is different at every venue.

Binance futures depth stream: each event carries E (exchange-emitted event timestamp) and T (transaction time). I prefer E, fall back to T:

ev_ts = ev.get("E") or ev.get("T")

Both are ms epoch ints.

Bybit orderbook.50.<SYMBOL> on v5 linear: a top-level ts field on the wrapping frame, int ms epoch. The inner data carries u (sequence id) but no separate event timestamp.

OKX books channel on v5: the timestamp lives inside each data[] entry, and it arrives as a string of ms epoch.

Three venues, three shapes:

Venue	Field	Location	Wire type
Binance	`E` (fallback `T`)	per-event object	int ms
Bybit	`ts`	top-level wrapper	int ms
OKX	`ts`	per-entry inside `data[]`	string ms (parse)

After extraction, every producer normalizes to int ms epoch stamped on a field literally named event_ts in Redis. Downstream code never branches on exchange to interpret time.

Normalize exchange-specific timestamp semantics at the producer boundary, not in downstream consumers.
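One way to keep that rule honest is a single extraction function per producer boundary. A Python sketch, assuming the frame shapes described above (Binance depth event object, Bybit v5 wrapping frame, OKX v5 frame with a data[] list); the function name is hypothetical, and taking the latest ts when an OKX frame carries several entries is my assumption, not something the venues specify.

```python
from typing import Any, Optional


def extract_event_ts(venue: str, frame: dict) -> Optional[int]:
    """Normalize a venue's exchange-stamped event time to int ms epoch (None if absent)."""
    raw: Any
    if venue == "binance":
        # Futures depth event: prefer E (event time), fall back to T (transaction time).
        raw = frame.get("E") or frame.get("T")
    elif venue == "bybit":
        # v5 orderbook.50.<SYMBOL>: ts lives on the wrapping frame, int ms.
        raw = frame.get("ts")
    elif venue == "okx":
        # v5 books: ts lives inside each data[] entry, as a string of ms.
        # Assumption: if a frame carries several entries, take the latest one.
        stamps = [int(e["ts"]) for e in frame.get("data", []) if e.get("ts") is not None]
        raw = max(stamps) if stamps else None
    else:
        raise ValueError(f"unknown venue: {venue}")
    return int(raw) if raw is not None else None
```

The producer would then stamp payload["event_ts"] with the result, and nothing downstream needs to know which venue it came from.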

The aggregator: two jobs, two fields

With the invariant in place, the aggregator's logic separates cleanly. Stale gate uses ts. Alignment uses event_ts. They never cross paths.

The stale gate:

snap_ts = snap.get("ts", 0)
age_ms = now_ms - snap_ts if snap_ts else None
if age_ms is None or age_ms > STALE_THRESHOLD_S * 1000:
    sources_status.append({
        "exchange": ex, "status": "stale", "age_ms": age_ms,
    })
    continue

Threshold is 60 s. If a producer hasn't published a fresh ts in a minute, it's dead. Exclude it from the union. This is the right measure of producer health and nothing else.

For the live sources, extract event_ts and the lag it implies:

ev_ts_raw = snap.get("event_ts")
event_ts = int(ev_ts_raw) if ev_ts_raw is not None else None
event_age_ms = (
    now_ms - event_ts if event_ts is not None else None
)

event_age_ms is the honest measure of how far behind this venue's underlying data is. A producer can be perfectly healthy (recent ts) yet show data that's 300 ms behind because the exchange itself is slow under load. That's a different failure mode, and the frontend needs to surface it differently: not "Bybit is down" but "Bybit is lagging."
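The two fragments above, folded into one function so the split is visible in one place: liveness reads ts, data age reads event_ts. A Python sketch; the function name and the extra keys in the returned dict are assumptions beyond the sources_status entries shown above.

```python
from typing import Optional

STALE_THRESHOLD_S = 60  # from the stale gate above


def source_status(ex: str, snap: dict, now_ms: int) -> dict:
    """Job 1 (is the producer alive?) uses ts; job 2 (how old is the market data?) uses event_ts."""
    snap_ts = snap.get("ts", 0)
    age_ms: Optional[int] = now_ms - snap_ts if snap_ts else None
    if age_ms is None or age_ms > STALE_THRESHOLD_S * 1000:
        return {"exchange": ex, "status": "stale", "age_ms": age_ms}
    ev_ts_raw = snap.get("event_ts")
    event_ts = int(ev_ts_raw) if ev_ts_raw is not None else None
    event_age_ms = now_ms - event_ts if event_ts is not None else None
    return {
        "exchange": ex,
        "status": "ok",
        "age_ms": age_ms,              # producer liveness
        "event_ts": event_ts,
        "event_age_ms": event_age_ms,  # a large value means "lagging", not "down"
    }
```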

cross_exchange_skew_ms as an honesty metric

Once each live venue has an event_ts, the skew across them falls out:

ok_event_ts = [
    ts for ex, ts in event_ts_per_exch.items()
    if ex in ok_exchanges and ts is not None
]
cross_exchange_skew_ms = None
if len(ok_event_ts) >= 2:
    cross_exchange_skew_ms = max(ok_event_ts) - min(ok_event_ts)

A single integer: the spread between the earliest and latest exchange-stamped events in the "consensus" snapshot.

Binance event_ts = 12:00:00.100
Bybit   event_ts = 12:00:00.420
OKX     event_ts = 12:00:00.160

cross_exchange_skew_ms = 320

That "consensus" snapshot spans almost a third of a second. Skew is a different signal from transport latency. It captures how far apart the venues' own clocks place their events, regardless of how fast your producers ran. If the exchanges themselves stamped events 320 ms apart, the snapshot is 320 ms wide.

Practical thresholds:

<100 ms     = normal
100–300 ms  = degraded
>300 ms     = unsafe for microstructure signals
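The skew computation and the bands above fit in a few lines. A Python sketch; the function names and the "unknown" label for fewer than two live venues are hypothetical, and the worked example writes 12:00:00.100 and friends as ms since midnight.

```python
from typing import Dict, Optional, Set


def cross_exchange_skew_ms(event_ts_per_exch: Dict[str, Optional[int]],
                           ok_exchanges: Set[str]) -> Optional[int]:
    ok_event_ts = [ts for ex, ts in event_ts_per_exch.items()
                   if ex in ok_exchanges and ts is not None]
    if len(ok_event_ts) < 2:
        return None  # skew needs at least two live venues
    return max(ok_event_ts) - min(ok_event_ts)


def skew_band(skew_ms: Optional[int]) -> str:
    """Bands from the thresholds above: <100 normal, 100-300 degraded, >300 unsafe."""
    if skew_ms is None:
        return "unknown"
    if skew_ms < 100:
        return "normal"
    if skew_ms <= 300:
        return "degraded"
    return "unsafe"


# 12:00:00.100 / .420 / .160 as ms since midnight.
example = {"binance": 43_200_100, "bybit": 43_200_420, "okx": 43_200_160}
skew = cross_exchange_skew_ms(example, set(example))
print(skew, skew_band(skew))  # 320 unsafe
```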

This metric does one important thing: it surfaces dishonesty in the consensus view. A naive aggregator presents a unified snapshot as if it's a single instant. It isn't. By emitting cross_exchange_skew_ms on every payload, every consumer picks its own policy. A 1-second chart can ignore 200 ms of skew. A spoof-detection feature has to discard the snapshot and wait. A "live consensus" UI can display the skew as a number. "Consensus over a 47 ms window" is honest; hiding it isn't.

The principle generalizes: when a system can't be honest about precision, it should at least be honest about its imprecision.

When a venue's clock drifts

The two-field rule also protects against a failure mode that took me embarrassingly long to notice: an exchange's clock can be wrong.

Most of the time venue clocks are NTP-disciplined and accurate to a few ms. But under load, after a maintenance window, or around an NTP step, a venue's event_ts can lurch forward (or backward) by hundreds of ms relative to wall time. The producer faithfully forwards the bad timestamp because that's its job.

With one timestamp, you can't tell whether the producer is slow or the venue is, so you either drop the venue or accept a torn consensus snapshot. With two timestamps the failure is visible: event_age_ms goes negative (the venue claims data from the future), or it spikes asymmetrically vs other venues. The skew metric lights up, and you can downgrade that venue specifically, not the whole pipeline.
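A sketch of "downgrade that venue specifically" in Python. This swaps in a simple median-based outlier test of my own choosing; the thresholds are hypothetical and would need tuning against real data. A venue is flagged if its event_age_ms is clearly negative, or if it sits far from the median of all live venues (which needs at least three venues to mean anything).

```python
from statistics import median
from typing import Dict, List, Optional

FUTURE_TOLERANCE_MS = 50  # hypothetical: small negative ages are ordinary clock noise
OUTLIER_MS = 250          # hypothetical: how far one venue may sit from the pack


def suspect_venues(event_age_ms: Dict[str, Optional[int]]) -> List[str]:
    """Flag venues whose exchange clock looks wrong, so only they get downgraded."""
    ages = {ex: age for ex, age in event_age_ms.items() if age is not None}
    # With only two venues there is no majority to say which one is off.
    pack = median(list(ages.values())) if len(ages) >= 3 else None
    suspects = []
    for ex, age in ages.items():
        if age < -FUTURE_TOLERANCE_MS:
            suspects.append(ex)  # negative age: the venue claims data from the future
        elif pack is not None and abs(age - pack) > OUTLIER_MS:
            suspects.append(ex)  # asymmetric spike relative to the other venues
    return suspects
```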

Beyond crypto: any multi-source pipeline

The pattern isn't crypto-specific. Anywhere you aggregate real-time data from N independent sources, the same problem shows up:

Distributed logs: pod wall-clock time vs the event's actual occurrence (or the trace span's start).
Sensor fusion: each sensor's local clock vs the moment your gateway received it.
IoT telemetry: device clock (often horribly skewed) vs gateway ingestion time.
Cross-region replication: source DB commit time vs replica apply time.

Same temptations. Same two wrong answers. Same fix: carry both timestamps to the joining layer. Ingestion time for liveness and TTL. Source time for alignment. Spread between sources as a first-class metric so consumers decide what to trust.

This is also the framing Apache Flink and Beam ship with: event time vs processing time, with watermarks tracking how far event time has progressed so late and out-of-order data is handled explicitly. Most ad-hoc real-time systems converge to the same dual-timestamp model eventually. You can skip the eventually.

What I'd tell past me

Two lines.

In the very first payload schema, write both ts and event_ts. Migrating later means rewriting every producer, the aggregator, and every consumer. Adding it on day one is two extra lines per producer.
Emit the skew metric on every aggregated payload, even when it's "always low". The day skew matters, you'll wish you had it on the wire.

The single-timestamp field is one of those defaults that looks fine until it doesn't.

One timestamp tells you when you saw the data.

Two timestamps tell you when the market happened.

Carry both.