Forem: Michael Stelly

WebLLM Works in Dev But Fails on Vercel: The CSP Directive You're Missing

Michael Stelly — Wed, 06 May 2026 18:14:00 +0000

WebGPU was detected, the consent dialog appeared, the user granted consent, and the network tab showed a clean 200 OK on the WASM fetch. Then nothing. No console output, no progress events. The UI showed "Enhanced analysis unavailable. Try again." In npm run dev, against the same code on the same browser, everything worked.

I almost spent an afternoon on the wrong theory. The reason I didn't is the reason for this post.

When you're debugging, anomalies are easier to chase than absences. The 200 OK had a body of 0.3 KB, which is anomalous for a binary asset that should be a few megabytes. The console was silent, which is an absence. I led with the anomaly. That was the mistake.

The Wrong Hypothesis

The 0.3 KB number had a tidy explanation. A Git LFS pointer file is roughly 130 bytes plus HTTP overhead. raw.githubusercontent.com is well known for not serving LFS content from some repos: instead of redirecting to media.githubusercontent.com (which actually serves the binary), it returns the pointer text. If you fetch what you think is a 50 MB asset and get back 130 bytes of YAML-looking text, that's why.

The story almost wrote itself. WebLLM downloads the WASM. The WASM is actually an LFS pointer. The worker tries to call WebAssembly.instantiateStreaming() on the pointer text, fails immediately, and the fallback chain swallows the error. Plausible, mechanical, and it explained both the response size and the silent failure.

It was also wrong.

I pulled the .gitattributes from mlc-ai/binary-mlc-llm-libs/main. The entire file:

app-release.apk filter=lfs diff=lfs merge=lfs -text
mlc-chat.apk filter=lfs diff=lfs merge=lfs -text

Two lines. Only .apk files are LFS-tracked. WASM files are stored as regular Git blobs, which means raw.githubusercontent.com serves them directly with no pointer indirection. The theory was elegant and irrelevant.

This took thirty seconds to disprove. I'd been ready to spend hours on it.

The 0.3 KB number itself is still unexplained. Most likely a transient, a misread, or a path that 404'd with a small body served as 200 by an intermediary. Once the actual bug was fixed, the file fetched at the expected size and never showed up small again.

The lesson was sitting in plain sight the whole time. The console was silent. Silent. Not silent-after-an-error, not silent-with-a-warning, but silent across an entire failure path that was supposed to log progress events, then a model load completion, then inference timing. None of those fired. The anomaly told me where the bytes went. The absence told me the worker never got far enough to log anything.

If I'd led with the absence, I would have been looking for "what stops a worker from logging anything at all" within minutes. Instead I spent an hour on a theory about a domain I happened to find interesting.

The Actual Bug

The CSP deployed to Vercel, abridged:

default-src 'self';
script-src 'self' https://analytics.example.com;
worker-src 'self' blob:;
connect-src https://analytics.example.com
            https://huggingface.co
            https://*.huggingface.co
            https://raw.githubusercontent.com;

Notice what's missing from script-src. From MDN:

By default, if a CSP contains a default-src or a script-src directive, then a page won't be allowed to compile WebAssembly using functions like WebAssembly.compileStreaming(). The wasm-unsafe-eval keyword can be used to undo this protection.

WebLLM's worker calls WebAssembly.instantiateStreaming() on the model_lib WASM as part of engine initialization. With no 'wasm-unsafe-eval' in script-src, the browser refuses to compile it and throws a CompileError inside the worker. That error reaches the CoachingProvider fallback chain, which is designed to fail gracefully: WebLLM fails, try Ollama, then fall back to rule-based. The fallback worked exactly as designed. The problem was that it swallowed the error without leaving a trace.

There were no diagnostic logs because every one of them was gated on import.meta.env.DEV and stripped from the production build. I'd treated error-path logging as production noise, which is the same as not having it.

The fix is one directive:

script-src 'self' 'wasm-unsafe-eval' https://analytics.example.com;

The directive is narrower than 'unsafe-eval' and is the safe choice for this case. The fix worked.

Why This Pattern is Treacherous

Three things composed to hide the failure.

The dev/prod CSP asymmetry is structural. Vite's dev server doesn't apply vercel.json headers. Any CSP-gated capability is unrestricted locally and enforced remotely. The same code path produces different runtime behavior across environments. "Works on my machine" is the literal truth.

Worker errors don't surface like main-thread errors. They propagate only through the worker's onerror event. If something catches that event for graceful degradation, the error is invisible unless the catch block logs it.

Production console hygiene fights you. Stripping logs is a defensible default, but error-path logs are not noise. They're the only signal that lets you debug failures users are actually hitting. When a try/catch lives inside a fallback handler, the catch block's logging is production-relevant even when the surrounding info logs aren't.

Each of these alone is benign. The intersection makes a failure mode that's invisible from both ends: no error in the user's UI, no error in the developer's console, no error in the deploy logs. The bug was actively being handled correctly by code that didn't know it was the bug.

What Changed After

Three changes landed in the fix PR.

The CSP directive. One line in vercel.json. Adding 'wasm-unsafe-eval' to script-src permits WebAssembly compilation and nothing else. WASM runs in its own sandbox, has no DOM access, and can only call back into JavaScript through imports the host page explicitly provides. It's the directive CSP3 added specifically so libraries like WebLLM wouldn't have to ask for 'unsafe-eval', which is genuinely riskier.

Removed import.meta.env.DEV gating from error-path logs in the WebLLM worker and the CoachingProvider fallback chain. Info and progress logs stay DEV-gated. Errors and fallback transitions surface in production. The principle, now written into the project conventions doc: gating an error log on DEV is the same as not having it. Treat it as the exception that needs justification, not the default.
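A minimal sketch of that convention, assuming a small logger factory instead of scattered DEV checks. The createLogger name, its methods, and the injected isDev flag are hypothetical; in a Vite app you would pass import.meta.env.DEV as isDev.

```typescript
// Logger sketch: info/progress logs are dev-only; fallback transitions and errors always emit.
// createLogger and its method names are illustrative, not an existing API.
export interface LogSink {
  info(...args: unknown[]): void;
  warn(...args: unknown[]): void;
  error(...args: unknown[]): void;
}

export interface Logger {
  info(message: string, ...details: unknown[]): void;
  fallback(from: string, to: string, cause: unknown): void;
  error(message: string, cause: unknown): void;
}

export function createLogger(scope: string, isDev: boolean, sink: LogSink = console): Logger {
  const prefix = `[${scope}]`;
  return {
    // Progress and timing noise: only emitted in development.
    info(message, ...details) {
      if (isDev) sink.info(prefix, message, ...details);
    },
    // Fallback transitions are production-relevant, so they are never gated.
    fallback(from, to, cause) {
      sink.warn(prefix, `falling back from ${from} to ${to}`, cause);
    },
    // Error-path logs are never gated on isDev.
    error(message, cause) {
      sink.error(prefix, message, cause);
    },
  };
}
```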

Wired CSP violation reporting through Umami custom events. The securitypolicyviolation event fires regardless of whether report-to is configured in the CSP. Subscribing in the main entry point routes future violations into existing analytics:

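// Route every CSP violation raised on this document into Umami as a custom event.
document.addEventListener('securitypolicyviolation', (e) => {
  // Umami's tracker may be missing (blocked or not loaded yet); skip quietly.
  if (window.umami) {
    window.umami.track('csp-violation', {
      directive: e.violatedDirective, // historical alias of effectiveDirective
      blockedURI: e.blockedURI,
      effectiveDirective: e.effectiveDirective,
    });
  }
});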

No new infrastructure. The next CSP issue will be diagnosable in seconds.
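One caveat worth checking in your target browsers: per the CSP specification's reporting algorithm, a violation is dispatched on the global object where it happened. A violation raised inside the WebLLM worker, like the WASM compile block in this post, fires on the worker's global scope, not on document. A minimal sketch of forwarding those to the main thread follows. The message shape and function names are hypothetical. It assumes a BroadcastChannel as the transport so these messages stay off the worker's own postMessage channel, which WebLLM's engine handler uses for its protocol.

```typescript
// Minimal shapes so the sketch runs outside a browser; real events are SecurityPolicyViolationEvent.
export interface ViolationLike {
  effectiveDirective: string;
  blockedURI: string;
}

export interface CspViolationMessage {
  type: 'csp-violation';
  effectiveDirective: string;
  blockedURI: string;
}

export interface ViolationSource {
  addEventListener(type: 'securitypolicyviolation', listener: (e: ViolationLike) => void): void;
}

// Worker side: listen on the worker's own global scope and hand each violation to `post`.
export function installViolationForwarder(
  scope: ViolationSource,
  post: (message: CspViolationMessage) => void,
): void {
  scope.addEventListener('securitypolicyviolation', (e) => {
    post({ type: 'csp-violation', effectiveDirective: e.effectiveDirective, blockedURI: e.blockedURI });
  });
}

// Main-thread side: recognize forwarded violations before passing them to analytics.
export function isCspViolationMessage(data: unknown): data is CspViolationMessage {
  return typeof data === 'object' && data !== null && (data as { type?: unknown }).type === 'csp-violation';
}
```

In the worker, that could be wired as const channel = new BroadcastChannel('csp-violations') followed by installViolationForwarder(self as unknown as ViolationSource, (m) => channel.postMessage(m)). On the main thread, a BroadcastChannel with the same name passes any message that satisfies isCspViolationMessage to umami.track().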

The Takeaway

If you're debugging a WebLLM failure that works in dev and fails silently on Vercel, the answer is almost certainly 'wasm-unsafe-eval'. That's the practical takeaway.

The deeper one: when something fails, look at what's missing before you look at what's strange. Anomalies attract attention because they're concrete. You can hold a 0.3 KB response in your head and build theories about it. Absences are harder to notice because they're, by definition, not there. But the absence is usually the more honest signal. The anomaly tells you something happened. The absence tells you something didn't, which is closer to where the failure actually lives.

The thirty-second check almost always exists. Run it.

I'm building Holocron, a browser-based combat log analyzer for Star Wars: The Old Republic. The earlier WebLLM spike post covers why I chose this stack. Holocron is free and requires no install — try it at holocronparse.com.

AI didn't take your dev job. It gave you a new one.

Michael Stelly — Fri, 01 May 2026 17:15:53 +0000

A production release failed. Twelve hours of debugging with Claude pointing confidently in the wrong direction, with me following. When I finally found the real cause buried in the error logs, I did what most developers do. I blamed the tool.

That was the wrong diagnosis, and I knew it the moment I said it out loud.

The AI didn't fail. The workflow did. The gaps that caused the failure (no HTTPS testing, no failure model for what happens when the UX breaks) would have bitten me with any developer, human or AI. The AI didn't create them. It amplified the damage, because fast code generation let me ship faster than my verification loop could keep up.

That's the loop companies need to understand before they start cutting headcount.

Agents deliver value on tasks with clear, mechanical outcomes. The build passes or it fails, the tests run or they don't. The human's job there shifts from writing the code to reviewing it. That's a real productivity gain.

Agents break down when they're trusted for diagnosis, when they're asked to explain why something is broken. AI produces statistically plausible explanations, not verified ones, and that reads like confidence. The error logs had the real answer the whole time, but finding it myself would have required breaking the loop I'd fallen into: "Claude says X, I merge X."

The shift agents actually require is this: the developer's job doesn't go away, it changes. Your job becomes proving the AI wrong before it ships. That's a harder job. It requires maintaining enough understanding to independently verify code you didn't write.

"AI slop" is what happens when that role goes unfilled. It's not a property of the tool. It's a property of the workflow.

Agents won't replace developers. They'll expose which organizations understand that distinction and which ones are about to learn it the hard way. You can't remove the human from the loop and expect the loop to hold. The assumption that you can is the actual death knell, not for agents, but for the teams that bet on it.

--
Michael Stelly is a Senior React Native Technical Lead and founder of ReFactory, a consultancy specializing in React Native architecture and modernization. Building production RN apps across healthcare, logistics, and retail since 2018.

The Question Nobody Asks Before Picking a Mobile AR Platform

Michael Stelly — Fri, 17 Apr 2026 18:05:26 +0000

Most teams pick a framework and live with it for years. Here's how to get it right the first time.

I evaluated three platforms for a client porting an Android AR app to iOS. The project was 3D object scanning for warehouse operations. It needed to replicate existing camera-based capture workflows, integrate with a React backend, and eventually ship as a reusable third-party SDK.

Most teams skip this analysis. They grab the framework their devs already know, ship it, and spend the next two years paying for it. Four criteria drove the evaluation. React Native won.

The contenders

Native Swift -- unrestricted ARKit access, no abstraction layer, requires a Swift specialist
Flutter -- native AR bridge via ar_flutter_plugin / arkit_plugin, uses Dart
React Native -- native AR bridge via ViroReact (relaunched as ReactVision by Morrow in 2025), uses TypeScript/JS

Criterion 1: Level of effort

If your team already works in React, React Native costs you nothing to adopt. Same language, same mental model, same toolchain, same state management patterns.

Flutter requires Dart. Dart is not hard, but it is not JavaScript, and "learn a new language first" is a tax you do not need on a focused feature port with a deadline attached.

Native Swift demands a specialist or a months-long ramp, and it produces a codebase that shares nothing with your Android implementation. Maintenance debt starts accumulating before you ship a line of feature code.

React Native wins this one going away.

Criterion 2: AR API completeness

This is where cross-platform skepticism usually lives, and it is a fair concern. The question is whether the abstraction actually exposes the full native AR API or quietly caps your capabilities.

ViroReact compiles to true native draw calls. It is not a WebView. Not a shim. On iOS it calls ARKit directly; on Android it calls ARCore. When a capability falls outside ViroReact's abstraction, React Native's native module bridge fills the gap. Writing a custom native module to call ARKit APIs directly is documented, production-tested, and well-covered by community examples.

Flutter's AR plugins are architecturally comparable -- native bridges, not web wrappers. But the ecosystem is fragmented. Maintenance has lapsed on key packages. A community fork emerged to compensate. You would write the same custom native code to hit advanced scanning APIs, just in a less familiar environment with thinner documentation to lean on.

Native Swift has unrestricted access to the full ARKit surface -- Object Capture, RoomPlan, LiDAR, no abstraction in the way. That is the honest ceiling. The gap narrows fast once you account for React Native's native module bridge, and both cross-platform frameworks can reach the same APIs. React Native's path there is just better documented and more widely used in production.

One note specific to warehouse environments: ARKit's object scanning degrades under inconsistent lighting. Variable and harsh conditions affect all three platforms equally. Put it in your QA plan regardless of which framework you pick.

Criterion 3: Maintainability

React Native means one codebase. The team that owns Android owns iOS. No skill set fragmentation, no parallel development tracks.

A lot of React Native developers wrote off ViroReact during the years it went quiet. That calculus changed in 2025 when Morrow acquired it, relaunched it as ReactVision, and put a full-time engineering team behind it. MIT licensed, commercially backed, actively maintained. The risk profile is not what it was.

Flutter's ar_flutter_plugin ecosystem has not had that kind of investment. The forked engine variant is a community-maintained patch on a maintenance gap. That is a shaky foundation for a production AR feature.

Native Swift gives you two codebases, two toolchains, and two sets of required skills, maintained in parallel indefinitely. The operational cost is real and it grows.

Criterion 4: SDK strategy

This is the criterion that actually decided the evaluation, and the one most platform comparison posts skip entirely because they are thinking about apps, not platform products.

When the feature eventually becomes a distributable SDK, the platform you picked determines your distribution reach and your maintenance overhead in ways that are very hard to undo.

A React Native module ships as an SDK that integrates naturally into any React Native app and shares business logic with React web applications. Consumers work in a JS/TS ecosystem they already know. The community has well-established patterns for taking an internal React Native feature and packaging it for third-party consumption.

A Flutter SDK requires consuming apps to be Flutter apps. It shares nothing with a React web stack. That is not a niche limitation -- it is a structural ceiling on adoption.

A Native Swift SDK has the broadest iOS reach but produces an Android-incompatible artifact that shares nothing with a React stack. You are effectively maintaining three separate products.

If SDK distribution is anywhere in your roadmap, React Native is the only viable option.

The verdict

Criterion	React Native	Flutter	Native Swift
Level of effort	Lowest	Medium-High	High
AR API completeness	Full native access	Full native access	Unrestricted
Maintainability	Single codebase, funded library	Fragmented plugins	Two codebases
SDK strategy	Full cross-stack alignment	Flutter-only	iOS-only

React Native won because it was the right tool for the job. The team already speaks the language. ViroReact has funded maintenance behind it now. The API access is uncompromised. The SDK path is viable. None of the other options can say all four.

If deep ARKit gaps surface during implementation that cannot be resolved through the native module bridge, native Swift is a legitimate fallback. A targeted proof-of-concept sprint will answer that question in days.

I Ran Three LLMs Entirely in the Browser to Power an AI Coaching Feature. Here's What I Measured.

Michael Stelly — Wed, 08 Apr 2026 19:10:25 +0000

I'm building Holocron, a browser-based combat log analyzer for the Star Wars: The Old Republic (SWTOR) video game. The core product thesis is that parsers that stop at showing you numbers aren't useful enough. A good tool tells you what to do about them.

The coaching layer I'm building takes ~1500 tokens of structured combat stats (spec, abilities, DPS numbers, rule-based findings) and returns ~500 tokens of plain-language guidance. It runs after parsing, entirely client-side. No server. No account. No data leaving the browser.

I already had Ollama working as a local LLM provider. But Ollama requires the user to install a background service, pull a model, and make sure it's running. For a tool where frictionless entry is a design constraint, that's a real drop-off risk. So I ran a spike to find out whether @mlc-ai/web-llm with WebGPU could replace that setup entirely: just open the page, wait under 30 seconds on first visit (measured 23.7s on the test hardware), and get AI coaching with zero install.

This post covers the full methodology, every number I measured, and the implementation decisions I made based on the results.

The Output Contract

Before getting into models and benchmarks, it helps to understand exactly what I needed the LLM to produce. The coaching system has a strict output schema:

interface CoachingOutput {
  narrativeSummary: string;         // 2-3 sentence performance narrative
  additionalFindings: Array<{
    priority: number;               // 1-3
    headline: string;
    body: string;
    recommendation: string;
  }>;                               // max 3 items
  additionalPositives: string[];    // max 3 plain strings
}

The schema is intentionally flat and bounded. additionalPositives is an array of strings, not objects. This matters. A lot. I'll come back to it.

Production validation rejects anything that doesn't conform. There's no "close enough" here.
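For illustration, here is a hand-rolled validator in the spirit of that check. This is not the project's parseLlmResponse(). It is a minimal sketch that assumes the limits in the interface comments: an integer priority from 1 to 3, and at most three items per array. The interface is repeated so the block stands alone. It shows why an additionalPositives entry that is an object rather than a string fails.

```typescript
interface CoachingFinding {
  priority: number; // assumed integer 1-3
  headline: string;
  body: string;
  recommendation: string;
}

interface CoachingOutput {
  narrativeSummary: string;
  additionalFindings: CoachingFinding[]; // max 3 items
  additionalPositives: string[]; // max 3 plain strings
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

function isFinding(v: unknown): v is CoachingFinding {
  return (
    isObject(v) &&
    Number.isInteger(v.priority) &&
    (v.priority as number) >= 1 &&
    (v.priority as number) <= 3 &&
    typeof v.headline === 'string' &&
    typeof v.body === 'string' &&
    typeof v.recommendation === 'string'
  );
}

// Returns the parsed output only if it matches the contract; otherwise null. No "close enough".
export function parseCoachingOutput(raw: string): CoachingOutput | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (!isObject(data) || typeof data.narrativeSummary !== 'string') return null;
  const findings = data.additionalFindings;
  if (!Array.isArray(findings) || findings.length > 3 || !findings.every(isFinding)) return null;
  // The silent failure from the benchmarks: objects where plain strings are expected.
  const positives = data.additionalPositives;
  if (!Array.isArray(positives) || positives.length > 3 || !positives.every((p) => typeof p === 'string')) {
    return null;
  }
  return data as unknown as CoachingOutput;
}
```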

Why WebLLM

WebLLM is an in-browser LLM inference engine built by the MLC AI team. It compiles models into a WebGPU-accelerated WASM runtime, ships a library of prebuilt models (weights hosted on HuggingFace, compiled model libraries on GitHub), and exposes an OpenAI-compatible API. You load a model with CreateMLCEngine(), then call engine.chat.completions.create() exactly like you would with the OpenAI SDK.

The two features that made it worth spiking:

Grammar-constrained generation. WebLLM supports response_format: { type: 'json_object', schema: ... }, implemented at the WASM layer. This isn't prompt engineering hoping the model behaves. It enforces the schema at the token sampling level. Short of the output being cut off by a token limit mid-object, the model cannot produce output that violates the schema.

OPFS caching. Model weights are cached to the Origin Private File System after the first download. A 1.3 GB model that takes 23 seconds to load cold takes 2.3 seconds warm. Repeat users pay nothing.
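A grammar-constrained request for the CoachingOutput contract could look like the sketch below. The JSON Schema is my own transcription of the interface, with the max-3 limits as maxItems and priority as an integer from 1 to 3. buildCoachingRequest is a hypothetical helper that builds the plain object you would pass to engine.chat.completions.create(). WebLLM's response_format takes the schema serialized as a JSON string. Whether every keyword is enforced at sampling time depends on the grammar engine, so keep the production validator in place.

```typescript
// JSON Schema transcription of the CoachingOutput contract (illustrative, not the project's file).
export const coachingSchema = {
  type: 'object',
  properties: {
    narrativeSummary: { type: 'string' },
    additionalFindings: {
      type: 'array',
      maxItems: 3,
      items: {
        type: 'object',
        properties: {
          priority: { type: 'integer', minimum: 1, maximum: 3 },
          headline: { type: 'string' },
          body: { type: 'string' },
          recommendation: { type: 'string' },
        },
        required: ['priority', 'headline', 'body', 'recommendation'],
        additionalProperties: false,
      },
    },
    // Plain strings, not objects: the field the unconstrained 3B kept getting wrong.
    additionalPositives: { type: 'array', maxItems: 3, items: { type: 'string' } },
  },
  required: ['narrativeSummary', 'additionalFindings', 'additionalPositives'],
  additionalProperties: false,
} as const;

interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

// Hypothetical helper: request body for engine.chat.completions.create().
export function buildCoachingRequest(systemPrompt: string, statsPrompt: string) {
  const messages: ChatMessage[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: statsPrompt },
  ];
  return {
    messages,
    // WebLLM expects the schema as a string, not an object.
    response_format: { type: 'json_object' as const, schema: JSON.stringify(coachingSchema) },
    // Assumed headroom above the ~500-token target so the JSON is not truncated mid-object.
    max_tokens: 768,
  };
}
```

Usage would then be const reply = await engine.chat.completions.create(buildCoachingRequest(system, stats)), with reply.choices[0].message.content going through validation as usual.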

Test Setup

Hardware: Apple Silicon Mac (Apple M3 Max, 64 GB unified memory)
Browser: Chrome (WebGPU enabled)
WebLLM version: 0.2.82
Benchmark: 10 coaching prompts per model using the production SPF prompt structure (1500 token input, targeting 500 token output)
Quality scoring: Automated 6-signal composite (0-100 scale), equally weighted: (1) narrative depth — word count and sentence structure of narrativeSummary; (2) schema compliance — all required fields present and correctly typed; (3) template parroting — prompt text appearing verbatim in output; (4) ability name accuracy — capitalized phrases cross-referenced against ability names present in the input; (5) finding duplication — semantic overlap across additionalFindings items; (6) actionability — presence of concrete, imperative language in recommendation fields. Template parrot and hallucination counts in the side-by-side table are raw per-prompt tallies, not components of the composite score.
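Because the six signals are equally weighted, the composite is their arithmetic mean, and each signal contributes one sixth. Below is a minimal sketch, assuming each signal has already been scored on 0 to 100 (higher is better, so parroting and duplication are scored inversely). The field names and the final rounding are assumptions, and the per-signal scoring is omitted.

```typescript
// Six signals, each pre-scored 0-100 where higher is better (field names are illustrative).
export interface QualitySignals {
  narrativeDepth: number;
  schemaCompliance: number;
  templateParroting: number; // 100 = no parroting
  abilityNameAccuracy: number;
  findingDuplication: number; // 100 = no duplication
  actionability: number;
}

// Equal weights: the composite is the rounded arithmetic mean of the six signals.
export function compositeScore(s: QualitySignals): number {
  const values = [
    s.narrativeDepth,
    s.schemaCompliance,
    s.templateParroting,
    s.abilityNameAccuracy,
    s.findingDuplication,
    s.actionability,
  ];
  for (const v of values) {
    if (!(v >= 0 && v <= 100)) throw new RangeError(`signal out of range: ${v}`);
  }
  return Math.round(values.reduce((sum, v) => sum + v, 0) / values.length);
}
```

For example, signals of 90, 100, 100, 40, 70, and 60 sum to 460, and 460 / 6 ≈ 76.7 rounds to 77.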

I tested three models, chosen to cover the quality/size/speed tradeoff space:

Model	Size	Notes
`Llama-3.2-1B-Instruct-q4f16_1-MLC`	~0.7 GB	Smallest viable instruct model
`Llama-3.2-3B-Instruct-q4f16_1-MLC`	~1.3 GB	Sweet spot candidate
`Phi-3.5-mini-instruct-q4f16_1-MLC`	~2.0 GB	Quality ceiling for this size class

I also ran the same 10 prompts against plain Ollama (no grammar enforcement) as a baseline for each model. That comparison turned out to be the most interesting part of the whole exercise.

Results

Llama 3.2 3B

Metric	Value	Target	Verdict
Download size	~1.3 GB		Acceptable
Cold load time	23.7s	≤ 30s	PASS
Warm load time	2.3s		Excellent
Tokens/sec	49.8	≥ 10	PASS
GPU memory	~3.0 GB	≤ 2 GB	FLAG
JSON parse success	10/10	≥ 9/10	PASS
Schema valid	10/10	≥ 9/10	PASS
Avg quality score	76/100	subjective	Good
Avg latency/prompt	5.8s		Acceptable

Content quality: PASS

Strengths: substantive narratives (avg 20+ words), zero template parroting, all findings have real body and recommendation text, references specific numbers from input in 8/10 prompts.

Weaknesses: hallucinated ability names in 9/10 prompts (2-4 per prompt), occasional duplication of findings across 4 of 10 prompts, VRAM at ~3 GB exceeds the 2 GB flag.

GPU memory is measured at the browser process level and includes driver and WebGPU runtime overhead beyond model weights. The 3B weights alone are ~1.6 GB at 4-bit quantization; the remainder is KV cache at 1500 token context plus browser overhead. Numbers will vary across machines and Chrome versions. The 2 GB threshold assumes a minimum-spec user running SWTOR on a machine with 8 GB unified memory: the game typically holds 3–4 GB GPU memory under load, leaving 4 GB headroom. Anything above 2 GB for the coaching model narrows that margin on older hardware.

Ollama baseline comparison: Plain Ollama 3B is 10/10 JSON valid but only 1/10 schema valid. The model consistently emits additionalPositives as objects with headline/body/recommendation fields instead of plain strings. This is a silent breaking failure. Grammar-constrained WebLLM generation is 10/10 schema valid under identical prompts. Content quality: Ollama 73/100 vs WebLLM 76/100 -- no degradation from running in-browser.

Llama 3.2 1B

Metric	Value	Target	Verdict
Download size	~0.7 GB		Good
Cold load time	11.7s	≤ 30s	PASS
Warm load time	1.4s		Excellent
Tokens/sec	118.6	≥ 10	PASS
GPU memory	~1.1 GB	≤ 2 GB	PASS
JSON parse success	10/10	≥ 9/10	PASS
Schema valid	10/10	≥ 9/10	PASS
Avg quality score	70/100	subjective	Marginal
Avg latency/prompt	2.1s		Fast

Content quality: MARGINAL FAIL

The numbers look great. 118.6 tok/s. 11.7s cold load. 1.4s warm. 0.7 GB download. Under the hood it falls apart.

Template parroting in 7/10 prompts -- the model echoes prompt text like "Things the player did well that the rule engine missed" verbatim in the output. Prompt 9 returned all three additionalPositives as identical copies of that string. Individual prompt scores ranged from 46 to 96. About 30% of runs would produce output that embarrasses the product. Speed doesn't offset that.

Ollama baseline comparison: Plain Ollama 1B is 8/10 schema valid (better than the 3B, because the simpler model apparently follows field type instructions more literally). Content quality: Ollama 52/100 vs WebLLM 70/100. The grammar constraints improve structural compliance and seem to improve content quality too, but the underlying weaknesses (parroting, duplication, hallucination) persist.

Phi-3.5 Mini

Metric	Value	Target	Verdict
Download size	~2.0 GB		Large
Cold load time	37.4s	≤ 30s	FAIL
Warm load time	2.4s		Good
Tokens/sec	52.5	≥ 10	PASS
GPU memory	~2.3 GB	≤ 2 GB	FLAG
JSON parse success	10/10	≥ 9/10	PASS
Schema valid	10/10	≥ 9/10	PASS
Avg quality score	77/100	subjective	Good
Avg latency/prompt	6.8s		Acceptable

Content quality: PASS

Best average quality score (77), best narrative depth, most actionable recommendations, zero template parroting. It loses on cold load (37.4s exceeds the 30s threshold) and is flagged on GPU memory (~2.3 GB against the 2 GB threshold). The 1-point quality delta over the 3B doesn't justify the extra 700 MB of download and the load-time failure.

Side-by-Side

	3B	1B	Phi-3.5
Cold load	23.7s	11.7s	37.4s
Warm load	2.3s	1.4s	2.4s
Tok/s	49.8	118.6	52.5
Latency/prompt	5.8s	2.1s	6.8s
Download	1.3 GB	0.7 GB	2.0 GB
VRAM	~3.0 GB	~1.1 GB	~2.3 GB
Quality	76	70	77
Schema valid	10/10	10/10	10/10
Template parrot	0/10	7/10	0/10
Hallucinations	9/10	5/10	8/10
Quality floor	52	46	67

The Finding That Changes the Architecture

The Ollama baseline comparison wasn't in the original spike plan. I added it as a sanity check. It turned out to be the most important data in the whole exercise.

Plain Ollama 3B (no grammar enforcement) fails schema validation 90% of the time on this output contract. The model produces valid JSON. It just puts objects where the schema expects strings. parseLlmResponse() rejects it.

This means the existing Ollama integration, before this spike, was silently broken at the schema level for the 3B model. It would have mostly worked for smaller models that happen to follow field type instructions more literally (the 1B was 8/10 schema valid), but for the model you actually want to use for quality coaching, it would fail in production nearly every time.

WebLLM's grammar-constrained generation doesn't just improve the situation. It defines it. Without it, you're rolling the dice on whether the model happens to output the right types.

Implication for any project using Ollama for structured output: Ollama added JSON schema support to its format parameter in v0.5. Use it. Its enforcement isn't the same machinery as WebLLM's, so keep validating the output rather than assuming parity, but schema-constrained output substantially improves reliability over prompt engineering alone. If you're relying on prompt engineering alone to get schema-compliant output from a small model, you're going to see silent failures in production that look like valid JSON until your validator catches them.
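Here is a minimal sketch of an Ollama /api/chat request body with a schema passed through format. The buildOllamaChatBody helper is hypothetical, the model tag llama3.2:3b is an assumption about which build you pulled, and temperature 0 is an assumed choice for more deterministic structured output. You would POST the serialized body to http://localhost:11434/api/chat and still run message.content through your validator.

```typescript
interface OllamaMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface OllamaChatBody {
  model: string;
  messages: OllamaMessage[];
  format: Record<string, unknown>; // a JSON Schema object, not the string "json"
  stream: false;
  options?: { temperature?: number };
}

// Hypothetical helper: request body for POST http://localhost:11434/api/chat.
export function buildOllamaChatBody(
  model: string,
  messages: OllamaMessage[],
  schema: Record<string, unknown>,
): OllamaChatBody {
  return {
    model,
    messages,
    format: schema, // structured outputs: the schema object goes in directly
    stream: false, // one complete JSON response instead of streamed chunks
    options: { temperature: 0 }, // assumption: deterministic output for structured data
  };
}
```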

The Ability Hallucination Problem

Every model, at every size, hallucinates ability names. The 3B invents them in 9 out of 10 prompts. The 1B in 5 out of 10. Phi-3.5 in 8 out of 10.

Coaching that tells a player to "increase your uptime on Shadow Strike" when their class doesn't have an ability called Shadow Strike destroys credibility instantly. This is domain-specific and model-agnostic. The models don't have SWTOR ability databases. They pattern-match on capitalized phrases that look like they belong in a game context and generate plausible-sounding names.

The mitigation I'm implementing: post-process every LLM response against the set of ability names present in the input prompt. Any capitalized phrase in the output that isn't in the known set gets flagged. Starting in warn mode (log to console) before considering strip mode, because I want observability into how often this fires before making a content decision that could remove legitimate text.
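The following is a minimal sketch of the warn-mode check. It assumes "capitalized phrase" means a run of two or more capitalized words. Single-word ability names would need a smarter tokenizer, because sentence-initial words are capitalized too. findUnknownAbilityNames and warnOnUnknownAbilities are hypothetical names, and the known list comes from the ability names present in the prompt input.

```typescript
// A run of two or more Capitalized words, e.g. "Shadow Strike".
const CAPITALIZED_PHRASE = /\b[A-Z][a-zA-Z']*(?:\s+[A-Z][a-zA-Z']*)+/g;

export function findUnknownAbilityNames(text: string, knownAbilities: string[]): string[] {
  const known = new Set(knownAbilities.map((name) => name.toLowerCase()));
  const unknown: string[] = [];
  const pattern = new RegExp(CAPITALIZED_PHRASE.source, 'g'); // fresh lastIndex per call
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const phrase = match[0];
    // Sentence-initial words are capitalized anyway ("Your Thrash timing..."),
    // so also accept the phrase with its first word dropped.
    const tail = phrase.split(/\s+/).slice(1).join(' ');
    if (known.has(phrase.toLowerCase()) || known.has(tail.toLowerCase())) continue;
    if (!unknown.includes(phrase)) unknown.push(phrase);
  }
  return unknown;
}

// Warn mode: log what was flagged and leave the text untouched.
export function warnOnUnknownAbilities(text: string, knownAbilities: string[]): string[] {
  const unknown = findUnknownAbilityNames(text, knownAbilities);
  if (unknown.length > 0) console.warn('[coaching] unrecognized ability names:', unknown);
  return unknown;
}
```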

This is a reminder that domain-specific hallucination isn't solved by model size. It's solved by grounding. If you're building in a domain with specific terminology (game abilities, medical terms, legal citations), plan for a validation pass.

Implementation Decisions

These are the implementation decisions the spike produced — the design I'm building toward. None of this is merged to production yet.

Chosen model: Llama-3.2-3B-Instruct-q4f16_1-MLC. Meets all performance targets. Quality comparable to Ollama baseline. Zero user setup.

Web Worker is non-optional. CreateWebWorkerMLCEngine runs all inference off the main thread. Running the engine on the main thread instead freezes the UI during the ~24 second cold load.

Lazy loading. The model doesn't load on page load or provider construction. It loads on the first generateCoaching() call, with a progress callback wired to a UI progress bar. Repeat users hit the OPFS cache at 2.3s.
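Below is a minimal sketch of the load-once pattern, independent of WebLLM. The first call starts the load, concurrent calls share the same promise, and a failed load is cleared so a later call can retry. The lazy helper is hypothetical. In practice, load would wrap CreateWebWorkerMLCEngine() with an initProgressCallback that drives the progress bar.

```typescript
// Load-once helper: nothing loads until the first call, concurrent callers share
// one in-flight promise, and a rejected load is forgotten so the next call retries.
export function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = Promise.resolve()
        .then(load)
        .catch((err: unknown) => {
          pending = null; // allow a retry after a failed load
          throw err;
        });
    }
    return pending;
  };
}
```

For example, const getEngine = lazy(() => CreateWebWorkerMLCEngine(worker, modelId, { initProgressCallback: onProgress })), where onProgress is a hypothetical UI callback, and generateCoaching() awaits getEngine() before each request.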

VRAM guard. The 3B model uses ~3 GB GPU memory, which can conflict with the game running simultaneously on 8 GB machines. Before loading, call navigator.gpu.requestAdapter() and surface a warning if the device looks constrained. Don't block the load. Just warn. Sustained inference at ~50 tok/s also has thermal and power-draw implications on a laptop running the game simultaneously; the lazy-load design keeps idle overhead at zero.
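WebGPU doesn't report total GPU memory, so "looks constrained" has to be a heuristic. A minimal sketch follows, assuming two checks. The first is the shader-f16 feature, which the q4f16 model builds need. Its absence is a hard incompatibility rather than a constraint, so the sketch hands that case to the fallback chain. The second uses maxStorageBufferBindingSize as a rough capability proxy that only produces a warning. The 1 GiB threshold and the assessAdapter name are hypothetical and would need tuning against real devices.

```typescript
// Structural subset of GPUAdapter that this check reads.
export interface AdapterLike {
  features: { has(feature: string): boolean };
  limits: { maxStorageBufferBindingSize: number };
}

export type GpuAssessment =
  | { canLoad: false; reason: string }
  | { canLoad: true; warning?: string };

// Hypothetical threshold: below this binding size, treat the device as memory-constrained.
const CONSTRAINED_BINDING_BYTES = 1024 * 1024 * 1024;

export function assessAdapter(adapter: AdapterLike | null): GpuAssessment {
  if (!adapter) return { canLoad: false, reason: 'No WebGPU adapter available' };
  // q4f16 builds need 16-bit float shaders; without them the load fails outright.
  if (!adapter.features.has('shader-f16')) {
    return { canLoad: false, reason: 'GPU lacks shader-f16 support' };
  }
  if (adapter.limits.maxStorageBufferBindingSize < CONSTRAINED_BINDING_BYTES) {
    // Warn, don't block: the user can still choose to load the model.
    return { canLoad: true, warning: 'This GPU may be memory-constrained; coaching could slow the game.' };
  }
  return { canLoad: true };
}
```

In the app, that could be called as assessAdapter((await navigator.gpu?.requestAdapter()) ?? null).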

1B fast mode. Exposed as an opt-in user preference (webllmModel: '3b' | '1b'), persisted to localStorage. Disclosed as "Fast mode uses a smaller model. Coaching depth may be reduced." The 1B quality floor is too low to be a default, but at 118.6 tok/s and 11.7s cold load it's genuinely compelling for users who know what they're trading.
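This is a minimal sketch of the persisted preference. The storage key is assumed to match the preference name, and the function takes a Storage-shaped parameter so it can run outside the browser (pass window.localStorage in the app). Missing or unknown values fall back to the 3B default, so only an explicit opt-in selects fast mode.

```typescript
export type WebLlmModelPref = '3b' | '1b';

// Hypothetical storage key; the model IDs are the ones benchmarked above.
export const MODEL_PREF_KEY = 'webllmModel';
const MODEL_IDS: Record<WebLlmModelPref, string> = {
  '3b': 'Llama-3.2-3B-Instruct-q4f16_1-MLC',
  '1b': 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
};

// The subset of the Web Storage API this needs; window.localStorage satisfies it.
export interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

export function readModelPref(storage: StorageLike): WebLlmModelPref {
  // Only an explicit '1b' opts into fast mode; anything else keeps the 3B default.
  return storage.getItem(MODEL_PREF_KEY) === '1b' ? '1b' : '3b';
}

export function writeModelPref(storage: StorageLike, pref: WebLlmModelPref): void {
  storage.setItem(MODEL_PREF_KEY, pref);
}

export function modelIdFor(pref: WebLlmModelPref): string {
  return MODEL_IDS[pref];
}
```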

Fallback chain: WebLlmProvider (if WebGPU available) -> OllamaProvider (if localhost:11434 reachable, with schema enforcement) -> rule-based coaching (always available). Never let a WebLLM failure surface to the user.
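Here is a minimal sketch of that chain, assuming a hypothetical CoachingProvider shape with an availability check and a generate method. Each failure is logged with console.error before the chain moves on, so a handled error still leaves a trace. The last provider is expected to be the rule-based one that always succeeds.

```typescript
export interface CoachingProvider<TIn, TOut> {
  name: string;
  isAvailable(): Promise<boolean>;
  generate(input: TIn): Promise<TOut>;
}

// Try providers in order; log every failure and return the first success.
export async function generateWithFallback<TIn, TOut>(
  providers: CoachingProvider<TIn, TOut>[],
  input: TIn,
): Promise<{ provider: string; output: TOut }> {
  for (const provider of providers) {
    try {
      if (!(await provider.isAvailable())) continue;
      return { provider: provider.name, output: await provider.generate(input) };
    } catch (err) {
      // Not gated on a dev flag: this log is the only trace of why the chain fell back.
      console.error(`[coaching] ${provider.name} failed, trying next provider`, err);
    }
  }
  throw new Error('All coaching providers failed');
}
```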

CSP update. Model weight shards fetch from HuggingFace, and the compiled model library (WASM) fetches from raw.githubusercontent.com. Add bounded connect-src exceptions for https://huggingface.co and https://raw.githubusercontent.com. No wildcards.
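A minimal sketch of assembling those directives into a header value, so the allowed hosts live in one reviewable list. Compiling the model library's WASM also requires 'wasm-unsafe-eval' in script-src, because a policy that sets script-src or default-src blocks WebAssembly compilation by default. The 'self' entries, worker-src, and the buildCsp helper are assumptions. Large files on HuggingFace may be served through redirects to other hostnames, which connect-src also checks, so confirm the final hosts in the network tab before tightening. The output would go into the Content-Security-Policy header in vercel.json.

```typescript
type Directives = Record<string, string[]>;

// Hosts from the plan above; 'wasm-unsafe-eval' permits WebAssembly compilation and nothing else.
export const coachingCsp: Directives = {
  'default-src': ["'self'"],
  'script-src': ["'self'", "'wasm-unsafe-eval'"],
  'worker-src': ["'self'", 'blob:'],
  'connect-src': ["'self'", 'https://huggingface.co', 'https://raw.githubusercontent.com'],
};

// Serialize as "directive source source; directive source; ...", preserving key order.
export function buildCsp(directives: Directives): string {
  return Object.entries(directives)
    .map(([name, sources]) => [name, ...sources].join(' '))
    .join('; ');
}
```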

What I'd Do Differently

Test grammar enforcement before testing model quality. The schema compliance numbers are what determine whether the integration works at all. Content quality is a secondary concern. A model that produces 80/100 content but fails schema validation 50% of the time is less useful than a model that produces 70/100 content and passes validation 100% of the time.

For anyone running a similar spike on a different output schema: start with structured generation enforced at the runtime level. Don't test prompt-engineering-only compliance and expect it to generalize.

Conclusion

WebLLM with WebGPU is production-ready for this use case — and it's what I'm building toward. The 3B Llama model clears every performance target, produces coaching quality that matches the Ollama baseline, and requires zero user setup. Grammar-constrained generation isn't a nice-to-have -- it's the feature that makes small-model structured output viable at all.

The ability hallucination problem is real and unsolved by model size. Plan for a post-processing validation pass if your domain has specific terminology.

The most useful thing I measured was the thing I almost didn't measure: what happens when you remove the grammar enforcement. The answer is that it breaks quietly and often. If I were running this spike again, I'd run the Ollama baseline first — before testing any WebLLM model. Schema compliance is a binary gate. There's no point benchmarking content quality on a model whose output your validator will reject.

Holocron is a browser-based SWTOR combat log analyzer. It's free, requires no install, and all parsing happens client-side. If you play SWTOR and want to understand your logs, try it at holocronparse.com.

The React Native Upgrade Decision Framework: Predicting 10-38x Cost Multipliers

Michael Stelly — Mon, 06 Oct 2025 18:07:23 +0000

Deferred React Native upgrades multiply costs 10-38x. Here's how to predict which category you're in before approving budget.

When to Use This Framework

Use this assessment if you're:

Inheriting a React Native codebase and need to understand what you're walking into
Evaluating an upgrade request from your team
Planning quarterly tech debt remediation
Facing app store compliance deadlines

This framework emerged from 12 enterprise React Native migrations at Fortune 500 retailers and mid-market companies. It identifies the patterns that separate routine upgrades from budget-destroying rewrites.

The 18-Month Inflection Point

React Native apps deferred beyond 18 months don't just need updates; they require reverse-engineering code written by people who've left the company. Costs don't scale linearly. They compound.

Real example: An app four years out of date required $380,000 to fix. Eighteen months earlier, the same work would have cost $10,000. The difference: all three disaster conditions appeared together (ghost team, stale dependencies, no RN expertise) plus crossing four major breaking points.

The pattern repeats across organizations. Understanding where your app sits on this curve determines whether you're approving a $30K sprint or a $300K strategic project.

The Three Conditions That Predict Disaster

Across 12 migrations, three conditions consistently appeared together in projects that became forced rewrites:

1. Ghost Team

Original developers who built the app no longer work there. Nobody on the current team understands architectural decisions, custom integrations, or why certain code exists.

2. Stale Dependencies

Third-party libraries haven't been updated in 18+ months. Many are abandoned. Some have security vulnerabilities with no patches available.

3. Capability Gap

Current team lacks production React Native experience: fewer than 2-3 developers have shipped RN apps to production. They've inherited an RN app but have web-only or native-only backgrounds.

When all three conditions exist simultaneously, you're in rewrite territory. The technical debt is organizational, not just code-based.

The Version Gap Amplifier

Version distance matters, but not linearly. React Native has four major breaking points where the entire ecosystem fractures:

Version 0.60 (2019): Android compatibility overhaul
Versions 0.68-0.70 (2022): New Architecture introduced
Version 0.72 (2023): Package reorganization
Version 0.76 (2024): New Architecture becomes mandatory

Each breaking point adds significant complexity. The gap between version 0.59 and 0.70 isn't incremental; it's an architectural chasm. With React Native 0.81 now in production, teams on versions below 0.76 still have the mandatory New Architecture migration ahead of them, which makes the migration window even more critical.

Count how many breaking points sit between your current version and target. Each one represents a major migration, not a simple update.
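The count is mechanical enough to script. This is a minimal sketch, assuming every version involved is a 0.x release (so only the minor number matters) and treating the 0.68-0.70 range as a single point that starts at 0.68; the type and function names are hypothetical.

```typescript
// Breaking points from the list above, keyed by the first 0.x minor release where each lands.
// Assumption: the 0.68-0.70 range is treated as a single point that starts at 0.68.
interface BreakingPoint {
  minor: number;
  label: string;
}

const BREAKING_POINTS: BreakingPoint[] = [
  { minor: 60, label: "0.60: Android compatibility overhaul (AndroidX)" },
  { minor: 68, label: "0.68-0.70: New Architecture introduced" },
  { minor: 72, label: "0.72: package reorganization" },
  { minor: 76, label: "0.76: New Architecture becomes mandatory" },
];

// Extracts the minor number from a 0.x version string, e.g. "0.61.5" -> 61.
function parseMinor(version: string): number {
  const match = /^0\.(\d+)/.exec(version.trim());
  if (!match) {
    throw new Error(`Expected a 0.x React Native version, got "${version}"`);
  }
  return parseInt(match[1], 10);
}

// A point is crossed when the current version sits below it and the target is at or above it.
function breakingPointsCrossed(current: string, target: string): BreakingPoint[] {
  const from = parseMinor(current);
  const to = parseMinor(target);
  return BREAKING_POINTS.filter((p) => from < p.minor && to >= p.minor);
}

// Usage: an app on 0.61.5 heading to 0.81 is already past 0.60, so three points remain.
for (const point of breakingPointsCrossed("0.61.5", "0.81.0")) {
  console.log(point.label);
}
```

Run against an app on 0.59, the same call returns all four points.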

The Three Hidden Costs

Version gaps determine technical complexity. But three cost categories determine actual budget impact:

Opportunity Cost

Two weeks of development becomes months of calendar time. During those months, roadmaps stall, competitors ship your planned features, and revenue-generating updates sit in the backlog. A $35,000 migration creates $140,000 in lost revenue opportunity.

Velocity Collapse

Upgrades create an 8-week productivity crater:

Weeks 1-2: Migration work (0% feature velocity)
Weeks 3-4: Bug fixes (30% velocity)
Weeks 5-6: Performance optimization (50% velocity)
Weeks 7-8: Recovery (70% velocity)

A "2-week upgrade" costs $144,000 in lost productivity—3x the direct engineering cost.
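The crater can be expressed as lost team-weeks: for each week, subtract the velocity from 1 and sum. The sketch below does that for the schedule above. The weekly team cost is a hypothetical input, since the article doesn't state the team size or rate behind its $144,000 figure.

```typescript
// Feature velocity for each week of the 8-week crater above (0 = no feature work, 1 = normal).
const VELOCITY_BY_WEEK: number[] = [0, 0, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7];

// Lost team-weeks: for every week, the share of normal output that didn't happen.
function lostTeamWeeks(velocities: number[]): number {
  return velocities.reduce((sum, v) => sum + (1 - v), 0);
}

// weeklyTeamCost is a hypothetical fully loaded cost of the whole team for one week.
function lostProductivityCost(velocities: number[], weeklyTeamCost: number): number {
  return lostTeamWeeks(velocities) * weeklyTeamCost;
}

const lost = lostTeamWeeks(VELOCITY_BY_WEEK);
console.log(lost.toFixed(1)); // "5.0"
```

Five lost team-weeks from a "2-week" job; at a hypothetical $28,800 per team-week, that works out to the $144,000 quoted above.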

Knowledge Transfer Multiplier

When code was written by former employees, every fix takes longer:

Developer on team with context: Normal speed
Left within 6 months: 2.5x longer
Left over a year ago: 4x longer
Multiple developers gone: 7x+ longer

If former employees wrote 60%+ of your code, multiply all estimates by 4-7x minimum.
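One way to apply these multipliers to a mixed codebase is to weight each by the share of code that group wrote. The weighting is an assumption, not part of the framework; the 4x floor at 60% former-employee authorship comes from the rule above. Where the list gives "7x+", the sketch uses 7, and the bucket names are hypothetical.

```typescript
type Authorship = "onTeam" | "leftWithin6Months" | "leftOverAYearAgo" | "multipleGone";

// Lower bounds of the multipliers listed above ("7x+" is taken as 7).
const MULTIPLIER: Record<Authorship, number> = {
  onTeam: 1,
  leftWithin6Months: 2.5,
  leftOverAYearAgo: 4,
  multipleGone: 7,
};

// Assumption: blend multipliers by the share of code each group wrote (shares must sum to 1),
// then apply the 4x floor when former employees wrote 60% or more.
function blendedMultiplier(shares: Partial<Record<Authorship, number>>): number {
  let total = 0;
  let weighted = 0;
  let formerShare = 0;
  for (const key of Object.keys(shares) as Authorship[]) {
    const share = shares[key] ?? 0;
    total += share;
    weighted += share * MULTIPLIER[key];
    if (key !== "onTeam") {
      formerShare += share;
    }
  }
  if (Math.abs(total - 1) > 1e-6) {
    throw new Error(`Ownership shares must sum to 1, got ${total}`);
  }
  return formerShare >= 0.6 ? Math.max(weighted, 4) : weighted;
}
```

For a codebase where the current team wrote 40% and people who left over a year ago wrote 60%, the weighted blend is 2.8x, and the 60% rule lifts it to 4x.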

The Pattern-Based Assessment

Based on observed patterns across migrations, not mathematical formulas:

Manageable Upgrade

Conditions:

Team has production React Native experience
Original developers available OR well-documented codebase
Dependencies updated within last 12 months
Crossing 0-1 major breaking points

Primary Risk: Underestimating opportunity cost

Timeline: 1-2 weeks

Budget: $10-30K

Team: 1 senior developer

High-Risk Migration

Conditions:

Team has production React Native experience BUT
Original developers gone OR stale dependencies (18+ months) OR
Crossing 2 major breaking points

Primary Risk: Velocity collapse extends beyond migration window

Timeline: 4-8 weeks

Budget: $50-150K

Team: 2-3 developers plus contractor/consultant

Rewrite Territory

Conditions:

All three disaster patterns present:
- Ghost team (original developers gone)
- Stale dependencies (18+ months old)
- Capability gap (no production RN experience)
Often crossing 3+ major breaking points

Primary Risk: Project becomes permanent crisis mode

Timeline: 3-6 months

Budget: $300K-1M+

Team: New team or heavy contractor involvement

Team capability dominates all other factors. Without production React Native experience, costs multiply significantly regardless of version gap or dependency health.
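Encoding the tier conditions makes the assessment repeatable across apps. This is a sketch with hypothetical field names. The framework leaves some combinations unassigned (for example, dependencies older than 12 but under 18 months, or a capability gap without the other two conditions); the sketch assumes those fall into High-Risk, in keeping with the point that team capability dominates.

```typescript
type Tier = "Manageable" | "High-Risk" | "Rewrite";

// Hypothetical assessment inputs; gather them however your team prefers.
interface AppAssessment {
  teamHasProductionRnExperience: boolean;
  originalDevelopersAvailable: boolean;
  wellDocumented: boolean;
  monthsSinceDependenciesUpdated: number;
  breakingPointsToCross: number;
}

function classifyTier(a: AppAssessment): Tier {
  const ghostTeam = !a.originalDevelopersAvailable;
  const staleDependencies = a.monthsSinceDependenciesUpdated >= 18;
  const capabilityGap = !a.teamHasProductionRnExperience;

  // Rewrite territory: all three disaster patterns at once.
  if (ghostTeam && staleDependencies && capabilityGap) {
    return "Rewrite";
  }

  // Manageable: every condition listed for that tier holds.
  if (
    a.teamHasProductionRnExperience &&
    (a.originalDevelopersAvailable || a.wellDocumented) &&
    a.monthsSinceDependenciesUpdated <= 12 &&
    a.breakingPointsToCross <= 1
  ) {
    return "Manageable";
  }

  // Assumption: every remaining combination, including a capability gap without the
  // other two patterns or dependencies between 12 and 18 months old, is High-Risk.
  return "High-Risk";
}
```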

Three Diagnostic Questions

Ask these in your next planning meeting. The pauses reveal hidden risk.

Question 1: "What broke the last time we upgraded?"

What you're testing: Institutional memory

Strong answers:

Specific component names and technical details
Documented runbooks with time estimates
Lessons learned applied to current architecture

Weak answers:

"I think it went fine"
"No one on the current team was here for that"
Searching through Slack or git history to find answers

What it means: If your team can't describe what broke last time with specifics, they can't predict what will break this time. Add 40% to any timeline estimate.

Question 2: "Who owns each of our custom integrations?"

What you're testing: Knowledge transfer risk and capability gaps

Strong answers:

Names of current employees with specific ownership
Recent updates to custom code (within 6 months)
Clear documentation and test coverage

Weak answers:

"I think [former employee] wrote that"
"The build works, so we haven't touched it"
Long pause followed by code history searches

What it means: Custom code for cameras, payments, and sensors fails during upgrades in ways standard features don't. If answers involve former employees, multiply estimates by 2.5x.

Question 3: "What's our rollback plan if this fails in production?"

What you're testing: Production risk awareness and strategic thinking

Strong answers:

Specific technical approach (staged rollout to 10% of users)
Monitoring plan with defined thresholds (crash rate >2% triggers rollback)
Tested rollback procedures with documented timing (can revert in under 2 hours)

Weak answers:

"We test thoroughly, so it shouldn't fail"
"We'll fix issues as they come up"
Blank stares or "we can roll back the app store version"

What it means: Teams without rollback plans don't understand production risk. Without rollback capability: emergency fixes create new bugs, app ratings collapse, you explain to the board why routine maintenance destroyed user satisfaction.
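The thresholds in the strong answers translate directly into a guard the team can agree on before launch. This is a minimal sketch: the crash counts are assumed to come from whatever crash reporting the team already uses, the 2% threshold mirrors the example answer, and the minimum sample size is a hypothetical value.

```typescript
// Metrics for the staged build, supplied by the team's existing crash reporting.
interface StageMetrics {
  sessions: number;        // sessions observed on the new build
  crashedSessions: number; // sessions that ended in a crash
}

type RolloutDecision = "hold" | "rollback" | "expand";

// crashThreshold mirrors the example answer: roll back when the crash rate exceeds 2%.
// Assumption: minSessions is a hypothetical sample-size floor so a handful of early
// sessions can't trigger or clear a rollback on their own.
function evaluateStage(m: StageMetrics, crashThreshold = 0.02, minSessions = 500): RolloutDecision {
  if (m.sessions < minSessions) {
    return "hold";
  }
  const crashRate = m.crashedSessions / m.sessions;
  return crashRate > crashThreshold ? "rollback" : "expand";
}
```

Writing the rule down this way forces the team to answer the follow-up questions: where the numbers come from, and who acts on a "rollback" result within the two-hour window.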

Decision Matrix

Assessment Tier	Team Capability	Timeline	Budget	Approval Level
Manageable	RN production experience	1-2 weeks	$10-30K	Team lead approval
High Risk	RN experience OR can hire expertise quickly	4-8 weeks	$50-150K	Director approval + dedicated sprint
Rewrite	No RN experience + ghost team + stale deps	3-6 months	$300K-1M+	VP/CTO approval + strategic initiative

The Two Scenarios

Scenario A: Systematic Maintenance

You're 3-6 months behind current, not 18+. You have team members with RN production experience. You have tested rollback procedures. The compliance deadline is annoying but manageable. Budget impact: $30-50K.

Scenario B: Deferred Maintenance

You're 18+ months behind. Dependencies are failing. Your team has no RN production experience. The compliance deadline becomes an existential crisis requiring emergency contractor spending. Budget impact: $300K-1M+.

Same external deadline. 10-30x difference in cost. The variable is whether you assessed risk before it became a crisis.

What to Do This Week

Step 1: Identify which tier your app occupies

Do you have all three disaster conditions? (Ghost team + stale deps + capability gap)
Count breaking points between current and target version
Calculate dependency age (when were top 20 dependencies last updated?)
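Dependency age is easy to compute once you have release dates; `npm view <package> time --json` returns a package's publish timestamps, including a `modified` entry. This sketch assumes "last updated" means the package's latest upstream release (not when your project last bumped it) and uses the framework's 18-month line for stale; the names are hypothetical.

```typescript
interface DependencyRelease {
  name: string;
  lastPublished: string; // ISO 8601 date of the package's most recent release
}

// Average month length in milliseconds (365.25 days / 12).
const MS_PER_MONTH = (365.25 / 12) * 24 * 60 * 60 * 1000;

function monthsSince(isoDate: string, now: Date): number {
  const published = new Date(isoDate).getTime();
  if (isNaN(published)) {
    throw new Error(`Unparseable date: ${isoDate}`);
  }
  return (now.getTime() - published) / MS_PER_MONTH;
}

// Names of dependencies whose latest release is at least `staleAfterMonths` old.
// 18 months matches the framework's definition of stale dependencies.
function staleDependencies(
  deps: DependencyRelease[],
  now: Date = new Date(),
  staleAfterMonths = 18,
): string[] {
  return deps
    .filter((d) => monthsSince(d.lastPublished, now) >= staleAfterMonths)
    .map((d) => d.name);
}
```

Passing an explicit `now` keeps the result reproducible when you record the assessment.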

Step 2: Ask your team the three diagnostic questions

Schedule 30 minutes this week
Document the answers (especially the pauses and uncertainty)
Note which questions triggered searches through old documentation

Step 3: Make decisions based on assessment, not hope

Manageable tier: Approve as routine sprint work
High-risk tier: Budget for dedicated project with external expertise
Rewrite tier: Begin strategic planning for application replacement

The Bottom Line

Technical debt accumulates silently until external forces—app store requirements, security vulnerabilities, competitor pressure—make it visible. By that point, costs have multiplied 10-38x and options have narrowed to "expensive" or "more expensive."

The assessment costs nothing. The surprise costs everything.

Teams that successfully navigate React Native upgrades see these patterns before they become board-level problems. They assess before committing budget. They ask revealing questions before accepting estimates. They understand true costs before approving work.

The patterns repeat predictably. The breaking points don't move. The only variable is whether you see them coming in time to make strategic decisions instead of reactive ones.

Michael Stelly is a Senior Frontend Engineer specializing in React Native architecture and enterprise migrations. This framework emerged from leading upgrades at Fortune 500 retailers and analyzing migration patterns across 12 enterprise applications over seven years.

The Walls That Turn $10k Updates Into $300k Rewrites

Michael Stelly — Tue, 30 Sep 2025 16:05:05 +0000

The $380,000 Wake-Up Call

A Fortune 500 retailer called me after receiving a $380,000 quote to update their React Native app. The same update would have cost $10,000 eighteen months earlier.

In Part 1, I established the 18-month rule: React Native apps that go 18 months without updates cost 10x more to fix than to maintain. Now I'll show you exactly why: four specific version changes that transform routine updates into bankruptcy-inducing rewrites.

The Cost Multiplication Formula

After managing 12+ React Native migrations, I've seen the same pattern every time:

THE COMPOUND INTEREST OF NEGLECT
Base Update Cost: $10,000 (1 developer, 2 weeks)
Skip one critical version: $25,000-35,000
Skip two critical versions: $60,000-90,000
Skip three critical versions: $150,000+
Skip all four: Start shopping for rebuild quotes

Technical debt doesn't have a payment plan. It has a due date, and the penalty is bankruptcy.
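Read as data, the table is a lookup from walls skipped to a cost band. The sketch below encodes it and expresses the low end of each band as a multiple of the $10,000 base; the names are hypothetical, and using the low end is a choice made here for a conservative figure.

```typescript
// Cost bands from the table above, indexed by the number of critical versions skipped.
interface CostBand {
  low: number;
  high: number | null; // null = open-ended ("$150,000+")
}

const BASE_COST = 10000; // 1 developer, 2 weeks

const COST_BANDS: CostBand[] = [
  { low: 10000, high: 10000 }, // 0 skipped: base update
  { low: 25000, high: 35000 }, // 1 skipped
  { low: 60000, high: 90000 }, // 2 skipped
  { low: 150000, high: null }, // 3 skipped
];

function costBand(wallsSkipped: number): CostBand | "rebuild" {
  if (wallsSkipped < 0 || Math.floor(wallsSkipped) !== wallsSkipped) {
    throw new Error(`wallsSkipped must be a non-negative integer, got ${wallsSkipped}`);
  }
  return wallsSkipped >= COST_BANDS.length ? "rebuild" : COST_BANDS[wallsSkipped];
}

// Multiple of the base update, using the low end of each band.
function minimumMultiplier(wallsSkipped: number): number | "rebuild" {
  const band = costBand(wallsSkipped);
  return band === "rebuild" ? band : band.low / BASE_COST;
}
```

The low ends come out to 2.5x, 6x, and 15x for one, two, and three walls, and skipping all four returns "rebuild".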

Version 0.60: When Google Broke Every Android App

THE ANDROID BREAKING POINT

Cost to fix: $25,000-$35,000
Time required: 3-4 weeks
What breaks: Every Android component
Skip penalty: Multiplies next update by 3x
Business impact: App won't compile until fixed

Google forced every React Native app to migrate from their old Android Support Library to AndroidX. These systems cannot coexist—it's one or the other, no exceptions. Every third-party component in your app must be updated or replaced. The tools that compile your code into an app need complete overhaul.

When Bluecrew received Apple's removal notice, their React Native 0.61 app hadn't seen maintenance in 18 months—just break fixes to keep it running. No one had touched the dependencies. No one had updated the architecture. The technical debt had compounded silently until Apple forced their hand. With 90 days to comply or lose their app store presence, they faced a complete rebuild.

Version 0.68-0.70: The Architecture Tax

THE PERFORMANCE PARADOX

Cost to fix: $40,000-$80,000
Time required: 6-8 weeks
What breaks: Core app communication layer
Skip penalty: Performance degrades permanently
Business impact: App gets slower while competitors get faster

React Native introduced a completely new foundation system for how JavaScript communicates with native code. The cruel irony? Running both systems simultaneously—which happens by default—makes your app slower than before the "upgrade."

Your custom features may need complete rewrites. Developer productivity craters as they fight through hundreds of warnings. Most third-party components don't support the new system, forcing impossible choices between features and modernization.

Version 0.72: The Hidden $25,000 Reorganization

THE IMPORT MAZE

Cost to fix: $15,000-$25,000
Time required: 2-3 weeks
What breaks: Every internal connection
Skip penalty: Blocks all future updates
Business impact: Zero features, pure overhead

React Native reorganized its entire package structure. Every internal connection in your app—how your payment system talks to your user interface, how your login connects to your data—needs manual rewiring. We're talking hundreds, sometimes thousands, of connection points.

You're paying senior developers to achieve nothing visible. The app works exactly the same, just with different plumbing. Try explaining that to stakeholders.

Version 0.76+: The Point of No Return

THE MANDATORY MIGRATION

Cost to fix: $100,000+ (migration) or $300,000+ (rebuild)
Time required: 3-6 months
What breaks: Everything still on old foundation
Skip penalty: Not an option—update or die
Business impact: Feature freeze or start over

The old foundation system is dead. No compatibility mode. No grace period. Update or watch your app get delisted from app stores.

At this point, most companies face a brutal choice: spend $100,000+ forcing a migration that might fail, or spend $300,000 on a rebuild that definitely works. Neither option is good. Both could have been avoided.

The Bluecrew Crisis: A Cautionary Tale

"18 months of 'saving money' created a crisis that quarterly maintenance would have prevented."

Bluecrew's story is typical. After deploying their React Native app, they focused on growth, not maintenance. Break fixes only. The app worked, users were happy, why spend money updating what wasn't broken?

For 18 months, technical debt accumulated invisibly:

React Native fell behind by 11 versions
Dependencies went unmaintained
Security patches were ignored
The ecosystem moved forward while their app stood still

Then Apple's removal notice arrived. 90 days to comply with security requirements that their version could never meet.

The timeline:

Months 1-18: "The app works fine" - $0 spent on maintenance
Day 0: Apple removal notice arrives
Day 1-30: Internal team discovers it's not fixable with updates
Day 31-60: Three firms quoted rebuilds ranging from $300,000-$380,000; none would attempt updates
Day 61-90: Emergency rebuild with specialist

The real cost wasn't just the emergency rebuild. It was the three months of uncertainty, the risk to their business model if the app was delisted, and the opportunity cost of developers fighting fires instead of building features. While Bluecrew spent 3 months migrating, their competitors shipped 12-15 new features. That's the real cost.

The Multiplication Effect Nobody Explains

These versions don't add difficulty—they multiply it:

Walls to Cross	Complexity	Reality Check
One wall	Manageable sprint	Annoying but doable
Two walls	Everything breaks twice	Team considers quitting
Three walls	Dependencies fight each other	CTO updates resume
Four walls	Cheaper to rebuild	Board asks hard questions

Every month you delay adds 15% to your bill. That's a higher interest rate than most credit cards.
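"Adds 15% to your bill" can be read two ways, and the difference matters over 18 months. This sketch computes both readings from the $10,000 base update; which one the author intends is an assumption, so both are shown.

```typescript
// Two readings of "every month you delay adds 15% to your bill"; the text doesn't say which is meant.

// Compounded: each month's bill is 15% larger than the previous month's.
function compoundedCost(base: number, monthlyRate: number, months: number): number {
  return base * Math.pow(1 + monthlyRate, months);
}

// Simple: each month adds 15% of the original bill.
function simpleCost(base: number, monthlyRate: number, months: number): number {
  return base * (1 + monthlyRate * months);
}

// Annual rate implied by compounding a monthly rate, for the credit-card comparison.
function effectiveAnnualRate(monthlyRate: number): number {
  return Math.pow(1 + monthlyRate, 12) - 1;
}

for (const months of [6, 12, 18]) {
  console.log(
    months,
    Math.round(compoundedCost(10000, 0.15, months)),
    Math.round(simpleCost(10000, 0.15, months)),
  );
}
```

Compounded, $10,000 becomes roughly $53,500 after 12 months and $123,750 after 18 (about 12x, close to the 10x-at-18-months rule); the simple reading gives about $28,000 and $37,000. Compounded over a year, 15% a month is an effective rate of roughly 435%.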

What This Means for Your Industry

E-commerce: Your checkout breaks during Black Friday. Revenue drops to zero while you emergency patch.

Banking: You fail your next security audit. Regulators don't care about your update timeline.

Healthcare: You lose HIPAA compliance overnight. Legal asks why this wasn't prevented.

SaaS: Your mobile app becomes a competitive liability. Customers switch to competitors who maintained their apps.

The 30-Second Assessment

Is your React Native app already dead? Check these five indicators:

React Native version below 0.72? [Yes = +1 point]
Last update more than 12 months ago? [Yes = +2 points]
Original developers gone? [Yes = +1 point]
More than 20 outdated dependencies? [Yes = +2 points]
Build warnings fill your console? [Yes = +1 point]

Score 3+? You're already in crisis. The question isn't if you'll pay, but how much.
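The checklist is a weighted sum, so it fits in a few lines. This sketch copies the point values above; the field names are hypothetical.

```typescript
interface QuickCheck {
  versionBelow072: boolean;
  lastUpdateOver12MonthsAgo: boolean;
  originalDevelopersGone: boolean;
  over20OutdatedDependencies: boolean;
  buildWarningsFillConsole: boolean;
}

// Point weights copied from the checklist above (maximum score: 7).
function quickScore(c: QuickCheck): number {
  let score = 0;
  if (c.versionBelow072) score += 1;
  if (c.lastUpdateOver12MonthsAgo) score += 2;
  if (c.originalDevelopersGone) score += 1;
  if (c.over20OutdatedDependencies) score += 2;
  if (c.buildWarningsFillConsole) score += 1;
  return score;
}

// "Score 3+? You're already in crisis."
function inCrisis(c: QuickCheck): boolean {
  return quickScore(c) >= 3;
}
```

The weighting means an app that is merely stale (last update over a year ago) scores 2; one more flag of any kind crosses the threshold.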

Companies Doing It Right

AVOIDING THE WALLS:
Walmart: Updates monthly, migration cost < $5k/quarter
Discord: Automated 90% of updates, 2-day turnaround  
Coinbase: Dedicated React Native maintenance team
Their secret: They never hit the walls

Meanwhile, companies in crisis:
Bluecrew: 90-day removal notice, emergency rebuild
Fortune 500 Retailer: $380,000 quote, considering abandoning mobile
Healthcare Startup: Failed HIPAA audit, mobile app offline 3 months

The Pattern I See Repeatedly

After 12+ React Native rescues, the pattern never changes:

App launches successfully
Team moves to next project
"It's working fine" for 12-18 months
External force (app store, security, OS update) breaks app
Discovery that updates are now impossible
Emergency rebuild at 10x the maintenance cost

Bluecrew fit this pattern exactly. So did my work at Sam's Club, where their 0.61.5 app had already crossed into "archaeology" territory. The pattern is so consistent I can predict costs within 10%.

If You're Already in Crisis

If you're already past the 18-month mark, don't panic. The worst decision is further delay. Even apps 24+ months behind can be saved—it just requires accepting the reality of rebuild over update. Every day you wait adds to the final bill, but starting today stops the bleeding.

The Decision You Face Today

While you're reading this, your competitors are shipping features. Instagram, Walmart, and others update their React Native apps monthly. They're not doing it for fun—they're avoiding the walls that trap you.

Option A: Start Quarterly Maintenance Now

$40,000/year predictable cost
Zero feature freeze
Developers stay productive
Competitive advantage maintained

Option B: Wait for the Crisis

$150,000-$300,000 emergency cost
3-6 month feature freeze
Team burnout guaranteed
Explain to board why mobile revenue stopped

The math is unforgiving: preventive maintenance costs a fraction of emergency repairs.

You Can't Defer Forever

These events force updates whether you're ready or not:

App store security requirements: 90-day compliance or delisting
Payment processor updates: Update or lose transaction capability
iOS/Android annual updates: Your app breaks every September
Security vulnerabilities: Immediate patches or legal liability

WARNING SIGNS YOUR NOTICE IS COMING:
□ Your app requires iOS/Android versions from 2+ years ago
□ You're still asking for permissions deprecated in iOS 14
□ Your Android target SDK is below 31 (from 2021)
□ You haven't updated since before COVID

If you checked ANY box, start planning now.

What 12 Migrations Taught Me

Companies don't skip maintenance maliciously. They skip it because:

The app works today
Resources are tight
Other priorities seem more urgent
The accumulating debt is invisible

Until it's not. Until Apple or Google sends the notice. Until a critical security vulnerability is discovered. Until the latest iOS breaks your app.

By then, the 2-day updates have become 2-month rebuilds. The $10,000 maintenance has become $100,000+ emergencies.

What's Next

You now understand the 18-month cliff (Part 1) and the four walls that multiply costs (Part 2).

In Part 3, "The React Native Migration Playbook," I'll provide the tactical guide for crossing each wall—specific strategies that minimize risk and cost while keeping your app alive during migration.

But first, find out where you stand:

npx react-native info

Count your walls. Calculate your costs. Make your choice.

The math is consistent: 18 months of deferred maintenance equals a rebuild. Not an update, not a migration—a rebuild. Bluecrew proved this. Sam's Club proved this. The next proof might be your app.

Part 3 coming next: The step-by-step playbook for navigating each wall while keeping your app alive.

About the series: The React Native Foundations series covers the what, why, and how of maintaining healthy React Native apps. Part 1 revealed the 18-month cliff. Part 2 exposed the cost multipliers. Part 3 will show you exactly how to navigate the migration when you can't avoid it any longer.

Your React Native App Has 18 Months to Live

Michael Stelly — Mon, 22 Sep 2025 15:35:34 +0000

The call came at 4 PM on a Tuesday: "Apple just sent us a Q4 compliance notice. Our React Native app needs to meet new security requirements by year-end. Can you help?" His team had no mobile experience, the deadline was breathing down their necks, and they needed help fast. The culprit? Their app was still running React Native 0.61 - a version so outdated that app stores were flagging it for known security vulnerabilities that would never be patched.

Within 20 minutes of our first conversation, I knew we had a problem that went far deeper than a simple version bump. After our audit, they shared worse news: another firm had quoted $380,000 to fix their "simple" update with a rebuild from scratch. Fortunately, I had a better plan.

THE $380,000 REALITY CHECK

App Profile: 20 screens, 10k users, e-commerce

External Rebuild Quote: $380,000

Specialist Rebuild Cost: $120,000 (9 months solo)

Client Savings: $260,000

Lesson: Experience matters when technical debt becomes unavoidable

The Technical Debt Compound Effect

After modernizing 12+ React Native apps, I've found the point of no return: 18 months. Skip quarterly updates for longer than that, and your linear fixes become exponential problems. The math is brutal but consistent.

Month 0–6: Simple updates, 2 hours each quarterly release

Month 7–12: Dependencies conflict, 8 hours per update

Month 13–18: Native module incompatibilities, 40+ hours

Month 19+: Complete rebuild recommended

When I contracted with Sam's Club as Senior Mobile Engineer in early 2022 to lead their React Native migration for the fresh seafood department workers' app, it was stuck on version 0.61.5 - already three years behind the ecosystem. We successfully migrated to 0.67.2, but the process revealed how quickly technical debt compounds when updates are deferred.

Five Signs Your React Native App Will Die in 2025

Through painful experience, I've identified five early warning signs that predict this exact scenario:

Your React Native version is below 0.72 (released June 21, 2023) - You're now 2+ years behind critical security patches including the Regular Expression Denial of Service (ReDoS) vulnerability that affected versions 0.59.0 to 0.62.3
npm outdated shows 20+ major version gaps - When your dependency tree is more than 50% unsupported packages, you're not looking at updates anymore - you're looking at archaeology
Your build times have significantly degraded from when you first started the project - This indicates fundamental configuration drift from modern React Native expectations
Your Android build fails on Gradle 8+ due to namespace conflicts in legacy native modules - New app store requirements will eventually force this upgrade whether you're ready or not
Console shows 15+ deprecation warnings on startup - These aren't just noise - they're countdown timers to broken functionality

I've seen all five symptoms at once exactly three times. All three required complete rewrites.
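The second sign can be checked mechanically. `npm outdated --json` prints an object keyed by package name with `current`, `wanted`, and `latest` fields; the sketch below lists packages whose `latest` is a breaking jump from what is installed. Treating a 0.x minor bump as breaking is an assumption (it follows semver's convention for pre-1.0 packages), the function names are hypothetical, and reading the JSON from a file or child process is left out to keep the example self-contained.

```typescript
// The fields of one `npm outdated --json` entry that this check uses.
interface OutdatedEntry {
  current?: string; // missing when the package isn't installed
  wanted: string;
  latest: string;
}

// Returns [major, minor], or null for non-numeric versions such as git or linked packages.
function parseMajorMinor(version: string): [number, number] | null {
  const m = /^v?(\d+)\.(\d+)/.exec(version.trim());
  return m ? [parseInt(m[1], 10), parseInt(m[2], 10)] : null;
}

// Assumption: for 0.x packages a minor bump counts as a breaking (major) gap,
// following semver's convention that anything may change before 1.0.
function hasBreakingGap(current: string, latest: string): boolean {
  const c = parseMajorMinor(current);
  const l = parseMajorMinor(latest);
  if (!c || !l) {
    return false;
  }
  if (l[0] !== c[0]) {
    return l[0] > c[0];
  }
  return c[0] === 0 && l[1] > c[1];
}

// Names of installed packages whose latest release is a breaking jump from what is installed.
function packagesWithBreakingGaps(outdated: Record<string, OutdatedEntry>): string[] {
  return Object.keys(outdated).filter((pkg) => {
    const entry = outdated[pkg];
    return entry.current !== undefined && hasBreakingGap(entry.current, entry.latest);
  });
}
```

Parse the command's output with `JSON.parse` and compare the length of the returned list against 20.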

The Prevention Playbook

The React Native ecosystem's rapid evolution is both its greatest strength and its most dangerous trap. Stay current, and you ride the wave of continuous improvements. Fall behind, and you're fighting an entire ecosystem that has moved on without you.

QUARTERLY MAINTENANCE PLAYBOOK

☐ Update React Native by one minor version max

☐ Run npm audit fix for security patches

☐ Update React Navigation if using (breaking changes common)

☐ Test on latest iOS/Android beta releases

☐ Profile app performance, document any degradation

☐ Remove one unused dependency minimum

Time Investment: 16–24 hours per quarter

Version jump strategy: Never skip more than two React Native minor versions. The breaking changes accumulate too quickly for safe major jumps.
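The jump rule can be turned into a list of intermediate targets. This sketch assumes "never skip more than two minor versions" means each hop advances at most two minors (0.61 to 0.63, then 0.65), and that all versions are 0.x; the function name is hypothetical. Each hop would then be worked through with the React Native Upgrade Helper diff for that pair of versions.

```typescript
// Plans intermediate 0.x releases so that no single hop advances more than `maxStep` minors.
// Assumption: "never skip more than two minor versions" is read as a maximum hop of two,
// e.g. 0.61 -> 0.63 -> 0.65.
function planUpgradePath(fromMinor: number, toMinor: number, maxStep = 2): string[] {
  const isWhole = (n: number) => Math.floor(n) === n;
  if (!isWhole(fromMinor) || !isWhole(toMinor) || !isWhole(maxStep) || maxStep < 1) {
    throw new Error("Minor versions and maxStep must be whole numbers, with maxStep >= 1");
  }
  if (toMinor < fromMinor) {
    throw new Error(`Target 0.${toMinor} is older than current 0.${fromMinor}`);
  }
  const hops: string[] = [];
  let current = fromMinor;
  while (current < toMinor) {
    current = Math.min(current + maxStep, toMinor);
    hops.push(`0.${current}`);
  }
  return hops;
}

const plannedPath = planUpgradePath(61, 67); // ["0.63", "0.65", "0.67"]
```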

Dependency hygiene: Remove unused packages immediately. Every dependency is a potential failure point during updates.

Quick Audit Checklist

Run this 5-minute audit on your React Native app today:

☐ Check your version: Run npx react-native info and read the installed react-native version (current is 0.81)

☐ Count outdated dependencies: Run npm outdated and count major version gaps

☐ Test latest tools: Try building with the newest Xcode and Android Studio

☐ Measure build performance: Time a clean build from npx react-native run-android

☐ Count deprecation warnings: How many warnings appear in your console on app startup?

More than 3 red flags? You're approaching the 18-month cliff.

The Choice is Yours

The choice is stark: invest 2 hours monthly in updates, or $380,000 in a rebuild. If you're seeing any of these warning signs, you have a 6-month window to act before the compound effect makes updates impossible.

Start with a dependency audit today: run npm outdated. Check your current version against the latest React Native release, using the React Native Upgrade Helper to see what changes between the two. Count your deprecation warnings. Measure your build times.

The next 18 months will either cost you hundreds of thousands or hundreds of hours.

If this sounds like your situation, don't wait - reach out today. I'm currently accepting React Native modernization audits for fall 2025, and the earlier we catch these issues, the more options you have.

Don't become another emergency rebuild story.

This article kicks off my React Native Foundations series, where I'll cover the "what," the "why," and the "how" of maintaining a healthy React Native ecosystem that extends the practical life of all your applications. Today covered the "what" - the reality you're facing with your apps right now.

Next comes the "why": Foundations II: Upgrade or Perish, a four-part deep dive into why stakeholders face only one real decision - when to plan the upgrade or when to decommission the app. Wait long enough, and the app stores will make that choice for you.

About Michael: I've been building cross-platform mobile apps since 2011, starting with Titanium SDK and going all-in on React Native in 2018. Seven years and 12+ modernization projects later, I've helped companies including Sam's Club and Bluecrew avoid hundreds of thousands of dollars in rebuild costs by catching technical debt before it becomes a crisis. I specialize in rescuing legacy React Native applications and establishing sustainable development practices that prevent future emergencies.

Connect with me on LinkedIn or learn more about my services at Refactory.

Originally published on Medium

React Native Version Matrix: The Hidden Upgrade Path

Michael Stelly — Sun, 21 Sep 2025 22:47:43 +0000

Part 1 of 3: Why "Simple" Upgrades Become Multi-Week Migrations

I got a call from a manager whose React Native app was facing imminent removal from both Play and Apple app stores. His team had no mobile experience, and they were desperate. Within 20 minutes of our conversation, I knew this wasn't an upgrade problem—it was an archaeological dig.

The Bluecrew app was running React Native 0.61, which had been released in August 2019. Four years later in 2023, they weren't just behind—they were running a museum piece. After reviewing their codebase, I had to deliver news no manager wants to hear: this wasn't going to be an update. It was going to be a complete rewrite. The majority of their npm libraries were either outdated or completely abandoned.

As Tom, their manager, later wrote: "I took over a team that had a React Native project that was desperately out of date and at risk of immediate removal from Play and Apple app stores... Mike helped us rewrite the code base entirely."

The tragedy? Six weeks of systematic quarterly maintenance could have prevented the entire crisis. Instead, they needed a complete application rebuild that consumed weeks and required outside expertise.

Using my upgrade complexity formula—Upgrade Difficulty = (Version Gap × Architectural Changes) × Dependency Decay Rate—Bluecrew scored 47. Anything over 30 signals a rewrite candidate. They weren't upgrading; they were performing archaeology.
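The formula itself is a product of three factors compared against a threshold. The article doesn't say how each factor is measured, so the units in this sketch are assumptions noted in the comments, and the example inputs are hypothetical; they are not Bluecrew's figures and don't reproduce its score of 47.

```typescript
// The article names the three factors but not their units; these are assumptions.
interface DifficultyInputs {
  versionGap: number;           // e.g. 0.x minor releases between current and target
  architecturalChanges: number; // e.g. breaking points ("walls") crossed
  dependencyDecayRate: number;  // unitless multiplier, 1.0 = dependencies in good health
}

const REWRITE_THRESHOLD = 30; // "Anything over 30 signals a rewrite candidate."

// Upgrade Difficulty = (Version Gap x Architectural Changes) x Dependency Decay Rate
function upgradeDifficulty(i: DifficultyInputs): number {
  return i.versionGap * i.architecturalChanges * i.dependencyDecayRate;
}

function isRewriteCandidate(i: DifficultyInputs): boolean {
  return upgradeDifficulty(i) > REWRITE_THRESHOLD;
}

// Hypothetical inputs for illustration only.
const example: DifficultyInputs = { versionGap: 12, architecturalChanges: 2, dependencyDecayRate: 2 };
console.log(upgradeDifficulty(example), isRewriteCandidate(example)); // 48 true
```

Because the factors multiply, a healthy dependency tree (decay rate near 1.0) can keep even a long version gap under the threshold, while a decaying one pushes a modest gap over it.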

After leading React Native migrations at Sam's Club in 2022, Bluecrew in 2023, and analyzing 12 enterprise React Native apps over seven years, I've found that React Native upgrades aren't linear progressions. They mean navigating a web of interdependencies where skipping the wrong version triggers cascade failures the changelog never mentions.

I've mapped these breaking changes into a predictable matrix. Four critical version walls determine whether your upgrade takes days or months. Position your app correctly, and updates become routine. Miss the pattern, and you'll find yourself explaining to stakeholders why your "simple update" has turned into a feature freeze.

The Pattern Behind the Chaos

Every React Native release contains fracture points—changes that break not just your code, but the entire ecosystem around it. The changelog calls them "improvements." Experienced developers know them as migration projects.

Take a seemingly innocent changelog entry: "Migrated to AndroidX." Sounds simple enough. But experienced React Native developers know this phrase signals an ecosystem-wide fracture where every Android dependency must choose sides, build times double, and apps that compile perfectly crash on user devices due to reflection errors in native code.

Or consider: "New Architecture available." What this actually delivered was two completely different architectures running simultaneously in the same app, requiring C++ expertise for modules that previously needed simple Java annotations, and event timing changes that broke carefully tuned animations.

The most insidious: "Packages moved to @react-native scope." A "simple" namespacing change that broke every import in your codebase, with no predictable pattern for where packages relocated.

The Cascade Effect at Enterprise Scale

Enterprise React Native upgrades reveal how these fractures compound across complex codebases. What looks like a routine version bump becomes a multi-week emergency project.

One "simple" upgrade typically cascades into:

Library migration and compatibility testing (3-5 days)
Architecture rewrites with minimal documentation (5-7 days)
Build configuration modernization (2-3 days)
Platform-specific conflicts requiring specialized expertise (1-2 days)
Dependency debugging with cryptic native errors (3-4 days)

Each fix reveals two more problems. Updating one library breaks your upload flow. Fixing navigation breaks deep linking. Every dependency touches its own ecosystem of subdependencies with their own compatibility requirements.

The result: weeks explaining to stakeholders why your "simple update" has turned into a feature freeze, while your supposedly stable app accumulates technical debt that compounds exponentially.

The Four Immutable Rules

After mapping breaking changes across 12 enterprise migrations over seven years, four patterns emerged that govern every React Native upgrade:

Rule 1: Walls Don't Move

AndroidX will always live at 0.60. The New Architecture divide stays at 0.68-0.70. Package reorganization happened at 0.72. These are geological layers in React Native's history that mark fundamental shifts in how the platform works. You can't skip them, only cross them.

Rule 2: Time Compounds Everything

A month-old React Native app requires 2 hours of updates. A year-old app requires 2 weeks. An 18-month-old app requires 2 months. The math is exponential because dependencies diverge, libraries get abandoned, and breaking changes accumulate in ways that can't be fixed incrementally.

Rule 3: Dependencies Determine Destiny

Your upgrade path isn't about React Native—it's about your slowest dependency. One abandoned package locks your entire app at an old version. That camera library pinned to React Native 0.58? Your entire app is now a 0.58 app, regardless of what version you think you're running.
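Rule 3 reduces to taking a minimum. In this sketch each dependency carries a hypothetical `maxSupportedRnMinor`, the highest 0.x minor it is known to work with, which you would have to determine from its changelog or peer dependencies; the app's real ceiling is the lowest of those values.

```typescript
// Hypothetical compatibility data: the highest 0.x minor each dependency is known to work with.
interface DependencyCompat {
  name: string;
  maxSupportedRnMinor: number | null; // null = no known ceiling
}

interface Ceiling {
  minor: number;
  blockedBy: string[];
}

// The app's effective React Native ceiling is the lowest ceiling among its dependencies.
function effectiveCeiling(deps: DependencyCompat[]): Ceiling | null {
  let lowest: number | null = null;
  for (const d of deps) {
    const max = d.maxSupportedRnMinor;
    if (max !== null && (lowest === null || max < lowest)) {
      lowest = max;
    }
  }
  if (lowest === null) {
    return null; // nothing pins the app
  }
  const floor = lowest;
  return {
    minor: floor,
    blockedBy: deps.filter((d) => d.maxSupportedRnMinor === floor).map((d) => d.name),
  };
}
```

With a camera library capped at 0.58, the result is `{ minor: 58, blockedBy: [...] }` naming that library, whatever `package.json` says about `react-native`.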

Rule 4: Position Determines Difficulty

Some React Native versions offer 12-18 months of stability where updates are routine maintenance. Others trap you in constant update cycles where every change cascades into architectural decisions. Smart teams position themselves just after major walls and stay there until the next stable zone appears.

Strategic Positioning Matters

Every React Native app sits at a specific coordinate in this version matrix. Your position determines not just your current stability, but your future upgrade path and the complexity of every dependency decision.

Most upgrade guides recommend incremental updates. After managing 12 enterprise migrations over seven years, I don't think that's always the right call. Sometimes a clean rewrite is faster and safer than trying to bridge a multi-year technical debt gap. Bluecrew proved this: their complete rebuild took less time than an "incremental" migration across multiple version walls would have.

The cascade effect isn't random—it follows predictable patterns. Understanding these patterns is the difference between routine maintenance and emergency projects that consume entire development cycles and team credibility.

Your supposedly "stable" React Native app is built on abandoned npm packages, deprecated Android APIs, and native code written by people who've moved on to other companies. Every month you wait, the foundation shifts a little more.

The Path Forward

The React Native ecosystem fractures at four predictable points—version walls where the entire ecosystem breaks and requires complete migration strategies rather than simple updates. These walls don't move. They're geological layers that every app must eventually cross.

In Part 2, I'll map each critical wall in detail—what breaks, why it breaks, and the specific technical decisions that determine whether crossing them takes days or weeks. Each wall has its own failure patterns and migration requirements. Understanding them lets you position your app strategically and plan upgrades that align with business reality.

The walls are predictable, but only if you know what to look for. Most teams learn this the hard way—during emergency weekend deployments, explaining to stakeholders why the "simple update" has turned into a month-long project.

You don't have to learn it the hard way.

This is Part 1 of a 4-part series on navigating React Native upgrades. Follow me for Parts 2-4, where I'll detail the specific walls, cascade patterns, and decision frameworks that can save your team weeks of migration pain.

About Michael: I’ve been building cross-platform mobile apps since 2011, starting with Titanium SDK and going all-in on React Native in 2018. Seven years and 12+ modernization projects later, I’ve helped companies including Sam’s Club and Bluecrew avoid hundreds of thousands of dollars in rebuild costs by catching technical debt before it becomes a crisis. I specialize in rescuing legacy React Native applications and establishing sustainable development practices that prevent future emergencies.

Connect with me on LinkedIn or learn more about my services at Refactory.