<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Thomas John</title>
    <description>The latest articles on Forem by Thomas John (@tjthomasjohn).</description>
    <link>https://forem.com/tjthomasjohn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3767429%2F59629413-e88f-43f2-b37a-0cb8ee994b14.JPEG</url>
      <title>Forem: Thomas John</title>
      <link>https://forem.com/tjthomasjohn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tjthomasjohn"/>
    <language>en</language>
    <item>
      <title>When the Cloud Fails, the Browser Still Thinks</title>
      <dc:creator>Thomas John</dc:creator>
      <pubDate>Wed, 22 Apr 2026 00:12:23 +0000</pubDate>
      <link>https://forem.com/tjthomasjohn/when-the-cloud-fails-the-browser-still-thinks-2i7d</link>
      <guid>https://forem.com/tjthomasjohn/when-the-cloud-fails-the-browser-still-thinks-2i7d</guid>
      <description>&lt;p&gt;&lt;em&gt;Browser-native LLMs are the most underrated shift in edge AI. Here's why.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;3:17 AM. North Sea. 200 kilometers from the nearest coastline.&lt;/p&gt;

&lt;p&gt;The satellite uplink has been down since midnight. The drilling platform runs on skeleton watch. At exactly 3:17, a pressure sensor on mud pump P-3 starts drifting.&lt;/p&gt;

&lt;p&gt;Marcus, the on-call engineer, pulls up the asset interface on his tablet. Types what he sees:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"mud pump P-3 pressure readings drifting high since 0200, vibration also slightly elevated"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two seconds later:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Probable cause:&lt;/strong&gt; partial blockage or liner wear&lt;br&gt;
&lt;strong&gt;Action:&lt;/strong&gt; reduce RPM by 15%, schedule inspection at next safe window&lt;br&gt;
&lt;strong&gt;Escalate if:&lt;/strong&gt; pressure exceeds 420 PSI or vibration crosses 2.4g&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No spinner. No server. The satellite is still down.&lt;/p&gt;

&lt;p&gt;The model that just assessed that fault is running on Marcus's tablet — cached since the last port call, running on the tablet's GPU, no internet required.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Actually Happening
&lt;/h2&gt;

&lt;p&gt;Modern browsers ship with direct GPU access through an API called WebGPU. The &lt;a href="https://github.com/mlc-ai/web-llm" rel="noopener noreferrer"&gt;WebLLM&lt;/a&gt; project uses it to run large language models — real ones, billions of parameters — entirely inside a browser tab.&lt;/p&gt;

&lt;p&gt;Download once. Cache locally. Run on the GPU. Zero network calls per query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;webllm&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@mlc-ai/web-llm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;webllm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CreateMLCEngine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Qwen2.5-3B-Instruct-q4f32_1-MLC&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a drilling equipment diagnostic assistant.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;engineerDescription&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DIAGNOSTIC_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tool_choice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;required&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same chat-completions interface as the OpenAI API. Runs offline. Ships inside your web app: no server to provision, no API key to manage, no usage bill.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Why not run an AI server on the ship itself?"&lt;/em&gt; Valid question. A ship-side GPU server costs $15,000–$30,000 in hardware, 400W continuous power, dedicated cooling, and someone to maintain it. When the server room floods — exactly when you need it most — every device on the ship loses AI simultaneously. With browser LLM, each device is independent. Nothing to lose because there's no single point of failure.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Edge Gets Smart
&lt;/h2&gt;

&lt;p&gt;The oil platform story is bigger than one engineer and one pump.&lt;/p&gt;

&lt;p&gt;In production telemetry systems, the standard monitoring pattern is threshold rules — a value crosses a line, an alarm fires. We've shipped these pipelines at scale. They work. They also cannot reason. They cannot synthesize across signals. They tell you &lt;em&gt;that&lt;/em&gt; something happened, never &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Pressure drifting high &lt;em&gt;plus&lt;/em&gt; vibration elevated &lt;em&gt;plus&lt;/em&gt; flow rate slightly reduced — an experienced engineer reads that combination as liner wear, not a sudden blockage. A threshold rule sees three independent events.&lt;/p&gt;

&lt;p&gt;A browser-resident model interprets the combination the way the engineer would. In plain language. On the device. With no operational data leaving the platform network.&lt;/p&gt;

&lt;p&gt;The asset stops being a data source. It becomes a narrator of its own condition.&lt;/p&gt;

&lt;p&gt;This is what edge AI actually looks like in distributed sensor environments — not a GPU server in a rack requiring its own reliability engineering, but inference embedded in the devices already in the field. Hardware that exists. Zero marginal cost per query. Available when the network isn't.&lt;/p&gt;

&lt;p&gt;OpenWrt routers, industrial HMIs running embedded Chromium on ARM, ruggedized tablets — all valid targets today. As sub-1B models compiled to WASM mature, the hardware floor drops further.&lt;/p&gt;




&lt;h2&gt;
  
  
  0430 Hours. Somewhere That Doesn't Appear on Maps.
&lt;/h2&gt;

&lt;p&gt;The forward operating base has been in communications blackout for six hours. Electronic warfare — the enemy is jamming everything. The field medic has two casualties. No medevac window.&lt;/p&gt;

&lt;p&gt;She types vitals into her laptop. The local model returns in under three seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"probable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tension-pneumothorax"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hemothorax"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"immediate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"interventions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"needle-decompression-right-2nd-ICS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"large-bore-IV-x2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No connectivity. No cloud. No PHI transmitted.&lt;/p&gt;

&lt;p&gt;This makes explicit what the oil rig only implies: sometimes the network being down is not a failure. It is an attack. Cloud-dependent AI fails the moment the adversary succeeds. A browser-resident model doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Privacy Is Architecture, Not Policy
&lt;/h2&gt;

&lt;p&gt;Most networks are up most of the time. And still — there are environments where sending data out is not a technical problem. It's a legal one.&lt;/p&gt;

&lt;p&gt;Portable glucometers. Handheld ECG readers. Spirometers in field screening programs. These devices increasingly run browser companion apps. The data they handle is among the most protected in existence.&lt;/p&gt;

&lt;p&gt;When a patient reading goes to a cloud LLM, it triggers a cascade: Business Associate Agreement, retention audit, training data policy review, ongoing compliance monitoring. In high-availability healthcare systems, we've seen this compliance surface grow with every model update the vendor ships.&lt;/p&gt;

&lt;p&gt;With browser LLM, the reading never leaves the device. Not because of policy. Because transmission is architecturally impossible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Reading interpreted locally — never transmitted&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Provide plain-language context for diagnostic readings. Do not diagnose.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Blood glucose: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; mg/dL. Fasting: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;fasting&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// "This reading is above the normal fasting range. Please consult your healthcare provider."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rural clinic. Mobile screening unit. Offline. Private. Instant.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Breaks
&lt;/h2&gt;

&lt;p&gt;This architecture is not for every application. Be honest about the constraints before you commit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold start is real.&lt;/strong&gt; The Qwen2.5-3B model is a 1.5 GB download. On a corporate network (roughly 100 Mbps) that's a 2-minute first load. On mobile broadband it's longer. Plan for it: pre-cache via service worker at install time, not on first user interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GPU floor matters.&lt;/strong&gt; Below a modern integrated GPU (Intel Iris Xe or better), inference drops to CPU fallback at ~1 tok/s. That's not interactive. Detect WebGPU availability and route to a server-side fallback for unsupported devices — don't leave users with a broken experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model quality has a ceiling.&lt;/strong&gt; A 3B parameter model handles structured output, classification, and short reasoning reliably. It hallucinates on complex multi-hop logic. It degrades on inputs above ~4K tokens. For tasks that need frontier reasoning, escalate to cloud. Don't try to replace GPT-4 with Qwen-3B.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;iOS Safari is still constrained.&lt;/strong&gt; WebGPU landed in Safari 18 with a 256MB buffer limit that restricts which models run. Android Chrome is solid. Desktop Chrome and Edge are solid. iOS is improving but not there yet for larger models.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pick Your Model
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Min GPU VRAM&lt;/th&gt;
&lt;th&gt;Typical Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-0.5B&lt;/td&gt;
&lt;td&gt;300 MB&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;~90 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-1.5B&lt;/td&gt;
&lt;td&gt;900 MB&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;~65 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-3B&lt;/td&gt;
&lt;td&gt;1.5 GB&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;~38–52 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phi-3.5-mini&lt;/td&gt;
&lt;td&gt;2.2 GB&lt;/td&gt;
&lt;td&gt;3 GB&lt;/td&gt;
&lt;td&gt;~28 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama-3.1-8B&lt;/td&gt;
&lt;td&gt;4.5 GB&lt;/td&gt;
&lt;td&gt;6 GB&lt;/td&gt;
&lt;td&gt;~12–18 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For structured output — filtering, diagnostics, classification, form-fill — &lt;strong&gt;Qwen2.5-3B is the sweet spot.&lt;/strong&gt; Fast enough to feel instant. Capable enough for production use on real tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Should Be Paying Attention
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Industrial and field operations&lt;/strong&gt; — oil and gas, maritime, logistics, manufacturing. Anywhere operators work in connectivity-constrained environments with operationally sensitive data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defense and government&lt;/strong&gt; — air-gapped networks, EMCON operations, ITAR-controlled systems. Cloud AI is often forbidden. Browser LLM works within those constraints without additional infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Healthcare at the point of care&lt;/strong&gt; — portable diagnostics, rural medicine, field triage. PHI stays on device by architecture, not by agreement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise SaaS in regulated industries&lt;/strong&gt; — legal, financial, HR. Any product where "add an AI feature" currently means "add an OpenAI dependency and all the compliance overhead."&lt;/p&gt;




&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;Models are getting smaller. Sub-1B-parameter models capable enough for structured tasks are close, and when they arrive the hardware floor drops to a $50 device.&lt;/p&gt;

&lt;p&gt;In-browser vector search is maturing. Local LLM plus local vector store equals a fully offline RAG system — a knowledge base that lives on the device, reasons over local documents, never sends a query anywhere.&lt;/p&gt;

&lt;p&gt;A field medic with a tablet: local model for clinical reasoning, local vector store for medical guidelines, full capability with zero connectivity.&lt;/p&gt;

&lt;p&gt;An engineer on a platform between satellite windows: local model interpreting equipment telemetry, local knowledge base of fault histories, full diagnostic capability when the uplink is down.&lt;/p&gt;

&lt;p&gt;The browser became a valid AI runtime quietly, while everyone was watching the cloud.&lt;/p&gt;

&lt;p&gt;It runs where the work happens. It works when the network doesn't. It keeps data where it belongs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with &lt;a href="https://github.com/mlc-ai/web-llm" rel="noopener noreferrer"&gt;@mlc-ai/web-llm&lt;/a&gt;. Model specs and browser support: &lt;a href="https://webllm.mlc.ai" rel="noopener noreferrer"&gt;webllm.mlc.ai&lt;/a&gt;. For native mobile and embedded targets: &lt;a href="https://github.com/mlc-ai/mlc-llm" rel="noopener noreferrer"&gt;MLC-LLM&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Designing Zero-Downtime Behavioral Migrations in Distributed Systems</title>
      <dc:creator>Thomas John</dc:creator>
      <pubDate>Thu, 12 Feb 2026 03:24:32 +0000</pubDate>
      <link>https://forem.com/tjthomasjohn/designing-zero-downtime-behavioral-migrations-in-distributed-systems-3j62</link>
      <guid>https://forem.com/tjthomasjohn/designing-zero-downtime-behavioral-migrations-in-distributed-systems-3j62</guid>
      <description>&lt;h2&gt;
  
  
  Formalizing safe, deterministic migration workflows for production environments
&lt;/h2&gt;

&lt;p&gt;Modern distributed systems evolve continuously. Configuration models&lt;br&gt;
change, abstractions are redesigned, and legacy structures must&lt;br&gt;
eventually be replaced.&lt;/p&gt;

&lt;p&gt;However, when a system is live and high availability is mandatory,&lt;br&gt;
migration becomes far more than a data transformation exercise.&lt;/p&gt;

&lt;p&gt;It becomes a &lt;strong&gt;behavioral transition problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Unlike schema migration, behavioral migration modifies how a system&lt;br&gt;
executes in production. The system must remain available, correct, and&lt;br&gt;
consistent while its underlying configuration model changes. This&lt;br&gt;
introduces failure modes that traditional migration literature does not fully address.&lt;/p&gt;

&lt;p&gt;Through repeated architectural refinement, I formalized a reusable framework for safe, resumable, zero-downtime behavioral migration in&lt;br&gt;
distributed systems.&lt;/p&gt;

&lt;p&gt;This article outlines that framework.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Behavioral Migration Is Harder Than It Looks
&lt;/h2&gt;

&lt;p&gt;Behavioral migration differs from simple data movement in several important ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The system continues executing while migration runs&lt;/li&gt;
&lt;li&gt;  Partial activation can cause duplicate execution&lt;/li&gt;
&lt;li&gt;  Missing relationships can cause silent non-execution&lt;/li&gt;
&lt;li&gt;  Crashes must not require a full rollback&lt;/li&gt;
&lt;li&gt;  Re-running migration must be safe and deterministic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The risk is not visible downtime.&lt;/p&gt;

&lt;p&gt;The risk is inconsistent behavior.&lt;/p&gt;

&lt;p&gt;In high-availability systems, &lt;em&gt;"almost correct"&lt;/em&gt; is unacceptable.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Behavioral Migration Framework
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52n60atkxci3k7k4s2ih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52n60atkxci3k7k4s2ih.png" alt="Zero-Downtime Behavioral Migration Framework" width="800" height="1090"&gt;&lt;/a&gt;&lt;br&gt;
The framework is structured around five architectural principles.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Idempotent Step Isolation
&lt;/h2&gt;

&lt;p&gt;Migration should not be implemented as a monolithic script. Instead, it&lt;br&gt;
should be decomposed into deterministic, independently verifiable steps.&lt;/p&gt;

&lt;p&gt;Each step must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Detect prior completion&lt;/li&gt;
&lt;li&gt;  Cache its output&lt;/li&gt;
&lt;li&gt;  Skip safely if already executed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6d03n2mbfx9ss46qle1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6d03n2mbfx9ss46qle1c.png" alt="Idempotent Step Isolation" width="800" height="597"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Persist the output before marking completion, so a crash between the two&lt;/span&gt;
    &lt;span class="c1"&gt;# calls can never leave a completed step with no cached result to return.&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mark_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Safe restarts&lt;/li&gt;
&lt;li&gt;  Deterministic outcomes&lt;/li&gt;
&lt;li&gt;  Protection against duplicate writes&lt;/li&gt;
&lt;li&gt;  Operational resilience under failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without idempotent step isolation, migration reliability depends on&lt;br&gt;
process stability, which is never guaranteed in distributed systems.&lt;/p&gt;
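
&lt;p&gt;To make the restart behavior concrete, here is a minimal sketch that runs the same step twice against one job record. &lt;code&gt;InMemoryJob&lt;/code&gt;, &lt;code&gt;export_legacy&lt;/code&gt;, and &lt;code&gt;demo&lt;/code&gt; are hypothetical names introduced only for illustration, and the sketch assumes the output is cached before the step is marked complete. A real job store would persist to durable storage rather than process memory:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio


class InMemoryJob:
    """Hypothetical job store; a real one would write to durable storage."""

    def __init__(self):
        self._results = {}
        self._done = set()

    async def completed(self, name):
        return name in self._done

    async def cached(self, name):
        return self._results[name]

    async def cache(self, name, result):
        self._results[name] = result

    async def mark_completed(self, name):
        self._done.add(name)


async def step(job, name, func):
    if await job.completed(name):
        return await job.cached(name)
    result = await func()
    await job.cache(name, result)      # store the output first
    await job.mark_completed(name)     # then record completion
    return result


async def demo():
    job = InMemoryJob()
    calls = []

    async def export_legacy():
        calls.append("export")         # count how often the real work runs
        return ["rule-1", "rule-2"]

    first = await step(job, "export", export_legacy)
    second = await step(job, "export", export_legacy)  # simulated re-run
    return first, second, calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;asyncio.run(demo())&lt;/code&gt; returns the same result for both invocations while the underlying work executes once. The second call is served from the cached output.&lt;/p&gt;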


&lt;h2&gt;
  
  
  2. Atomic Activation Boundary
&lt;/h2&gt;

&lt;p&gt;One of the most dangerous migration mistakes is partial activation.&lt;/p&gt;

&lt;p&gt;If new entities are created and activated incrementally, the system may&lt;br&gt;
begin executing against an incomplete state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fkygv7acbuhb6vhhh9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fkygv7acbuhb6vhhh9t.png" alt="Atomic Activation Boundary" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The solution is strict separation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create all new entities in an inert state&lt;/li&gt;
&lt;li&gt; Establish all relationships&lt;/li&gt;
&lt;li&gt; Validate structural completeness&lt;/li&gt;
&lt;li&gt; Activate everything in one atomic boundary&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This eliminates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Partial behavior shifts&lt;/li&gt;
&lt;li&gt;  Duplicate execution&lt;/li&gt;
&lt;li&gt;  Inconsistent state windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The activation boundary becomes the single, well-defined moment when&lt;br&gt;
execution transitions from legacy logic to the new model.&lt;/p&gt;

&lt;p&gt;In distributed environments, activation control is more important than&lt;br&gt;
creation logic.&lt;/p&gt;
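
&lt;p&gt;A minimal sketch of that separation, assuming the entities live in a relational store. The &lt;code&gt;rules&lt;/code&gt; and &lt;code&gt;legacy_rules&lt;/code&gt; tables, the &lt;code&gt;state&lt;/code&gt; column, and the &lt;code&gt;migrate&lt;/code&gt; helper are hypothetical names chosen for illustration, and SQLite stands in for whatever store the real system uses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE legacy_rules (name TEXT, state TEXT)")
    conn.execute("CREATE TABLE rules (name TEXT, target TEXT, state TEXT)")


def migrate(conn, new_rules):
    # Steps 1 and 2: create entities and their relationships in an inert state.
    with conn:
        conn.executemany(
            "INSERT INTO rules (name, target, state) VALUES (?, ?, 'inert')",
            new_rules,
        )

    # Step 3: validate structural completeness before anything can execute.
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM rules WHERE state = 'inert' AND target IS NULL"
    ).fetchone()
    if missing:
        raise RuntimeError(f"{missing} inert rules have no target; not activating")

    # Step 4: retire legacy rows and activate new ones in a single transaction.
    # Both updates commit together, or neither does.
    with conn:
        conn.execute(
            "UPDATE legacy_rules SET state = 'retired' WHERE state = 'active'"
        )
        conn.execute("UPDATE rules SET state = 'active' WHERE state = 'inert'")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The sketch assumes the execution engine only reads rows in the active state, so nothing new runs until the final transaction commits. A failed validation leaves inert rows behind, and a production version would need to detect or clean those up on re-run.&lt;/p&gt;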


&lt;h2&gt;
  
  
  3. Deterministic Configuration Normalization
&lt;/h2&gt;

&lt;p&gt;Legacy systems accumulate structural redundancy. Equivalent&lt;br&gt;
configurations may exist under slightly different wrappers.&lt;/p&gt;

&lt;p&gt;Migration provides an opportunity to normalize equivalent logic without&lt;br&gt;
altering behavior.&lt;/p&gt;

&lt;p&gt;Using deterministic grouping keys such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# hashable and order-insensitive, so no sort is needed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ensures consistent consolidation.&lt;/p&gt;

&lt;p&gt;Normalization during migration produces a cleaner target model and&lt;br&gt;
reduces long-term technical debt. It transforms migration from&lt;br&gt;
replication into architectural refinement.&lt;/p&gt;
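
&lt;p&gt;As a rough sketch of how a grouping key drives consolidation: the record fields (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;priority&lt;/code&gt;, &lt;code&gt;schedule&lt;/code&gt;) and the &lt;code&gt;normalize&lt;/code&gt; function are assumptions made for illustration, not a prescribed schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict


def normalize(legacy_configs):
    """Group behaviorally equivalent configs under one deterministic key."""
    groups = defaultdict(list)
    for config in legacy_configs:
        key = (config["type"], config["priority"], config["schedule"])
        groups[key].append(config["id"])

    # Sort keys and members so the output does not depend on input order.
    return [
        {"type": t, "priority": p, "schedule": s, "source_ids": sorted(ids)}
        for (t, p, s), ids in sorted(groups.items())
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two legacy entries that share a type, priority, and schedule collapse into one target entity, and &lt;code&gt;source_ids&lt;/code&gt; keeps the trail back to the originals. Because both the keys and the members are sorted, re-running on the same inputs in any order yields the same result.&lt;/p&gt;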


&lt;h2&gt;
  
  
  4. Bounded Concurrent Retrieval
&lt;/h2&gt;

&lt;p&gt;Behavioral migration frequently requires retrieving configuration from&lt;br&gt;
distributed sources.&lt;/p&gt;

&lt;p&gt;Sequential retrieval is inefficient at scale.&lt;br&gt;
Unbounded concurrency risks overwhelming upstream systems.&lt;/p&gt;

&lt;p&gt;Bounded concurrency provides balance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;semaphore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When combined with exponential backoff retries, this approach maintains&lt;br&gt;
throughput while preserving system stability.&lt;/p&gt;

&lt;p&gt;Migration logic must scale without destabilizing the environment it is attempting to modernize.&lt;/p&gt;
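
&lt;p&gt;One way the semaphore and the backoff might fit together is sketched below. The &lt;code&gt;fetch&lt;/code&gt; callable, the choice of &lt;code&gt;ConnectionError&lt;/code&gt; as the retryable failure, and the default limits are all assumptions for illustration. Tune them to what the upstream system can tolerate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import random


async def fetch_with_retry(fetch, source, semaphore, retries=4, base_delay=0.1):
    """Fetch one source under the shared limit, backing off between failures."""
    for attempt in range(retries):
        try:
            # Hold a slot only while a request is actually in flight.
            async with semaphore:
                return await fetch(source)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the failure to the caller
            # The delay doubles on every attempt; jitter spreads out retries.
            delay = base_delay * 2 ** attempt * (1 + random.random())
            await asyncio.sleep(delay)


async def fetch_all(fetch, sources, limit=5):
    semaphore = asyncio.Semaphore(limit)
    tasks = [fetch_with_retry(fetch, s, semaphore) for s in sources]
    # gather returns results in the same order as the input sources.
    return await asyncio.gather(*tasks)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The slot is released before the backoff sleep, so a source that is waiting to retry does not block healthy requests from proceeding.&lt;/p&gt;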




&lt;h2&gt;
  
  
  5. Pre-Mutation Observability
&lt;/h2&gt;

&lt;p&gt;Before modifying production state, a read-only analysis mode should&lt;br&gt;
exist.&lt;/p&gt;

&lt;p&gt;This mode should answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  What would be created?&lt;/li&gt;
&lt;li&gt;  What would be grouped?&lt;/li&gt;
&lt;li&gt;  What anomalies exist?&lt;/li&gt;
&lt;li&gt;  What would be skipped?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Observation precedes mutation.&lt;/p&gt;

&lt;p&gt;Pre-mutation observability reduces uncertainty and surfaces structural&lt;br&gt;
inconsistencies before they become runtime failures.&lt;/p&gt;

&lt;p&gt;In complex distributed systems, analysis tooling is often more valuable&lt;br&gt;
than mutation tooling.&lt;/p&gt;
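
&lt;p&gt;A read-only analysis pass could be as small as the sketch below. &lt;code&gt;MigrationPlan&lt;/code&gt;, &lt;code&gt;analyze&lt;/code&gt;, and the record fields are hypothetical, and the only anomaly checked here (a missing &lt;code&gt;target&lt;/code&gt;) is a stand-in for whatever structural rules the real system enforces:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field


@dataclass
class MigrationPlan:
    to_create: list = field(default_factory=list)   # what would be created
    grouped: dict = field(default_factory=dict)     # what would be grouped
    anomalies: list = field(default_factory=list)   # what looks wrong
    skipped: list = field(default_factory=list)     # what would be skipped


def analyze(legacy_configs, already_migrated):
    """Build a plan from the inputs alone; nothing is written anywhere."""
    plan = MigrationPlan()
    for config in legacy_configs:
        if config["id"] in already_migrated:
            plan.skipped.append(config["id"])
        elif not config.get("target"):
            plan.anomalies.append((config["id"], "missing target"))
        else:
            key = (config["type"], config["schedule"])
            plan.grouped.setdefault(key, []).append(config["id"])
    # One new entity would be created per group, in a stable order.
    plan.to_create = sorted(plan.grouped)
    return plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The returned plan answers the four questions above without touching production, so it can be reviewed, diffed between runs, or used as a gate before the mutating steps start.&lt;/p&gt;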




&lt;h2&gt;
  
  
  The Hidden Risk: Data Path Integrity
&lt;/h2&gt;

&lt;p&gt;Many migration failures are not caused by flawed algorithms.&lt;/p&gt;

&lt;p&gt;They are caused by incomplete data propagation.&lt;/p&gt;

&lt;p&gt;Conditional logic may be correct while upstream parsing silently fails, resulting in entire configuration segments being omitted.&lt;/p&gt;

&lt;p&gt;Therefore, validation must extend beyond logical correctness to end-to-end data path verification.&lt;/p&gt;

&lt;p&gt;Integration-level validation is critical for behavioral migration&lt;br&gt;
safety.&lt;/p&gt;
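
&lt;p&gt;One simple form of data path verification is to reconcile what went into the parser against what came out. The sketch below assumes every upstream record carries an &lt;code&gt;id&lt;/code&gt;. The &lt;code&gt;verify_data_path&lt;/code&gt; function and its arguments are hypothetical names for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def verify_data_path(raw_records, parsed_configs, required_fields):
    """Return a list of problems; an empty list means nothing was lost."""
    problems = []

    # Every record seen upstream must survive parsing.
    raw_ids = {record["id"] for record in raw_records}
    parsed_ids = {config["id"] for config in parsed_configs}
    for missing_id in sorted(raw_ids - parsed_ids):
        problems.append(f"{missing_id}: present upstream but absent after parsing")

    # Every surviving record must still carry the fields the migration needs.
    for config in parsed_configs:
        for name in required_fields:
            if config.get(name) is None:
                problems.append(f"{config['id']}: field '{name}' was dropped")

    return problems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A check like this catches the silent case: the conditional logic is correct, but it never sees the segment that the parser dropped.&lt;/p&gt;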




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Zero-downtime migration is not about moving data.&lt;/p&gt;

&lt;p&gt;It is about moving &lt;strong&gt;behavior&lt;/strong&gt; — without breaking &lt;strong&gt;operational guarantees&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Determinism&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Isolation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explicit transition boundaries&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Controlled execution&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability before change&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In high-availability systems, migration safety cannot be delegated to a deployment checklist.&lt;/p&gt;

&lt;p&gt;It must be embedded into the architecture itself.&lt;/p&gt;

&lt;p&gt;A migration should never be an ad-hoc script.&lt;/p&gt;

&lt;p&gt;It should be a designed workflow — predictable, resumable, and activation-safe — treated as a &lt;strong&gt;first-class architectural concern&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>systemdesign</category>
      <category>backend</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
