<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kumaravelu Saraboji Mahalingam</title>
    <description>The latest articles on Forem by Kumaravelu Saraboji Mahalingam (@databro).</description>
    <link>https://forem.com/databro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3785347%2Fcae1e061-30b6-4861-b022-1806ce696941.png</url>
      <title>Forem: Kumaravelu Saraboji Mahalingam</title>
      <link>https://forem.com/databro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/databro"/>
    <language>en</language>
    <item>
      <title>🚀 I Built a Browser-Local AI Assistant in Next.js with WebLLM, WASM, ONNX Runtime, Web Workers, and RAG</title>
      <dc:creator>Kumaravelu Saraboji Mahalingam</dc:creator>
      <pubDate>Tue, 14 Apr 2026 20:28:14 +0000</pubDate>
      <link>https://forem.com/databro/i-built-a-browser-local-ai-assistant-in-nextjs-with-webllm-wasm-onnx-runtime-web-workers-and-58b5</link>
      <guid>https://forem.com/databro/i-built-a-browser-local-ai-assistant-in-nextjs-with-webllm-wasm-onnx-runtime-web-workers-and-58b5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Most AI chat widgets are just frontends for a remote API.&lt;br&gt;&lt;br&gt;
This one is different.&lt;/p&gt;

&lt;p&gt;My assistant runs its core retrieval and generation pipeline &lt;strong&gt;inside the browser&lt;/strong&gt; using &lt;strong&gt;WebLLM&lt;/strong&gt;, &lt;strong&gt;Web Workers&lt;/strong&gt;, &lt;strong&gt;WASM&lt;/strong&gt;, &lt;strong&gt;ONNX Runtime Web&lt;/strong&gt;, and a &lt;strong&gt;local RAG architecture&lt;/strong&gt; built in &lt;strong&gt;Next.js&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can try it here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;👉 &lt;a href="https://databro.dev/?chat=open" rel="noopener noreferrer"&gt;https://databro.dev/?chat=open&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What makes this fun is not just that it works locally.&lt;br&gt;&lt;br&gt;
It is that the browser is doing real AI work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;loading model artifacts&lt;/li&gt;
&lt;li&gt;reusing browser cache&lt;/li&gt;
&lt;li&gt;embedding queries&lt;/li&gt;
&lt;li&gt;retrieving relevant chunks&lt;/li&gt;
&lt;li&gt;reranking candidates&lt;/li&gt;
&lt;li&gt;generating grounded answers&lt;/li&gt;
&lt;li&gt;returning data back to the UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That turns the browser from a thin client into an actual inference runtime.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Why I built it this way
&lt;/h2&gt;

&lt;p&gt;Most website assistants follow the same pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User enters a prompt.&lt;/li&gt;
&lt;li&gt;Frontend sends it to a backend.&lt;/li&gt;
&lt;li&gt;Backend calls an LLM API.&lt;/li&gt;
&lt;li&gt;Response comes back.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That works. But it also means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extra round trips&lt;/li&gt;
&lt;li&gt;recurring inference cost&lt;/li&gt;
&lt;li&gt;more infrastructure&lt;/li&gt;
&lt;li&gt;more privacy tradeoffs&lt;/li&gt;
&lt;li&gt;less control over local behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted a different architecture: a &lt;strong&gt;browser-local AI assistant&lt;/strong&gt; that could answer from a curated knowledge base without depending on server-side inference for the main path.&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;WebLLM&lt;/strong&gt;, &lt;strong&gt;Web Workers&lt;/strong&gt;, &lt;strong&gt;WASM&lt;/strong&gt;, &lt;strong&gt;ONNX Runtime Web&lt;/strong&gt;, and &lt;strong&gt;RAG&lt;/strong&gt; start fitting together really well.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 The core idea
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;This is not “an LLM in a webpage.”&lt;/p&gt;

&lt;p&gt;It is a layered browser-native AI system where each part has a very specific role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js widget&lt;/strong&gt; → chat UI and state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Worker&lt;/strong&gt; → orchestration and background execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebLLM&lt;/strong&gt; → local generation runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WASM&lt;/strong&gt; → efficient low-level browser execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime Web&lt;/strong&gt; → browser inference for embedding and reranking tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG pipeline&lt;/strong&gt; → grounding answers against the knowledge base&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt; → making repeat sessions practical&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once I started thinking about the architecture this way, the implementation became much cleaner.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔥 What WebLLM actually is
&lt;/h2&gt;

&lt;p&gt;A lot of people hear “WebLLM” and assume it is the model.&lt;/p&gt;

&lt;p&gt;It is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebLLM is the browser-side runtime used to load and execute supported language models locally.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means the model and the runtime are two different things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WebLLM&lt;/strong&gt; = execution engine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama / Phi / Gemma / Mistral&lt;/strong&gt; = model loaded into that engine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction matters because it changes how you think about browser inference.&lt;/p&gt;

&lt;p&gt;You are not calling a model API.&lt;br&gt;&lt;br&gt;
You are creating a local runtime, loading a model into it, and then sending prompt messages into that runtime.&lt;/p&gt;

&lt;p&gt;That framing made a huge difference for me.&lt;/p&gt;




&lt;h2&gt;
  
  
  📦 Does WebLLM need to be downloaded?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Yes — and this is one of the most important practical details in browser-local AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On first use, the browser usually needs to download:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the selected model artifacts&lt;/li&gt;
&lt;li&gt;runtime support assets&lt;/li&gt;
&lt;li&gt;related files required to initialize inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means browser-local AI comes with a real &lt;strong&gt;first-run cost&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But after that, things get better fast.&lt;/p&gt;

&lt;p&gt;Once those assets are cached, later sessions are much faster. This is one of the biggest UX wins in local inference: the browser starts behaving more like an application runtime than a stateless page.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ How the WebLLM runtime gets created
&lt;/h2&gt;

&lt;p&gt;At a high level, the runtime lifecycle looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create the WebLLM engine.&lt;/li&gt;
&lt;li&gt;Select a supported model.&lt;/li&gt;
&lt;li&gt;Download artifacts if they are not already cached.&lt;/li&gt;
&lt;li&gt;Load the model into the engine.&lt;/li&gt;
&lt;li&gt;Send structured prompt messages for generation.&lt;/li&gt;
&lt;li&gt;Return the generated output.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So the runtime is not just a helper utility.&lt;br&gt;&lt;br&gt;
It is the execution container for the model.&lt;/p&gt;

&lt;p&gt;That is why I think of WebLLM as a &lt;strong&gt;browser-native inference runtime&lt;/strong&gt; rather than a simple wrapper library.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 Why WASM matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;WASM (WebAssembly)&lt;/strong&gt; is one of the hidden pillars of browser-local AI.&lt;/p&gt;

&lt;p&gt;A lot of browser AI articles mention it in passing, but it deserves more attention than that.&lt;/p&gt;

&lt;p&gt;WASM gives the browser a compact, efficient way to execute compute-heavy logic closer to native speed than ordinary JavaScript. That matters because local inference is not light work.&lt;/p&gt;

&lt;p&gt;Tasks like these only become practical when the browser has an efficient execution path for them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model runtime support&lt;/li&gt;
&lt;li&gt;tensor-heavy execution&lt;/li&gt;
&lt;li&gt;embedding pipelines&lt;/li&gt;
&lt;li&gt;reranking workloads&lt;/li&gt;
&lt;li&gt;token generation infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without efficient lower-level execution, the entire local inference stack becomes much harder to make practical.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 WebLLM vs WASM vs ONNX Runtime Web
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;These are related, but they are not the same thing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A simple way to separate them:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WebLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local runtime for browser-based LLM generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WASM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Efficient low-level execution layer in the browser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ONNX Runtime Web&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Browser inference runtime for ONNX-backed model workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Web Worker&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Background execution boundary that protects UI responsiveness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So WASM is not competing with WebLLM.&lt;br&gt;&lt;br&gt;
It is one of the technologies helping make browser-native inference feasible.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧮 What ONNX Runtime Web is doing here
&lt;/h2&gt;

&lt;p&gt;One of the easiest mistakes in local AI architecture is treating every model task as if it belongs to the same runtime.&lt;/p&gt;

&lt;p&gt;It does not.&lt;/p&gt;

&lt;p&gt;Generation is one kind of workload.&lt;br&gt;&lt;br&gt;
Embedding and reranking are different workloads.&lt;/p&gt;

&lt;p&gt;That is why I like this split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WebLLM&lt;/strong&gt; for generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime Web&lt;/strong&gt; for retrieval-side transformer execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This split works well because retrieval-side tasks are single forward passes over a query or a candidate pair, while generation is an autoregressive, token-by-token loop; the two genuinely benefit from different execution paths.&lt;/p&gt;

&lt;p&gt;In practice, browser-local RAG is rarely “one model doing everything.”&lt;br&gt;&lt;br&gt;
It is a pipeline of specialized responsibilities.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧵 Why Web Workers are non-negotiable
&lt;/h2&gt;

&lt;p&gt;If you run browser-local AI on the main thread, the UI will eventually remind you that this was a bad idea.&lt;/p&gt;

&lt;p&gt;Maybe not immediately.&lt;br&gt;&lt;br&gt;
Maybe not on your development machine.&lt;/p&gt;

&lt;p&gt;But once model loading, chunk scoring, reranking, and generation pile up, the experience starts to degrade fast.&lt;/p&gt;

&lt;p&gt;That is why &lt;strong&gt;Web Workers&lt;/strong&gt; are essential.&lt;/p&gt;

&lt;p&gt;A worker gives you a separate execution context for heavy tasks so the main thread can stay focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rendering&lt;/li&gt;
&lt;li&gt;input handling&lt;/li&gt;
&lt;li&gt;scrolling and interaction&lt;/li&gt;
&lt;li&gt;animation&lt;/li&gt;
&lt;li&gt;state updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For AI-heavy browser apps, that separation is not a nice-to-have.&lt;br&gt;&lt;br&gt;
It is architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚛️ Creating a Web Worker in Next.js
&lt;/h2&gt;

&lt;p&gt;Workers are browser APIs, so they should only be created on the client side.&lt;/p&gt;

&lt;p&gt;That means your widget component should be client-rendered and the worker should be created lazily when the chat experience actually begins.&lt;/p&gt;

&lt;p&gt;This pattern works especially well in Next.js because it lets you keep rendering concerns in the UI layer while moving heavy orchestration into a background execution boundary.&lt;/p&gt;

&lt;p&gt;I also prefer lazy worker creation because it avoids paying the initialization cost for users who never open the assistant.&lt;/p&gt;
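&lt;p&gt;As a sketch, lazy creation can be wrapped in a small helper. The factory is injected so the pattern can be exercised outside a browser; in the real widget it would construct the actual worker, and the file name in the comment is an assumption, not taken from the project:&lt;/p&gt;

```typescript
// Lazy worker creation. In the widget, the factory would be roughly
// () => new Worker(new URL("./assistant.worker.ts", import.meta.url)),
// where "assistant.worker.ts" is a hypothetical filename for illustration.
interface WorkerLike {
  postMessage(msg: unknown): void;
  terminate(): void;
}

function createLazyWorker(factory: () => WorkerLike) {
  let instance: WorkerLike | null = null;
  return {
    // Created on first use only, so visitors who never open the chat pay nothing.
    get(): WorkerLike {
      if (instance === null) instance = factory();
      return instance;
    },
    isStarted(): boolean {
      return instance !== null;
    },
  };
}
```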




&lt;h2&gt;
  
  
  📨 Widget-to-worker messaging
&lt;/h2&gt;

&lt;p&gt;Once the worker exists, the widget should communicate with it using structured messages rather than trying to share runtime state directly.&lt;/p&gt;

&lt;p&gt;That message boundary matters a lot.&lt;/p&gt;

&lt;p&gt;The UI sends things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt text&lt;/li&gt;
&lt;li&gt;serialized KB context&lt;/li&gt;
&lt;li&gt;request identifier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worker sends back:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;final answer&lt;/li&gt;
&lt;li&gt;citation metadata&lt;/li&gt;
&lt;li&gt;failure state if something breaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation keeps the frontend simpler and makes the worker a clean orchestration boundary instead of an implementation detail leaking into the UI.&lt;/p&gt;
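&lt;p&gt;One way to sketch that boundary is a discriminated union for worker replies plus a pure reducer on the widget side. The field names here are illustrative assumptions rather than the exact shapes used on databro.dev:&lt;/p&gt;

```typescript
// Hypothetical message shapes for the widget/worker boundary. The worker
// replies with either an answer or a failure state, keyed by request id.
type WorkerReply =
  | { kind: "answer"; requestId: string; text: string; citations: string[] }
  | { kind: "error"; requestId: string; message: string };

interface ChatState {
  pendingIds: string[];
  answers: { [id: string]: { text: string; citations: string[] } };
  errors: { [id: string]: string };
}

// Pure reducer: fold a worker reply into UI state without sharing runtime objects.
function applyWorkerReply(state: ChatState, msg: WorkerReply): ChatState {
  const pendingIds = state.pendingIds.filter((id) => id !== msg.requestId);
  if (msg.kind === "answer") {
    const entry = { text: msg.text, citations: msg.citations };
    return { ...state, pendingIds, answers: { ...state.answers, [msg.requestId]: entry } };
  }
  return { ...state, pendingIds, errors: { ...state.errors, [msg.requestId]: msg.message } };
}
```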




&lt;h2&gt;
  
  
  🧠 Worker orchestration is where the real engineering happens
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The worker is not just a background script.&lt;/p&gt;

&lt;p&gt;It is the orchestration layer of the entire local AI lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where things become much more than “I loaded a model in the browser.”&lt;/p&gt;

&lt;p&gt;The worker is responsible for coordinating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model/runtime initialization&lt;/li&gt;
&lt;li&gt;cache-aware model reuse&lt;/li&gt;
&lt;li&gt;KB artifact loading&lt;/li&gt;
&lt;li&gt;embedding availability&lt;/li&gt;
&lt;li&gt;retrieval and score fusion&lt;/li&gt;
&lt;li&gt;reranking&lt;/li&gt;
&lt;li&gt;confidence checks&lt;/li&gt;
&lt;li&gt;prompt assembly&lt;/li&gt;
&lt;li&gt;answer generation&lt;/li&gt;
&lt;li&gt;packaging result metadata back to the widget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That orchestration layer is what transforms separate browser AI technologies into an actual product.&lt;/p&gt;

&lt;p&gt;This, honestly, is where most of the engineering value lives.&lt;/p&gt;
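&lt;p&gt;The sequencing above can be sketched as a single pipeline function with each stage injected. In the real worker these stages are async calls into ONNX Runtime Web and WebLLM; synchronous stand-ins keep the sketch small and testable:&lt;/p&gt;

```typescript
// Orchestration sketch: the flow the worker coordinates, with each stage
// injected as a dependency so the sequencing itself can be exercised.
interface Stages {
  embed: (q: string) => number[];
  retrieve: (v: number[]) => string[];
  rerank: (q: string, chunks: string[]) => string[];
  confident: (chunks: string[]) => boolean;
  generate: (q: string, context: string[]) => string;
}

function answer(q: string, s: Stages): { text: string; grounded: boolean } {
  const vec = s.embed(q);                 // 1. encode the prompt
  const candidates = s.retrieve(vec);     // 2. pull candidate KB chunks
  const ranked = s.rerank(q, candidates); // 3. rescore for this prompt
  if (s.confident(ranked) === false) {
    // 4. confidence gate: fall back instead of generating over weak context
    return { text: "I do not have enough grounded context for that.", grounded: false };
  }
  return { text: s.generate(q, ranked), grounded: true }; // 5. grounded generation
}
```

&lt;p&gt;Keeping the stages injectable is also what makes cache-aware reuse straightforward: the worker can hold warm engine instances and reuse the same stage functions across prompts.&lt;/p&gt;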




&lt;h2&gt;
  
  
  🏗️ What the worker is really managing
&lt;/h2&gt;

&lt;p&gt;The worker owns the lifecycle of the expensive parts of the system.&lt;/p&gt;

&lt;p&gt;That typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the generation engine&lt;/li&gt;
&lt;li&gt;embedding model access&lt;/li&gt;
&lt;li&gt;reranker access&lt;/li&gt;
&lt;li&gt;parsed or cached KB context&lt;/li&gt;
&lt;li&gt;warm in-memory session state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is important because the main thread should not be responsible for managing heavy AI runtime objects.&lt;/p&gt;

&lt;p&gt;The UI should care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input&lt;/li&gt;
&lt;li&gt;loading states&lt;/li&gt;
&lt;li&gt;response rendering&lt;/li&gt;
&lt;li&gt;citations&lt;/li&gt;
&lt;li&gt;interaction flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worker should care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;initialization&lt;/li&gt;
&lt;li&gt;orchestration&lt;/li&gt;
&lt;li&gt;reuse&lt;/li&gt;
&lt;li&gt;inference sequencing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation is one of the biggest reasons the app feels stable instead of fragile. &lt;/p&gt;




&lt;h2&gt;
  
  
  🗂️ Overall architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9u4mnu438fgdx0agf19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9u4mnu438fgdx0agf19.png" alt="browser ai chat" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔄 Prompt lifecycle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedbjuuznzz8066urt2h9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedbjuuznzz8066urt2h9.png" alt="rag query pipeline" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s walk the whole journey.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) The user opens the widget
&lt;/h3&gt;

&lt;p&gt;The Next.js app renders the chat interface and waits for interaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) The worker is created lazily
&lt;/h3&gt;

&lt;p&gt;Only when the user opens or uses the assistant does the app create the worker.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) The worker warms up the AI stack
&lt;/h3&gt;

&lt;p&gt;It checks whether engines, pipelines, and context state already exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Browser cache is consulted
&lt;/h3&gt;

&lt;p&gt;If model assets are already cached, startup is faster.&lt;br&gt;&lt;br&gt;
If not, the first-run downloads happen here.&lt;/p&gt;

&lt;h3&gt;
  
  
  5) KB vectors are loaded
&lt;/h3&gt;

&lt;p&gt;The worker loads precomputed vector artifacts or rebuilds what it needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  6) The user enters a prompt
&lt;/h3&gt;

&lt;p&gt;The widget sends the prompt and context payload to the worker.&lt;/p&gt;

&lt;h3&gt;
  
  
  7) The embedding model encodes the query
&lt;/h3&gt;

&lt;p&gt;The prompt is turned into a dense vector representation.&lt;/p&gt;

&lt;h3&gt;
  
  
  8) Retrieval begins
&lt;/h3&gt;

&lt;p&gt;Dense retrieval and sparse retrieval identify candidate KB chunks.&lt;/p&gt;
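&lt;p&gt;A minimal version of the dense side is cosine similarity between the query vector and the precomputed chunk vectors, keeping the top k. This is a simplified stand-in for the actual retrieval code:&lt;/p&gt;

```typescript
// Dense retrieval in miniature: cosine similarity over precomputed KB vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  a.forEach((v, i) => {
    dot += v * b[i];
    na += v * v;
    nb += b[i] * b[i];
  });
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], kb: { id: string; vec: number[] }[], k: number) {
  return kb
    .map((c) => ({ id: c.id, score: cosine(query, c.vec) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```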

&lt;h3&gt;
  
  
  9) Hybrid scoring narrows the pool
&lt;/h3&gt;

&lt;p&gt;The system fuses semantic and lexical signals.&lt;/p&gt;
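&lt;p&gt;A common way to fuse the two signals is min-max normalization followed by a weighted sum. The 0.7 dense weight below is an illustrative assumption, not the tuning used on the site:&lt;/p&gt;

```typescript
// Hybrid score fusion: normalize each signal to [0, 1], then weight.
function normalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0.5); // flat signal carries no ranking info
  return scores.map((s) => (s - min) / (max - min));
}

function fuse(dense: number[], sparse: number[], wDense: number = 0.7): number[] {
  const d = normalize(dense);
  const s = normalize(sparse);
  return d.map((v, i) => wDense * v + (1 - wDense) * s[i]);
}
```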

&lt;h3&gt;
  
  
  10) Reranking refines the candidates
&lt;/h3&gt;

&lt;p&gt;The best chunks are rescored for prompt-specific usefulness.&lt;/p&gt;

&lt;h3&gt;
  
  
  11) Confidence gating runs
&lt;/h3&gt;

&lt;p&gt;If the candidates are weak, the system can fall back safely.&lt;/p&gt;

&lt;h3&gt;
  
  
  12) Grounded context is assembled
&lt;/h3&gt;

&lt;p&gt;The final chunk set is turned into the context window for generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  13) WebLLM executes generation
&lt;/h3&gt;

&lt;p&gt;The worker sends system rules, prompt, and grounded context into the local runtime.&lt;/p&gt;
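&lt;p&gt;WebLLM exposes an OpenAI-style chat API, so the worker ultimately hands the runtime a structured messages array. A sketch of that assembly step; the system wording here is invented for illustration:&lt;/p&gt;

```typescript
// Prompt assembly: system rules plus numbered grounded chunks, then the user prompt.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildMessages(systemRules: string, context: string[], prompt: string): ChatMessage[] {
  const grounded = context.map((c, i) => "[" + (i + 1) + "] " + c).join("\n");
  return [
    { role: "system", content: systemRules + "\nAnswer only from the context below.\n" + grounded },
    { role: "user", content: prompt },
  ];
}
```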

&lt;h3&gt;
  
  
  14) The worker packages the result
&lt;/h3&gt;

&lt;p&gt;The answer text and citation metadata are returned to the widget.&lt;/p&gt;

&lt;h3&gt;
  
  
  15) The UI renders the final response
&lt;/h3&gt;

&lt;p&gt;The user receives a grounded answer without needing the main inference path to leave the browser.&lt;/p&gt;

&lt;p&gt;That full lifecycle is what turns a browser-local model into a practical assistant.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ What I learned building this
&lt;/h2&gt;

&lt;p&gt;Here is the short version of what mattered most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Treat WebLLM as a runtime, not as “the model.”&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expect first-run downloads and design for cache reuse.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep heavy work off the main thread.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use workers as orchestration boundaries, not just compute bins.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Separate generation from retrieval-side inference.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Precompute KB vectors whenever possible.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use reranking if grounded quality matters.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add confidence gates before you need them.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Design for warm sessions, not just cold starts.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the choices that made the assistant feel like a product instead of a demo.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎉 Final thought
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The most exciting part of this project is not that one library made it possible.&lt;/p&gt;

&lt;p&gt;It is that several browser-native technologies now fit together well enough to build a real local AI product.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That stack, for me, looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WebLLM&lt;/strong&gt; for generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WASM&lt;/strong&gt; for efficient browser execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime Web&lt;/strong&gt; for embedding and reranking paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Workers&lt;/strong&gt; for orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG&lt;/strong&gt; for grounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Put together, they turn the browser into something much more powerful than a UI shell.&lt;/p&gt;

&lt;p&gt;They turn it into the runtime.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔗 Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://databro.dev/?chat=open" rel="noopener noreferrer"&gt;https://databro.dev/?chat=open&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webllm</category>
      <category>webassembly</category>
      <category>onnx</category>
    </item>
    <item>
      <title>Apache Parquet File Anatomy: Row Groups, Column Chunks, Pages, and Metadata Explained 🧱📦</title>
      <dc:creator>Kumaravelu Saraboji Mahalingam</dc:creator>
      <pubDate>Fri, 10 Apr 2026 17:41:10 +0000</pubDate>
      <link>https://forem.com/databro/apache-parquet-file-anatomy-row-groups-column-chunks-pages-and-metadata-explained-4ebg</link>
      <guid>https://forem.com/databro/apache-parquet-file-anatomy-row-groups-column-chunks-pages-and-metadata-explained-4ebg</guid>
      <description>&lt;p&gt;If you use Spark, Athena, Iceberg, Snowflake, DuckDB, or Pandas, you’ve probably worked with Parquet hundreds of times. But most of us first learn Parquet as a simple rule of thumb: &lt;strong&gt;it’s columnar, compressed, and great for analytics&lt;/strong&gt;. That’s true, but it leaves out the most interesting part — &lt;strong&gt;why&lt;/strong&gt; Parquet performs so well in the first place.&lt;/p&gt;

&lt;p&gt;Under the hood, a Parquet file is not just a blob of compressed data. It has a deliberate internal structure made of &lt;strong&gt;row groups, column chunks, pages, and footer metadata&lt;/strong&gt;, and that structure is exactly what enables column pruning, predicate pushdown, and efficient scans in modern query engines.&lt;/p&gt;

&lt;p&gt;In this post, we’ll break down the anatomy of a Parquet file from the file boundary all the way down to individual pages, and then connect those pieces back to the real-world performance behavior you see in Spark, Iceberg, and Athena.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Parquet matters ⚡
&lt;/h2&gt;

&lt;p&gt;Most analytical queries do not read every column and every row. They usually select a subset of columns, filter by a few predicates, and aggregate over large volumes of data. Parquet is designed specifically for that style of access, which is why it outperforms row-oriented formats like CSV for analytics-heavy workloads.&lt;/p&gt;

&lt;p&gt;Instead of storing each record end-to-end, Parquet stores data &lt;strong&gt;column by column&lt;/strong&gt;, while still grouping rows into larger units for efficient processing. That combination improves compression, reduces unnecessary I/O, and allows engines to skip chunks of data using metadata rather than brute-force scanning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the big picture 🗂️
&lt;/h2&gt;

&lt;p&gt;The easiest way to understand Parquet is to think of it as a hierarchy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;file&lt;/strong&gt; contains one or more &lt;strong&gt;row groups&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Each &lt;strong&gt;row group&lt;/strong&gt; contains one &lt;strong&gt;column chunk&lt;/strong&gt; per column.&lt;/li&gt;
&lt;li&gt;Each &lt;strong&gt;column chunk&lt;/strong&gt; contains one or more &lt;strong&gt;pages&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The file ends with a &lt;strong&gt;footer&lt;/strong&gt; that stores schema and metadata about those structures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That may sound abstract at first, so here is the mental model I use: a Parquet file is like a mini warehouse 🏭, where rows are divided into sections, each section stores columns separately, and the catalog for the whole warehouse sits at the very end of the file.&lt;/p&gt;
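&lt;p&gt;Here is that hierarchy as a miniature type model, plus a toy version of column pruning over it (a sketch, not a real Parquet reader):&lt;/p&gt;

```typescript
// The Parquet hierarchy as types: file holds row groups, each row group holds
// one column chunk per column, each chunk holds pages.
interface Page { values: unknown[] }
interface ColumnChunk { column: string; pages: Page[] }
interface RowGroup { numRows: number; chunks: ColumnChunk[] }
interface ParquetFileModel { rowGroups: RowGroup[] }

// Column pruning in miniature: collect only the chunks a query projects.
function projectColumns(file: ParquetFileModel, wanted: string[]): ColumnChunk[] {
  const out: ColumnChunk[] = [];
  file.rowGroups.forEach((rg) => {
    rg.chunks.forEach((c) => {
      if (wanted.indexOf(c.column) !== -1) out.push(c);
    });
  });
  return out;
}
```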

&lt;h2&gt;
  
  
  The physical file layout 🧩
&lt;/h2&gt;

&lt;p&gt;At the physical level, a Parquet file starts with a magic marker, stores row-group data in the body, and ends with footer metadata, the footer length, and another magic marker. Apache Parquet documents this structure explicitly with &lt;code&gt;PAR1&lt;/code&gt; at both the beginning and the end of the file.&lt;/p&gt;

&lt;p&gt;Here is the high-level layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[PAR1][Row Group Data ...][File Metadata][Metadata Length][PAR1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That footer-at-the-end design is more important than it looks. A reader can jump to the end of the file, inspect the metadata, understand the schema and row groups, and plan an efficient read before touching most of the actual data blocks.&lt;/p&gt;
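&lt;p&gt;A minimal sketch of that trailer read: per the format spec, the last eight bytes of the file are a 4-byte little-endian metadata length followed by the &lt;code&gt;PAR1&lt;/code&gt; magic:&lt;/p&gt;

```typescript
// Reading the Parquet trailer: the file ends with a 4-byte little-endian
// footer (FileMetaData) length followed by the "PAR1" magic, so a reader can
// seek straight to the metadata before touching any data pages.
function readTrailer(file: Uint8Array): { metadataLength: number; metadataOffset: number } {
  const n = file.length;
  const magic = String.fromCharCode(file[n - 4], file[n - 3], file[n - 2], file[n - 1]);
  if (magic !== "PAR1") throw new Error("not a Parquet file: trailing magic missing");
  const metadataLength =
    file[n - 8] + file[n - 7] * 256 + file[n - 6] * 65536 + file[n - 5] * 16777216;
  return { metadataLength, metadataOffset: n - 8 - metadataLength };
}
```

&lt;p&gt;A real reader would then fetch the bytes at that offset and parse the Thrift-encoded FileMetaData.&lt;/p&gt;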

&lt;h2&gt;
  
  
  A file-level diagram 🏗️
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7uu1owjlir3icyamxuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7uu1owjlir3icyamxuu.png" alt="parquet file structure" width="800" height="795"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the skeleton of every Parquet file: data first, metadata last.&lt;/p&gt;

&lt;h2&gt;
  
  
  Row groups: the first major building block 🧱
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;row group&lt;/strong&gt; is a horizontal partition of rows inside a single Parquet file. If a file contains one million rows, those rows may be split across multiple row groups, and each row group becomes a self-contained unit for reading and processing.&lt;/p&gt;

&lt;p&gt;This matters because row groups are a natural unit for parallelism. Distributed engines can assign different row groups to different tasks, and metadata associated with each row group can help decide whether that row group needs to be read at all.&lt;/p&gt;

&lt;p&gt;You can think of it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Parquet File
├── Row Group 1 -&amp;gt; rows 1 to 250,000
├── Row Group 2 -&amp;gt; rows 250,001 to 500,000
├── Row Group 3 -&amp;gt; rows 500,001 to 750,000
└── Row Group 4 -&amp;gt; rows 750,001 to 1,000,000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important nuance is that a row group is not stored row-by-row internally. It is still columnar inside, which is where column chunks come in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Column chunks: where columnar storage shows up 🧵
&lt;/h2&gt;

&lt;p&gt;Inside each row group, every column gets its own &lt;strong&gt;column chunk&lt;/strong&gt;. That means for a row group containing &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;country&lt;/code&gt;, and &lt;code&gt;amount&lt;/code&gt;, Parquet stores one chunk for &lt;code&gt;id&lt;/code&gt;, one for &lt;code&gt;country&lt;/code&gt;, and one for &lt;code&gt;amount&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the mechanism behind &lt;strong&gt;column pruning&lt;/strong&gt;. If your query only needs &lt;code&gt;country&lt;/code&gt; and &lt;code&gt;amount&lt;/code&gt;, the engine can skip the &lt;code&gt;id&lt;/code&gt; chunks entirely, which reduces both I/O and deserialization work.&lt;/p&gt;

&lt;p&gt;Here is a simple view:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc2zm6wvxwzigmv0n925.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc2zm6wvxwzigmv0n925.png" alt="row group columns" width="800" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, you can already see why Parquet is so effective for analytics. Analytical queries rarely need every field in every row, and Parquet’s internal structure mirrors that reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pages: the smallest units inside a chunk 📄
&lt;/h2&gt;

&lt;p&gt;Each column chunk is further divided into &lt;strong&gt;pages&lt;/strong&gt;, which are the smallest units of encoded data. Data pages hold the actual values, and when dictionary encoding is used, a dictionary page sits at the start of the column chunk, ahead of the data pages that reference it.&lt;/p&gt;

&lt;p&gt;That means a column chunk is not one monolithic blob. It is a sequence of smaller blocks that can be encoded and compressed efficiently, while still fitting the overall columnar structure.&lt;/p&gt;

&lt;p&gt;A useful diagram looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72uakq1k1k78vxpeiare.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72uakq1k1k78vxpeiare.png" alt="column chunk structure" width="800" height="1747"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In practice, this page-level organization helps Parquet balance storage efficiency with read efficiency. The format can encode and compress data in manageable units instead of treating each column chunk as a single continuous stream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dictionary pages and encoding 📚
&lt;/h2&gt;

&lt;p&gt;One of the most common Parquet optimizations is &lt;strong&gt;dictionary encoding&lt;/strong&gt;. Instead of writing repeated string values over and over, Parquet can write a dictionary of unique values once and then store compact references in the data pages.&lt;/p&gt;

&lt;p&gt;For a column like &lt;code&gt;country&lt;/code&gt;, the dictionary might contain &lt;code&gt;US&lt;/code&gt;, &lt;code&gt;IN&lt;/code&gt;, and &lt;code&gt;CA&lt;/code&gt;, and the data pages would store something closer to &lt;code&gt;0, 0, 1, 2&lt;/code&gt; than full repeated strings. That reduces storage size and often improves downstream compression too.&lt;/p&gt;

&lt;p&gt;This is one reason categorical columns often compress especially well in Parquet. Repeated patterns are easier to encode when similar values are physically grouped together in the same column chunk.&lt;/p&gt;
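&lt;p&gt;The encoding idea itself is easy to sketch (a simplified illustration, not Parquet's actual on-disk representation):&lt;/p&gt;

```typescript
// Dictionary-encoding sketch: write each distinct value once, then store
// compact integer codes in place of the repeated strings.
function dictionaryEncode(values: string[]): { dictionary: string[]; codes: number[] } {
  const dictionary: string[] = [];
  const index: { [value: string]: number } = {};
  const codes = values.map((v) => {
    if (index[v] === undefined) {
      index[v] = dictionary.length;
      dictionary.push(v);
    }
    return index[v];
  });
  return { dictionary, codes };
}
```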

&lt;h2&gt;
  
  
  The footer: the real brain of the file 🧠
&lt;/h2&gt;

&lt;p&gt;The most important part of a Parquet file is arguably not the data body but the &lt;strong&gt;footer&lt;/strong&gt;. That footer stores file metadata such as the schema, row-group descriptions, and column-level information needed by readers to interpret the file efficiently.&lt;/p&gt;

&lt;p&gt;Because the footer is written at the end of the file, readers can retrieve it first, inspect the contents, and decide what to read and what to skip. That is a huge part of why Parquet feels smart rather than brute-force.&lt;/p&gt;

&lt;p&gt;At a high level, the footer can tell a reader:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the schema is.&lt;/li&gt;
&lt;li&gt;How many row groups exist.&lt;/li&gt;
&lt;li&gt;Where each column chunk lives in the file.&lt;/li&gt;
&lt;li&gt;What encodings and compression settings were used.&lt;/li&gt;
&lt;li&gt;What statistics are available for pruning.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Metadata is what powers skipping 🚦
&lt;/h2&gt;

&lt;p&gt;Parquet’s metadata is not just descriptive. It is actionable. The row-group and column metadata often includes statistics such as minimum value, maximum value, and null count, which allows query engines to avoid reading irrelevant data.&lt;/p&gt;

&lt;p&gt;For example, if a row group’s &lt;code&gt;event_date&lt;/code&gt; has a minimum of &lt;code&gt;2026-01-01&lt;/code&gt; and a maximum of &lt;code&gt;2026-01-31&lt;/code&gt;, then a query filtering for March 2026 can skip that row group entirely. The engine does not need to inspect every row to know there is no match.&lt;/p&gt;

&lt;p&gt;That optimization is the foundation of &lt;strong&gt;predicate pushdown&lt;/strong&gt; and &lt;strong&gt;row-group pruning&lt;/strong&gt;. Instead of reading first and filtering later, engines can use metadata to avoid unnecessary reads in the first place.&lt;/p&gt;
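&lt;p&gt;The pruning decision itself is simple enough to sketch in plain Python (a toy model of the footer statistics, not real Parquet I/O):&lt;/p&gt;

```python
# Toy footer statistics: one (min, max) entry per row group for event_date.
row_group_stats = [
    {"min": "2026-01-01", "max": "2026-01-31"},  # January data
    {"min": "2026-02-01", "max": "2026-02-28"},  # February data
    {"min": "2026-03-01", "max": "2026-03-31"},  # March data
]

def groups_to_read(stats, lo, hi):
    """Return indices of row groups whose [min, max] range overlaps [lo, hi].
    ISO dates compare correctly as strings, so plain comparisons work here."""
    return [i for i, s in enumerate(stats)
            if s["max"] >= lo and s["min"] <= hi]

# A query filtering for March 2026 only touches the third row group.
print(groups_to_read(row_group_stats, "2026-03-01", "2026-03-31"))  # [2]
```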

&lt;h2&gt;
  
  
  Predicate pushdown diagram 🎯
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxrik3q2ep1xsx6u3whd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxrik3q2ep1xsx6u3whd8.png" alt="row group pruning" width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is one of the most important performance ideas in Parquet. The file is designed so engines can make good decisions before scanning the full payload.&lt;/p&gt;

&lt;h2&gt;
  
  
  A concrete example 🧪
&lt;/h2&gt;

&lt;p&gt;Let’s say you have this table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;country&lt;/th&gt;
&lt;th&gt;amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;US&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;US&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;IN&lt;/td&gt;
&lt;td&gt;900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;CA&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In a row-based file, those values are stored as complete records one after another. In Parquet, the values are stored by column inside each row group, so the &lt;code&gt;country&lt;/code&gt; values sit together and the &lt;code&gt;amount&lt;/code&gt; values sit together rather than being interleaved row-by-row.&lt;/p&gt;

&lt;p&gt;Now imagine this query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A Parquet-aware engine can use metadata to identify which row groups might contain &lt;code&gt;amount &amp;gt; 500&lt;/code&gt;, read the relevant &lt;code&gt;amount&lt;/code&gt; column chunks for filtering, and then read only the &lt;code&gt;country&lt;/code&gt; column for matching records. It does not need to read every column for every row the way a plain text row format typically would.&lt;/p&gt;
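&lt;p&gt;That read pattern can be simulated in a few lines of plain Python (a toy model of the columnar layout, not actual Parquet decoding):&lt;/p&gt;

```python
# Columnar layout of the example table: each column's values sit together.
columns = {
    "id":      [1, 2, 3, 4],
    "country": ["US", "US", "IN", "CA"],
    "amount":  [100, 120, 900, 80],
}

# SELECT country FROM sales WHERE amount > 500:
# 1. read only the `amount` column and find matching row positions,
matches = [i for i, amt in enumerate(columns["amount"]) if amt > 500]
# 2. read only the `country` column at those positions.
result = [columns["country"][i] for i in matches]

print(result)  # ['IN']
```

&lt;p&gt;Note that the &lt;code&gt;id&lt;/code&gt; column is never touched at all.&lt;/p&gt;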

&lt;h2&gt;
  
  
  Why compression works so well 🗜️
&lt;/h2&gt;

&lt;p&gt;Parquet’s storage efficiency comes from a combination of &lt;strong&gt;columnar layout&lt;/strong&gt;, &lt;strong&gt;encoding&lt;/strong&gt;, and &lt;strong&gt;compression&lt;/strong&gt;. Similar values tend to sit next to each other within a column, which usually makes them more compressible than mixed-value row-based storage.&lt;/p&gt;

&lt;p&gt;For example, a &lt;code&gt;status&lt;/code&gt; column containing repeated values like &lt;code&gt;SUCCESS&lt;/code&gt;, &lt;code&gt;SUCCESS&lt;/code&gt;, &lt;code&gt;FAILED&lt;/code&gt;, &lt;code&gt;SUCCESS&lt;/code&gt; is far easier to encode compactly when those values are grouped together than when they are scattered across full records containing timestamps, IDs, and free-form text.&lt;/p&gt;

&lt;p&gt;That is why Parquet often ends up dramatically smaller than CSV while also being faster to scan for analytical use cases. Its internal organization works with compression instead of fighting it.&lt;/p&gt;
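&lt;p&gt;You can observe the effect with nothing but the standard library: compress the same synthetic records once interleaved row-by-row and once grouped by column, and the column-grouped bytes typically come out smaller:&lt;/p&gt;

```python
import zlib

# Synthetic table: an incrementing id plus a repetitive status column.
statuses = (["SUCCESS"] * 3 + ["FAILED"]) * 250   # 1,000 rows
ids = range(1000)

# Row layout: values from different columns are interleaved record by record.
row_layout = "\n".join(f"{i},{s}" for i, s in zip(ids, statuses)).encode()

# Columnar layout: each column's values sit together.
col_layout = ("\n".join(str(i) for i in ids) + "\n" +
              "\n".join(statuses)).encode()

row_size = len(zlib.compress(row_layout))
col_size = len(zlib.compress(col_layout))
print(col_size, row_size)  # grouping repeated values helps the compressor
```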

&lt;h2&gt;
  
  
  Why row group size is a tuning lever 🎛️
&lt;/h2&gt;

&lt;p&gt;Row groups are not just a format detail. They are also a performance tuning lever. Larger row groups often improve compression and reduce metadata overhead, but they can reduce pruning granularity. Smaller row groups allow finer skipping and often more parallelism, but they introduce more metadata and may hurt compression efficiency.&lt;/p&gt;

&lt;p&gt;This is one of the reasons output file design matters so much in distributed data systems. A well-formed Parquet file is not just about “using Parquet” — it is also about choosing file sizes and row-group sizing that match your workload.&lt;/p&gt;
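&lt;p&gt;A toy illustration of the tradeoff, assuming values are sorted so each row group covers a contiguous range: coarse groups force the engine to read far more rows than actually match.&lt;/p&gt;

```python
def rows_scanned(num_rows, rows_per_group, match_lo, match_hi):
    """Rows an engine must read to cover matches in [match_lo, match_hi),
    when it can only skip at row-group granularity (rows sorted by the
    filtered value, so row position stands in for the value itself)."""
    scanned = 0
    for start in range(0, num_rows, rows_per_group):
        end = min(start + rows_per_group, num_rows)
        if end > match_lo and start < match_hi:   # group overlaps the match range
            scanned += end - start
    return scanned

# 1M sorted rows, query matches rows 500_000..510_000.
print(rows_scanned(1_000_000, 500_000, 500_000, 510_000))  # 500000 rows read
print(rows_scanned(1_000_000, 10_000, 500_000, 510_000))   # 10000 rows read
```

&lt;p&gt;The finer layout reads 50x fewer rows here, at the cost of 50x more metadata entries and smaller compression units.&lt;/p&gt;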

&lt;h2&gt;
  
  
  What this means in Spark 🔥
&lt;/h2&gt;

&lt;p&gt;In Spark, Parquet’s layout maps naturally to common optimizations like column pruning and predicate pushdown. When Spark can use Parquet statistics effectively, it avoids reading unnecessary row groups and often avoids materializing columns that are not selected by the query.&lt;/p&gt;

&lt;p&gt;That means your file layout choices affect real job behavior. If your data is written into too many small files or poorly sized row groups, you may lose many of the benefits that Parquet is structurally capable of delivering.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means in Iceberg 🧊
&lt;/h2&gt;

&lt;p&gt;Iceberg relies heavily on Parquet because Parquet already provides efficient columnar storage and file-level metadata patterns that work well for analytical reads. Iceberg adds another planning layer on top, but the scan efficiency still depends a lot on the properties of the underlying Parquet files.&lt;/p&gt;

&lt;p&gt;In other words, Iceberg gives you table-level intelligence, but Parquet still does much of the physical storage work. Understanding row groups and statistics helps explain why good file compaction and sort strategy can matter so much in Iceberg-backed tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means in Athena 🏛️
&lt;/h2&gt;

&lt;p&gt;Athena benefits from Parquet for the same core reasons: fewer bytes scanned, better compression, and the ability to skip irrelevant data using metadata and layout-aware reads. Since Athena pricing and performance are tightly tied to scanned data volume, Parquet’s structure can directly reduce both runtime and cost.&lt;/p&gt;

&lt;p&gt;That is why converting CSV-based data lakes into partitioned and well-written Parquet often delivers an immediate practical benefit. The file format itself changes how much work the engine has to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  A common misconception 🚫
&lt;/h2&gt;

&lt;p&gt;A common misconception is that Parquet is just “a binary CSV with compression.” That is not really what it is. Parquet is a structured columnar storage format with typed schema metadata, row groups, column chunks, pages, and statistics-aware footers that analytical engines can exploit directly.&lt;/p&gt;

&lt;p&gt;CSV is a simple row-based serialization format. Parquet is a storage format engineered for selective analytical access. Those are fundamentally different design goals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final mental model 🧠
&lt;/h2&gt;

&lt;p&gt;If you only remember one thing, remember this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row groups&lt;/strong&gt; partition rows into larger processing units.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column chunks&lt;/strong&gt; store one column’s data inside each row group.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pages&lt;/strong&gt; break column chunks into smaller encoded blocks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Footer metadata&lt;/strong&gt; tells engines what exists, where it lives, and what can be skipped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once that clicks, a lot of data engineering advice becomes easier to reason about. File sizing, pruning, partitioning, compaction, and scan performance all tie back to this physical layout.&lt;/p&gt;

&lt;p&gt;If you work in Spark, Iceberg, Athena, or any modern analytical stack, understanding Parquet internals is one of those low-level concepts that pays off repeatedly. The format is doing much more than simply storing data — it is shaping how your engine thinks about reading it.&lt;/p&gt;

&lt;p&gt;👉 Want to inspect this visually? Try it here: &lt;a href="https://databro.dev/tools/parquet-inspector-plus/" rel="noopener noreferrer"&gt;https://databro.dev/tools/parquet-inspector-plus/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>apacheparquet</category>
      <category>iceberg</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Data Engineering Meets DuckDB</title>
      <dc:creator>Kumaravelu Saraboji Mahalingam</dc:creator>
      <pubDate>Sun, 01 Mar 2026 16:54:19 +0000</pubDate>
      <link>https://forem.com/databro/data-engineering-meets-duckdb-dcd</link>
      <guid>https://forem.com/databro/data-engineering-meets-duckdb-dcd</guid>
      <description>&lt;h3&gt;
  
  
  Introduction to Data Engineering and DuckDB
&lt;/h3&gt;

&lt;p&gt;Data engineering is a crucial aspect of the data science ecosystem, focusing on the design, construction, and maintenance of data pipelines and architectures. As data engineers, we strive to create efficient, scalable, and reliable systems that can handle the ever-increasing volumes of data. In this article, we will explore the concept of data engineering and introduce DuckDB, an in-process analytical database that brings fast columnar SQL directly into your applications and scripts.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Data Engineering?
&lt;/h3&gt;

&lt;p&gt;Data engineering is a field that combines software engineering and data science to design, build, and maintain large-scale data systems. Data engineers are responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing and implementing data pipelines&lt;/li&gt;
&lt;li&gt;Developing and maintaining data architectures&lt;/li&gt;
&lt;li&gt;Ensuring data quality and integrity&lt;/li&gt;
&lt;li&gt;Optimizing data storage and retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data engineering involves a range of activities, from data ingestion and processing to data storage and analysis. It requires a deep understanding of data formats, data structures, and data processing algorithms, as well as expertise in programming languages such as Python, Java, and Scala.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges in Data Engineering
&lt;/h3&gt;

&lt;p&gt;Data engineers face numerous challenges, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Handling large volumes of data and ensuring that systems can scale to meet increasing demands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Optimizing data processing and retrieval to minimize latency and maximize throughput&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Quality&lt;/strong&gt;: Ensuring that data is accurate, complete, and consistent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Protecting sensitive data from unauthorized access and ensuring compliance with regulatory requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Introducing DuckDB
&lt;/h3&gt;

&lt;p&gt;DuckDB is an open-source, in-process analytical database management system designed to address many of these data engineering challenges. It runs inside the host application with no separate server, stores and processes data in a columnar format for efficient analytical queries over large datasets, and is written in C++ with a SQL interface for interacting with data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features of DuckDB
&lt;/h3&gt;

&lt;p&gt;Some of the key features of DuckDB include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Columnar Storage&lt;/strong&gt;: DuckDB stores data in a columnar format, which allows for efficient compression and querying of data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-Memory Processing&lt;/strong&gt;: DuckDB can process data in-memory, which reduces the need for disk I/O and improves performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL Interface&lt;/strong&gt;: DuckDB provides a SQL interface for interacting with data, making it easy to integrate with existing data pipelines and tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support for Advanced Data Types&lt;/strong&gt;: DuckDB supports advanced data types such as arrays, structs, and geospatial data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benefits of Using DuckDB
&lt;/h3&gt;

&lt;p&gt;The benefits of using DuckDB include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Improved Performance&lt;/strong&gt;: DuckDB's columnar storage and in-memory processing capabilities make it ideal for real-time analytics and data science applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified Data Engineering&lt;/strong&gt;: DuckDB's SQL interface and support for advanced data types make it easy to integrate with existing data pipelines and tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Effective&lt;/strong&gt;: DuckDB is open-source and can run on commodity hardware, making it a cost-effective alternative to traditional database management systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Cases for DuckDB
&lt;/h3&gt;

&lt;p&gt;DuckDB is suitable for a range of use cases, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Analytics&lt;/strong&gt;: Running low-latency analytical queries directly inside an application, without standing up a separate database server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Warehousing&lt;/strong&gt;: Acting as a lightweight local warehouse for business intelligence workloads on moderate data volumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IoT Data Processing&lt;/strong&gt;: Aggregating and analyzing device telemetry, including nested and array-valued sensor payloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In conclusion, data engineering is a critical aspect of the data science ecosystem, and DuckDB is an innovative database management system that can help address the challenges of data engineering. With its columnar storage, in-memory processing, and SQL interface, DuckDB is an ideal solution for real-time analytics, data warehousing, and IoT data processing. As data engineers, we should consider DuckDB as a key component of our data architectures and explore its capabilities to improve the efficiency, scalability, and reliability of our data pipelines.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>duckdb</category>
      <category>databasemanagement</category>
      <category>datascience</category>
    </item>
    <item>
      <title>RAG?</title>
      <dc:creator>Kumaravelu Saraboji Mahalingam</dc:creator>
      <pubDate>Sun, 01 Mar 2026 14:07:57 +0000</pubDate>
      <link>https://forem.com/databro/revolutionizing-genai-with-rag-1pag</link>
      <guid>https://forem.com/databro/revolutionizing-genai-with-rag-1pag</guid>
      <description>&lt;h3&gt;
  
  
  Introduction to RAG in GenAI
&lt;/h3&gt;

&lt;p&gt;As Data Engineers, we're constantly exploring innovative technologies to improve our workflows and models. One such concept that has gained significant attention in the realm of Generative AI (GenAI) is Retrieval-Augmented Generation (RAG). In this article, we'll delve into the world of RAG, its components, and how it's revolutionizing the field of GenAI.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Retrieval-Augmented Generation (RAG)?
&lt;/h3&gt;

&lt;p&gt;RAG is a paradigm that combines the strengths of retrieval-based and generation-based approaches to produce more accurate, informative, and context-specific outputs. It's particularly useful in applications where the model needs to generate human-like text based on a given prompt or input.&lt;/p&gt;

&lt;p&gt;The RAG framework consists of three primary components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retriever&lt;/strong&gt;: This module is responsible for retrieving relevant information from a vast knowledge base or database. The retriever uses the input prompt to search for related documents, passages, or data points that can aid in the generation process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generator&lt;/strong&gt;: Once the retriever has fetched the relevant information, the generator takes over. This module uses the retrieved data to generate the final output, which can be text, images, or any other form of media.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ranker&lt;/strong&gt;: The ranker is an optional component that evaluates the generated outputs and ranks them based on their relevance, accuracy, and overall quality.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How RAG Works
&lt;/h3&gt;

&lt;p&gt;The RAG pipeline can be broken down into the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: The user provides a prompt or input that serves as the basis for the generation process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: The retriever searches the knowledge base to gather relevant information related to the input prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt;: The generator uses the retrieved information to produce one or more candidate outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ranking&lt;/strong&gt;: If a ranker is present, it evaluates the generated outputs and assigns a score to each one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: The final output is selected based on the ranking scores or other evaluation metrics.&lt;/li&gt;
&lt;/ul&gt;
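&lt;p&gt;The steps above can be sketched end-to-end in plain Python. This is a toy keyword retriever and a stand-in generator, not a production RAG stack:&lt;/p&gt;

```python
# Toy knowledge base: in a real system this would be a vector store.
knowledge_base = [
    "DuckDB is an in-process analytical database.",
    "Parquet is a columnar storage file format.",
    "RAG combines retrieval with text generation.",
]

def retrieve(prompt, docs, k=2):
    """Retriever: rank documents by naive keyword overlap with the prompt."""
    words = set(prompt.lower().replace("?", "").split())
    scored = sorted(docs, key=lambda d: -len(words.intersection(d.lower().split())))
    return scored[:k]

def generate(prompt, context):
    """Generator: a stand-in for an LLM call, grounded in retrieved context."""
    return f"Q: {prompt}\nGrounded on: {context[0]}"

def rank(candidates):
    """Ranker: trivially prefer the longer (more informative) candidate."""
    return max(candidates, key=len)

prompt = "What is Parquet?"
context = retrieve(prompt, knowledge_base)   # Retrieval
candidates = [generate(prompt, context)]     # Generation
answer = rank(candidates)                    # Ranking
print(answer)
```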

&lt;h3&gt;
  
  
  Benefits of RAG
&lt;/h3&gt;

&lt;p&gt;The RAG framework offers several advantages over traditional generation-based approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Improved accuracy&lt;/strong&gt;: By leveraging the retriever to fetch relevant information, RAG models can produce more accurate and informative outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased contextuality&lt;/strong&gt;: RAG allows models to consider a broader context when generating outputs, leading to more coherent and relevant responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced hallucination&lt;/strong&gt;: The retriever's ability to fetch real-world data helps reduce the likelihood of hallucination, where models generate outputs that are not grounded in reality.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Applications of RAG
&lt;/h3&gt;

&lt;p&gt;RAG has numerous applications in areas such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chatbots and conversational AI&lt;/strong&gt;: RAG can be used to generate more informative and context-specific responses to user queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text summarization&lt;/strong&gt;: RAG models can summarize long documents or articles by retrieving relevant information and generating concise summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Question answering&lt;/strong&gt;: RAG can be applied to question answering tasks, where the retriever fetches relevant information and the generator produces the final answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is a powerful paradigm that has the potential to revolutionize the field of GenAI. By combining the strengths of retrieval-based and generation-based approaches, RAG models can produce more accurate, informative, and context-specific outputs. As Data Engineers, we need to stay up-to-date with the latest advancements in RAG and explore its applications across domains. Whether you're building chatbots, text summarization, or question answering systems, RAG is well worth a place in your toolkit.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>genai</category>
      <category>rag</category>
      <category>ai</category>
    </item>
    <item>
      <title>Agentic AI Explained</title>
      <dc:creator>Kumaravelu Saraboji Mahalingam</dc:creator>
      <pubDate>Fri, 27 Feb 2026 01:03:51 +0000</pubDate>
      <link>https://forem.com/databro/agentic-ai-explained-3g3a</link>
      <guid>https://forem.com/databro/agentic-ai-explained-3g3a</guid>
      <description>&lt;h3&gt;
  
  
  Introduction to Agentic AI
&lt;/h3&gt;

&lt;p&gt;Agentic AI refers to a subset of artificial intelligence (AI) that focuses on creating autonomous agents capable of making decisions and taking actions based on their environment, goals, and constraints. These agents can be used in various applications, including robotics, smart homes, and decision support systems. As a data engineer, understanding the concept of agentic AI and its components is crucial for designing and implementing effective AI solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Components of Agentic AI
&lt;/h3&gt;

&lt;p&gt;Agentic AI consists of several key components that work together to enable autonomous decision-making and action-taking. These components include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sensors&lt;/strong&gt;: These are the inputs that provide the agent with information about its environment. Sensors can be physical, such as cameras or microphones, or virtual, such as data streams or APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning and Decision-Making&lt;/strong&gt;: This component is responsible for analyzing the data from the sensors and making decisions based on the agent's goals and constraints. Reasoning and decision-making can be achieved using various techniques, including rule-based systems, machine learning, or optimization algorithms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actuators&lt;/strong&gt;: These are the outputs that enable the agent to take actions in its environment. Actuators can be physical, such as motors or speakers, or virtual, such as sending notifications or making API calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goals and Constraints&lt;/strong&gt;: These define the objectives and limitations of the agent. Goals can be specified using various techniques, such as reward functions or objective functions, while constraints can be defined using rules or optimization constraints.&lt;/li&gt;
&lt;/ul&gt;
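&lt;p&gt;These components map naturally onto a sense-decide-act loop. Here is a minimal sketch using a hypothetical thermostat agent (not a real framework):&lt;/p&gt;

```python
class ThermostatAgent:
    """A minimal agent: sensor input, a decision rule with a goal, an actuator."""

    def __init__(self, target_temp):
        self.target_temp = target_temp      # goal
        self.actions = []                   # record of actuator commands

    def decide(self, reading):
        # Reasoning: compare the sensed temperature against the goal.
        if reading < self.target_temp - 1:
            return "heat_on"
        if reading > self.target_temp + 1:
            return "heat_off"
        return "hold"

    def act(self, action):
        # Actuator: here we just record the command we would send.
        self.actions.append(action)

    def step(self, reading):
        action = self.decide(reading)       # sense -> decide
        self.act(action)                    # -> act
        return action

agent = ThermostatAgent(target_temp=21)
for reading in [18, 20.5, 23]:              # simulated sensor stream
    agent.step(reading)
print(agent.actions)  # ['heat_on', 'hold', 'heat_off']
```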

&lt;h3&gt;
  
  
  Types of Agentic AI
&lt;/h3&gt;

&lt;p&gt;There are several types of agentic AI, each with its strengths and weaknesses. Some of the most common types include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reactive Agents&lt;/strong&gt;: These agents respond to their environment without maintaining any internal state or memory. Reactive agents are simple and efficient but can be limited in their ability to make complex decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive Agents&lt;/strong&gt;: These agents maintain an internal state and can anticipate and plan for future events. Proactive agents are more complex and powerful than reactive agents but require more computational resources and data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Agents&lt;/strong&gt;: These agents combine the benefits of reactive and proactive agents by using a combination of reactive and proactive techniques.&lt;/li&gt;
&lt;/ul&gt;
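&lt;p&gt;The reactive/proactive distinction comes down to whether the agent keeps state. A contrived sketch with a hypothetical thermostat:&lt;/p&gt;

```python
class ReactiveAgent:
    """Responds only to the current percept; no memory."""
    def step(self, temp):
        return "heat_on" if temp < 20 else "heat_off"

class ProactiveAgent:
    """Keeps a history and anticipates: pre-heats when temperature is falling."""
    def __init__(self):
        self.history = []

    def step(self, temp):
        falling = bool(self.history) and temp < self.history[-1]
        self.history.append(temp)
        # Anticipate crossing the threshold instead of waiting for it.
        if temp < 20 or (falling and temp < 22):
            return "heat_on"
        return "heat_off"

readings = [24, 23, 21.5]                     # still above 20, but falling
reactive, proactive = ReactiveAgent(), ProactiveAgent()
print([reactive.step(t) for t in readings])   # ['heat_off', 'heat_off', 'heat_off']
print([proactive.step(t) for t in readings])  # ['heat_off', 'heat_off', 'heat_on']
```

&lt;p&gt;The reactive agent waits until the threshold is crossed; the proactive agent uses its internal state to act one step ahead.&lt;/p&gt;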

&lt;h3&gt;
  
  
  Applications of Agentic AI
&lt;/h3&gt;

&lt;p&gt;Agentic AI has a wide range of applications across various industries, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Robotics&lt;/strong&gt;: Agentic AI can be used to control robots and enable them to navigate and interact with their environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Homes&lt;/strong&gt;: Agentic AI can be used to control and automate smart home devices, such as thermostats and lights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Support Systems&lt;/strong&gt;: Agentic AI can be used to provide decision support for complex tasks, such as financial planning or medical diagnosis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Challenges and Limitations
&lt;/h3&gt;

&lt;p&gt;While agentic AI has the potential to revolutionize various industries, it also poses several challenges and limitations, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Quality and Availability&lt;/strong&gt;: Agentic AI requires high-quality and relevant data to make effective decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainability and Transparency&lt;/strong&gt;: Agentic AI can be complex and difficult to interpret, making it challenging to understand the decision-making process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security and Safety&lt;/strong&gt;: Agentic AI can pose security and safety risks if not designed and implemented properly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Agentic AI is a powerful and versatile technology that has the potential to transform various industries. As a data engineer, understanding the key components, types, and applications of agentic AI is crucial for designing and implementing effective AI solutions. However, agentic AI also poses several challenges and limitations that need to be addressed to ensure its safe and effective deployment. By continuing to advance and improve agentic AI, we can unlock its full potential and create more autonomous, efficient, and effective systems. &lt;/p&gt;

&lt;h3&gt;
  
  
  Future Directions
&lt;/h3&gt;

&lt;p&gt;As agentic AI continues to evolve, we can expect to see significant advancements in areas such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge AI&lt;/strong&gt;: The integration of agentic AI with edge computing to enable real-time processing and decision-making.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainable AI&lt;/strong&gt;: The development of techniques and tools to improve the explainability and transparency of agentic AI decision-making.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-AI Collaboration&lt;/strong&gt;: The design of systems that enable effective collaboration between humans and agentic AI agents. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By exploring these future directions and addressing the challenges and limitations of agentic AI, we can create more sophisticated and effective AI systems that transform industries and improve our lives.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>robotics</category>
    </item>
  </channel>
</rss>
