<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tejasvi</title>
    <description>The latest articles on Forem by Tejasvi (@t3jasvi).</description>
    <link>https://forem.com/t3jasvi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3174934%2F9e1d95c2-c746-4b0b-8be9-8e9fdbe38bef.jpg</url>
      <title>Forem: Tejasvi</title>
      <link>https://forem.com/t3jasvi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/t3jasvi"/>
    <language>en</language>
    <item>
      <title>CUDA Deep Dive: Demystifying Kernels, Thread Hierarchies, and the GPU Execution Model: P-1</title>
      <dc:creator>Tejasvi</dc:creator>
      <pubDate>Tue, 03 Jun 2025 20:14:05 +0000</pubDate>
      <link>https://forem.com/t3jasvi/cuda-deep-dive-demystifying-kernels-thread-hierarchies-and-the-gpu-execution-model-p-1-19mb</link>
      <guid>https://forem.com/t3jasvi/cuda-deep-dive-demystifying-kernels-thread-hierarchies-and-the-gpu-execution-model-p-1-19mb</guid>
      <description>&lt;h2&gt;
  
  
  CUDA Deep Dive: Demystifying Kernels, Thread Hierarchies, and the GPU Execution Model: P-1
&lt;/h2&gt;

&lt;p&gt;Welcome back! In our last discussion, we scratched the surface of CUDA programming, looking at how it extends C to harness the power of GPUs. Now, let's take a more technical plunge. We're still drawing from "Programming Massively Parallel Processors" (see context &lt;a href="https://www.cse.iitd.ac.in/~rijurekha/col730_2022/cudabook.pdf" rel="noopener noreferrer"&gt;here&lt;/a&gt; - focusing on Chapter 3 concepts), but this time we'll dissect the execution model, memory implications, and the intricate dance of threads with more precision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dichotomy: Host (CPU) vs. Device (GPU) Architectures
&lt;/h2&gt;

&lt;p&gt;At its core, CUDA programming acknowledges two distinct computational domains:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Host:&lt;/strong&gt; Your system's CPU, managing system resources, peripherals, and orchestrating the overall application flow. It operates with its own dedicated DRAM (system memory).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Device:&lt;/strong&gt; The NVIDIA GPU, a massively parallel processor with its own high-bandwidth memory (GPU device memory, often GDDR). It's comprised of multiple &lt;strong&gt;Streaming Multiprocessors (SMs)&lt;/strong&gt;, each containing numerous &lt;strong&gt;CUDA cores&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The art of CUDA programming lies in efficiently partitioning tasks and managing data transfers between these two domains. Data must be explicitly moved from host memory to device memory for GPU processing, and results moved back. These transfers, typically over the PCIe bus, can be a significant performance bottleneck if not managed carefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  CUDA C Function Specifiers: Defining Execution Space
&lt;/h2&gt;

&lt;p&gt;CUDA C extends standard C with keywords to specify where functions are compiled for and where they execute.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;__host__&lt;/code&gt; &lt;strong&gt;Functions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Standard C functions, compiled for and executed on the &lt;strong&gt;host (CPU)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Can only be called from other &lt;code&gt;__host__&lt;/code&gt; functions or from the global scope.&lt;/li&gt;
&lt;li&gt;  This is the default if no CUDA-specific keyword is present, simplifying the porting of legacy C/C++ code.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;code&gt;__device__&lt;/code&gt; &lt;strong&gt;Functions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Compiled for and executed on the &lt;strong&gt;device (GPU)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Can only be called from &lt;code&gt;__global__&lt;/code&gt; functions (kernels) or other &lt;code&gt;__device__&lt;/code&gt; functions.&lt;/li&gt;
&lt;li&gt;  They are often &lt;strong&gt;inlined&lt;/strong&gt; by the NVCC compiler to avoid function call overhead on the GPU, which can be substantial compared to CPU function calls.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Crucial Limitations&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;No recursion&lt;/strong&gt;: The hardware stack support for deep recursion is limited.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No traditional static variables&lt;/strong&gt;: The concept of a single static variable shared across all threads in the way C expects doesn't directly map well. There are ways to achieve shared state (e.g., using shared memory or special device-wide variables), but &lt;code&gt;static&lt;/code&gt; inside a &lt;code&gt;__device__&lt;/code&gt; function behaves differently (each thread might get its own instance, or it might be disallowed depending on context and compiler version).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No indirect function calls through function pointers (in older CUDA compute capabilities)&lt;/strong&gt;: This restriction has been relaxed in newer compute capabilities (e.g., dynamic parallelism, separate compilation), but for foundational understanding, assume it's a constraint. Direct calls are the norm.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;code&gt;__global__&lt;/code&gt; &lt;strong&gt;Functions (Kernels)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  These are the &lt;strong&gt;entry points&lt;/strong&gt; for GPU computation launched from the host.&lt;/li&gt;
&lt;li&gt;  Executed on the &lt;strong&gt;device (GPU)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Must have a &lt;code&gt;void&lt;/code&gt; return type.&lt;/li&gt;
&lt;li&gt;  The call to a &lt;code&gt;__global__&lt;/code&gt; function from the host is &lt;strong&gt;asynchronous&lt;/strong&gt; by default: the CPU initiates the kernel launch and can continue executing other host code without waiting for the kernel to complete (synchronization points like &lt;code&gt;cudaDeviceSynchronize()&lt;/code&gt; or blocking memory copies are needed to wait).&lt;/li&gt;
&lt;li&gt;  When a kernel is launched, its execution is configured by specifying the &lt;strong&gt;grid dimensions&lt;/strong&gt; and &lt;strong&gt;block dimensions&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;code&gt;__host__ __device__&lt;/code&gt; &lt;strong&gt;Functions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A powerful directive telling NVCC to compile &lt;strong&gt;two versions&lt;/strong&gt; of the function: one for the host and one for the device.&lt;/li&gt;
&lt;li&gt;  Allows for code reuse when the same logic is applicable in both execution spaces.&lt;/li&gt;
&lt;li&gt;  Useful for common utility functions (e.g., math operations, data transformations) that don't rely on execution-space-specific features.
&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Can be called from host code or device code&lt;/span&gt;
&lt;span class="n"&gt;__host__&lt;/span&gt; &lt;span class="n"&gt;__device__&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;clamp_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;max_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;max_val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  The GPU Execution Model: Grids, Blocks, Warps, and Threads
&lt;/h2&gt;

&lt;p&gt;When a &lt;code&gt;__global__&lt;/code&gt; kernel is launched, it executes as a &lt;strong&gt;grid&lt;/strong&gt; of &lt;strong&gt;thread blocks&lt;/strong&gt;. This hierarchical structure is fundamental to how CUDA maps parallelism to the GPU hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ekuohhxs3piiq2oh0mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ekuohhxs3piiq2oh0mn.png" alt="Image description" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Threads&lt;/strong&gt;: The most basic unit of execution. Each thread executes the kernel code. Threads are extremely lightweight.

&lt;ul&gt;
&lt;li&gt;  Identified within their block by &lt;code&gt;threadIdx&lt;/code&gt; (a &lt;code&gt;uint3&lt;/code&gt; variable: &lt;code&gt;threadIdx.x&lt;/code&gt;, &lt;code&gt;threadIdx.y&lt;/code&gt;, &lt;code&gt;threadIdx.z&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Warps&lt;/strong&gt;: Threads are grouped by the hardware into &lt;strong&gt;warps&lt;/strong&gt;. A warp consists of a fixed number of threads (typically 32 in current NVIDIA architectures). Threads within a warp execute in &lt;strong&gt;SIMT (Single Instruction, Multiple Thread)&lt;/strong&gt; fashion.

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;SIMT&lt;/strong&gt;: All threads in a warp execute the same instruction at the same time. If threads in a warp diverge due to conditional branching (e.g., an &lt;code&gt;if-else&lt;/code&gt; statement where some threads take one path and others take another), the warp serially executes each branch path, disabling threads that are not on that path. This &lt;strong&gt;thread divergence&lt;/strong&gt; can significantly impact performance and should be minimized.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Thread Blocks (Cooperative Thread Arrays - CTAs)&lt;/strong&gt;: A group of threads (organized in 1D, 2D, or 3D up to a maximum number, e.g., 1024 threads per block).

&lt;ul&gt;
&lt;li&gt;  Threads within the same block can cooperate by:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Sharing data via &lt;code&gt;__shared__&lt;/code&gt; memory&lt;/strong&gt;: A low-latency, on-chip memory space private to that block and visible to all threads within it. Data in shared memory persists for the lifetime of the block.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Synchronizing execution using &lt;code&gt;__syncthreads()&lt;/code&gt;&lt;/strong&gt;: This is a barrier synchronization primitive. When a thread reaches &lt;code&gt;__syncthreads()&lt;/code&gt;, it waits until &lt;em&gt;all&lt;/em&gt; other threads in its block have also reached that point before any thread proceeds. This is crucial for coordinating access to shared memory (e.g., ensuring all reads happen before any writes, or vice-versa).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  A block is scheduled by the CUDA runtime to execute on a single &lt;strong&gt;Streaming Multiprocessor (SM)&lt;/strong&gt;. Once scheduled on an SM, a block runs to completion on that SM (though its warps may be interleaved with warps from other blocks on the same SM). An SM can often execute multiple blocks concurrently if it has sufficient resources (registers, shared memory).&lt;/li&gt;

&lt;li&gt;  Identified within the grid by &lt;code&gt;blockIdx&lt;/code&gt; (a &lt;code&gt;uint3&lt;/code&gt; variable: &lt;code&gt;blockIdx.x&lt;/code&gt;, &lt;code&gt;blockIdx.y&lt;/code&gt;, &lt;code&gt;blockIdx.z&lt;/code&gt;).&lt;/li&gt;

&lt;li&gt;  The dimensions of a block are available within the kernel via &lt;code&gt;blockDim&lt;/code&gt; (a &lt;code&gt;dim3&lt;/code&gt; variable: &lt;code&gt;blockDim.x&lt;/code&gt;, &lt;code&gt;blockDim.y&lt;/code&gt;, &lt;code&gt;blockDim.z&lt;/code&gt;).&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Grid&lt;/strong&gt;: Composed of all thread blocks launched for a given kernel call. Can be 1D, 2D, or 3D.

&lt;ul&gt;
&lt;li&gt;  Blocks within a grid execute independently and, generally, cannot directly synchronize with each other, except through global memory operations (which can be slow and require careful handling, often with atomic operations) or by terminating the kernel and launching a new one.&lt;/li&gt;
&lt;li&gt;  The dimensions of the grid (in terms of blocks) are available within the kernel via &lt;code&gt;gridDim&lt;/code&gt; (a &lt;code&gt;dim3&lt;/code&gt; variable: &lt;code&gt;gridDim.x&lt;/code&gt;, &lt;code&gt;gridDim.y&lt;/code&gt;, &lt;code&gt;gridDim.z&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Computing a Global Thread ID
&lt;/h3&gt;

&lt;p&gt;Since each thread needs to work on a unique piece of data, it's essential to calculate a global index. For a 1D grid of 1D blocks:&lt;br&gt;
&lt;code&gt;int globalThreadId_x = blockIdx.x * blockDim.x + threadIdx.x;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For a 2D grid of 2D blocks, computing a unique 2D global index &lt;code&gt;(gx, gy)&lt;/code&gt;:&lt;br&gt;
&lt;code&gt;int gx = blockIdx.x * blockDim.x + threadIdx.x;&lt;/code&gt;&lt;br&gt;
&lt;code&gt;int gy = blockIdx.y * blockDim.y + threadIdx.y;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This global ID is then used to access elements in global memory arrays.&lt;/p&gt;
&lt;h3&gt;
  
  
  Matrix Multiplication Revisited (with a Glimpse of Multiple Blocks)
&lt;/h3&gt;

&lt;p&gt;The book's Figure 3.11 presents a kernel that calculates one element of the product matrix &lt;code&gt;P = M * N&lt;/code&gt; per thread, using only &lt;code&gt;threadIdx&lt;/code&gt;.&lt;br&gt;
&lt;code&gt;P_ij = Σ_k (M_ik * N_kj)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;M&lt;/code&gt; is &lt;code&gt;height_M x width_M&lt;/code&gt; and &lt;code&gt;N&lt;/code&gt; is &lt;code&gt;width_M x width_N&lt;/code&gt;, then &lt;code&gt;P&lt;/code&gt; is &lt;code&gt;height_M x width_N&lt;/code&gt;.&lt;br&gt;
A kernel to compute &lt;code&gt;P&lt;/code&gt; might look like this (assuming 2D blocks covering the entire &lt;code&gt;P&lt;/code&gt; matrix, and enough blocks to cover it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;matrixMulKernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;P_d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;M_d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;N_d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;P_height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;P_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;M_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Global row index (for P and M)&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// Global column index (for P and N)&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Boundary check: Ensure thread is within the matrix dimensions&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;P_height&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;P_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// Dot product for P[row][col]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;M_width&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// M_d is row-major: M_d[row * M_width + k]&lt;/span&gt;
            &lt;span class="c1"&gt;// N_d is assumed column-major by convention for this loop structure,&lt;/span&gt;
            &lt;span class="c1"&gt;// or if row-major: N_d[k * P_width + col]&lt;/span&gt;
            &lt;span class="c1"&gt;// Let's assume M_d and N_d are row-major for C-style array access.&lt;/span&gt;
            &lt;span class="c1"&gt;// M_d element: M[row][k]&lt;/span&gt;
            &lt;span class="c1"&gt;// N_d element: N[k][col]&lt;/span&gt;
            &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;M_d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;M_width&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;N_d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;P_width&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;P_d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;P_width&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// P[row][col]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a more complete version. The kernel in Figure 3.11 is simplified for pedagogical reasons by assuming a single block, thus implicitly &lt;code&gt;blockIdx.x = 0&lt;/code&gt; and &lt;code&gt;blockIdx.y = 0&lt;/code&gt;, and &lt;code&gt;row = threadIdx.x&lt;/code&gt;, &lt;code&gt;col = threadIdx.y&lt;/code&gt; (the book maps &lt;code&gt;tx&lt;/code&gt; to row and &lt;code&gt;ty&lt;/code&gt; to col from &lt;code&gt;threadIdx.x&lt;/code&gt; and &lt;code&gt;threadIdx.y&lt;/code&gt; respectively). Chapter 4 will elaborate on multi-block implementations and tiling for performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kernel Launch Configuration: &lt;code&gt;&amp;lt;&amp;lt;&amp;lt;...&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The host launches a kernel using the triple-chevron syntax:&lt;br&gt;
&lt;code&gt;kernelName&amp;lt;&amp;lt;&amp;lt; Dg, Db, Ns, S &amp;gt;&amp;gt;&amp;gt;(argument_list);&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;Dg&lt;/code&gt;: &lt;code&gt;dim3&lt;/code&gt; type, specifies the dimensions of the grid (number of blocks in x, y, z). &lt;code&gt;Dg.x * Dg.y * Dg.z&lt;/code&gt; total blocks.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Db&lt;/code&gt;: &lt;code&gt;dim3&lt;/code&gt; type, specifies the dimensions of each thread block (number of threads in x, y, z). &lt;code&gt;Db.x * Db.y * Db.z&lt;/code&gt; threads per block. The total number of threads per block cannot exceed a device-specific limit (e.g., 1024).&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Ns&lt;/code&gt; (Optional): &lt;code&gt;size_t&lt;/code&gt; type, specifies the bytes of dynamically allocated &lt;code&gt;__shared__&lt;/code&gt; memory per block, in addition to statically allocated shared memory. Defaults to 0.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;S&lt;/code&gt; (Optional): &lt;code&gt;cudaStream_t&lt;/code&gt; type, specifies the CUDA stream the kernel is launched into. Streams allow for managing concurrency of multiple operations. Defaults to stream 0 (the default stream).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example from Figure 3.14:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;dim3&lt;/span&gt; &lt;span class="nf"&gt;dimGrid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Only one block in the grid (for the simplified example)&lt;/span&gt;
&lt;span class="n"&gt;dim3&lt;/span&gt; &lt;span class="nf"&gt;dimBlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Each block has 16x16 = 256 threads&lt;/span&gt;
&lt;span class="n"&gt;matrixMulKernel&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;dimGrid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dimBlock&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_Pd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_Md&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_Nd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WIDTH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;dimGrid&lt;/code&gt; becomes &lt;code&gt;gridDim&lt;/code&gt; inside the kernel, and &lt;code&gt;dimBlock&lt;/code&gt; becomes &lt;code&gt;blockDim&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Essential Runtime API Functions
&lt;/h2&gt;

&lt;p&gt;The CUDA runtime API provides functions for managing the GPU:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;cudaMalloc(void **devPtr, size_t size)&lt;/code&gt;: Allocates &lt;code&gt;size&lt;/code&gt; bytes of &lt;strong&gt;linear global memory&lt;/strong&gt; on the device and returns a pointer to it in &lt;code&gt;*devPtr&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cudaMemcpy(void *dst, const void *src, size_t count, cudaMemcpyKind kind)&lt;/code&gt;: Copies &lt;code&gt;count&lt;/code&gt; bytes of data. The &lt;code&gt;kind&lt;/code&gt; argument specifies the direction:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;cudaMemcpyHostToDevice&lt;/code&gt;: Host to Device&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cudaMemcpyDeviceToHost&lt;/code&gt;: Device to Host&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cudaMemcpyDeviceToDevice&lt;/code&gt;: Device to Device&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cudaMemcpyHostToHost&lt;/code&gt;: (Usually less efficient than standard &lt;code&gt;memcpy&lt;/code&gt;)
&lt;code&gt;cudaMemcpy&lt;/code&gt; operations are generally &lt;strong&gt;blocking/synchronous&lt;/strong&gt; with respect to the host, unless they are part of asynchronous operations involving streams.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;code&gt;cudaFree(void *devPtr)&lt;/code&gt;: Frees memory allocated with &lt;code&gt;cudaMalloc&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;  &lt;code&gt;cudaDeviceSynchronize()&lt;/code&gt;: Blocks host execution until all previously issued CUDA calls (kernels, memory copies in any stream) on the current device have completed. Essential for ensuring results are ready or for accurate timing.&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deeper Implications and Performance Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Occupancy&lt;/strong&gt;: The ratio of active warps on an SM to the maximum number of warps that SM can support. Higher occupancy can help hide memory latency, as the SM can switch to other ready warps when one warp is stalled (e.g., waiting for global memory access). Block dimensions, register usage per thread, and shared memory usage per block heavily influence occupancy.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Memory Coalescing&lt;/strong&gt;: When threads in a warp access global memory, if their accesses fall into a contiguous, aligned segment, the hardware can "coalesce" these into a single (or few) memory transaction(s), which is much more efficient than scattered accesses. Designing data layouts and access patterns for coalescing is critical for good performance.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Shared Memory Banking&lt;/strong&gt;: Shared memory is divided into banks. Concurrent accesses by threads in a warp to different banks can proceed in parallel. Accesses to the same bank (bank conflicts) are serialized, reducing effective bandwidth. Understanding and avoiding bank conflicts is key when using shared memory heavily.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Built-in Variables (Brief Mention)
&lt;/h2&gt;

&lt;p&gt;Beyond &lt;code&gt;threadIdx&lt;/code&gt;, &lt;code&gt;blockIdx&lt;/code&gt;, &lt;code&gt;blockDim&lt;/code&gt;, &lt;code&gt;gridDim&lt;/code&gt;, CUDA provides others like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;warpSize&lt;/code&gt;: An integer (typically 32) indicating the number of threads in a warp.
This allows for warp-level programming idioms.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>cuda</category>
      <category>deeplearning</category>
      <category>cpp</category>
    </item>
    <item>
      <title>VideoSnap Vision: Real-Time Object Recognition PWA Architecture</title>
      <dc:creator>Tejasvi</dc:creator>
      <pubDate>Sun, 18 May 2025 02:18:32 +0000</pubDate>
      <link>https://forem.com/t3jasvi/videosnap-vision-real-time-object-recognition-pwa-architecture-488c</link>
      <guid>https://forem.com/t3jasvi/videosnap-vision-real-time-object-recognition-pwa-architecture-488c</guid>
      <description>&lt;h2&gt;
  
  
  Real-Time Object Recognition in a React PWA with Hugging Face Transformers
&lt;/h2&gt;

&lt;p&gt;Hey folks! I recently built a super fun Progressive Web App (PWA) that does &lt;em&gt;real-time&lt;/em&gt; object recognition using a small multimodal LLM from Hugging Face. Picture this: point your webcam at something, and the app instantly tells you what it sees—a dog, a cup, or even your favorite sneaker! It works right in your browser, even offline, and feels just like a native app. Pretty cool, right? Here’s how I pulled it off with React, TensorFlow.js, and a dash of PWA magic. Let’s dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Real-Time Video in a PWA?
&lt;/h2&gt;

&lt;p&gt;I'm a big fan of apps that are always ready to go, even if my internet connection isn't. Plus, who wants to rely on a beefy server for live video processing if you can do it on the device? PWAs are fantastic for this: they're installable, cache what they need for offline use, and work across all sorts of devices. For the brains of the operation—the Machine Learning part—I picked a small multimodal LLM from Hugging Face (think a lightweight version of CLIP). These models are champs at recognizing objects in images or video frames and are nimble enough to run smoothly in the browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the React PWA
&lt;/h2&gt;

&lt;p&gt;First things first, I got my React PWA project started using Create React App’s PWA template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-react-app video-object-pwa &lt;span class="nt"&gt;--template&lt;/span&gt; cra-template-pwa
&lt;span class="nb"&gt;cd &lt;/span&gt;video-object-pwa
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command set me up with a &lt;code&gt;service-worker.js&lt;/code&gt; for handling offline caching and a &lt;code&gt;manifest.json&lt;/code&gt; to give it that authentic app-like feel (like being installable on your home screen!). I popped into the &lt;code&gt;manifest.json&lt;/code&gt; to name my app “VideoSnap” and gave it a snazzy icon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our App's Blueprint: The Architecture
&lt;/h2&gt;

&lt;p&gt;Before we get into the nitty-gritty of code, let's take a bird's-eye view of how VideoSnap is put together. A picture is worth a thousand words, so here's a diagram (imagine this rendered beautifully with Eraser.io!):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foenstj54ryqcwpohkjyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foenstj54ryqcwpohkjyz.png" alt="Image description" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's break down what's happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;User's Device&lt;/strong&gt;: Everything happens right here! No servers involved for the core functionality.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Web Browser&lt;/strong&gt;: This is our app's home.

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;VideoSnap PWA (React App)&lt;/strong&gt;: This is our actual application code.

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;App Shell &amp;amp; UI&lt;/code&gt;: The main interface you see and interact with.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;VideoRecognizer Component&lt;/code&gt;: The star of the show, handling webcam input and displaying predictions.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;PWA Features&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;Manifest.json&lt;/code&gt;: Tells the browser how to treat our app (icon, name, installability).&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Service Worker&lt;/code&gt;: The background hero that caches assets and the ML model, enabling offline use and speeding up subsequent loads. It intercepts network requests and can serve files directly from the...&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Browser Caches&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;PWAAssetsCache&lt;/code&gt;: Stores our app's code (JS, CSS, images).&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;ModelCache&lt;/code&gt;: Crucially, this holds the downloaded ML model files. Once downloaded, they're available offline!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;In-Browser ML Stack&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;🤗 Transformers.js&lt;/code&gt;: Makes it easy to use Hugging Face models in JavaScript. It handles loading the model and processor, and helps with preprocessing data.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;TensorFlow.js&lt;/code&gt;: The underlying library that runs the ML model computations efficiently in the browser.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;WebGL Backend&lt;/code&gt;: TensorFlow.js uses WebGL to tap into your device's GPU for much faster calculations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Browser APIs&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;Webcam (getUserMedia)&lt;/code&gt;: Lets our app access the camera.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Cache/Storage API&lt;/code&gt;: Used by the Service Worker to store and retrieve files.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Key Interactions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Opening the App (1)&lt;/strong&gt;: You open VideoSnap. The &lt;code&gt;Manifest.json&lt;/code&gt; helps it look and feel like an app.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accessing Webcam (2)&lt;/strong&gt;: The &lt;code&gt;VideoRecognizer&lt;/code&gt; component asks for permission to use your webcam via &lt;code&gt;getUserMedia&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Service Worker Magic (3)&lt;/strong&gt;: The Service Worker gets registered. On first load, it fetches all app assets and the ML model, then tucks them away in the Browser Caches. On later visits (or when offline), it serves these directly from the cache – super fast!&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Loading the Model (4)&lt;/strong&gt;: Our component uses &lt;code&gt;Transformers.js&lt;/code&gt; to load the object recognition model. The Service Worker might intercept this request and serve the model from its cache.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Real-Time Loop (5-7)&lt;/strong&gt;:

&lt;ol&gt;
&lt;li&gt; A video frame is captured.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;Transformers.js&lt;/code&gt; preprocesses this frame.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;TensorFlow.js&lt;/code&gt; (using the WebGL backend for speed) runs the model to get a prediction.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;Transformers.js&lt;/code&gt; translates this into a human-readable label.&lt;/li&gt;
&lt;li&gt; The UI updates to show you what it "sees"! This loop repeats, giving you real-time object recognition.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture allows VideoSnap to be fast, offline-capable, and process video directly on your device, which is pretty powerful stuff for a web app!&lt;/p&gt;

&lt;h2&gt;
  
  
  Grabbing the Hugging Face Model
&lt;/h2&gt;

&lt;p&gt;I chose a compact multimodal LLM from Hugging Face, specifically &lt;code&gt;openai/clip-vit-base-patch32&lt;/code&gt; (or you could go for an even smaller, distilled variant if speed is paramount). These CLIP-style models are great because they can compare an image (or a video frame) to a list of text descriptions and tell you which description fits best.&lt;/p&gt;

&lt;p&gt;To use it in the browser with TensorFlow.js, we need to convert it.&lt;/p&gt;

&lt;p&gt;First, install the necessary Python libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers tensorflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we'll write a small Python script to download the model and processor from Hugging Face and save them in a format we can then convert.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# export_model.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TFCLIPModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt; &lt;span class="c1"&gt;# For this example, using the TF variant for direct Keras save
&lt;/span&gt;
&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/clip-vit-base-patch32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;EXPORT_DIR_BASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./clip_export_temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Temporary base directory for raw exports
&lt;/span&gt;&lt;span class="n"&gt;MODEL_SAVE_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;EXPORT_DIR_BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/model_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# For Keras model (tf_model.h5) and its config.json
&lt;/span&gt;&lt;span class="n"&gt;PROCESSOR_SAVE_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;EXPORT_DIR_BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/processor_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# For processor configuration files
&lt;/span&gt;
&lt;span class="c1"&gt;# Create directories if they don't exist
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_SAVE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PROCESSOR_SAVE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load pre-trained model and processor
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading model: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TFCLIPModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading processor: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save Keras model (tf_model.h5) and its config.json
# This is what Transformers.js (AutoModel) will need for the model architecture.
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_SAVE_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model files (incl. config.json and tf_model.h5) saved to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MODEL_SAVE_DIR&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save processor configuration files (preprocessor_config.json, tokenizer files, etc.)
# These are what Transformers.js (AutoProcessor) will need.
&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PROCESSOR_SAVE_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processor files saved to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PROCESSOR_SAVE_DIR&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this Python script (&lt;code&gt;python export_model.py&lt;/code&gt;). It will download the model files (including &lt;code&gt;tf_model.h5&lt;/code&gt; and &lt;code&gt;config.json&lt;/code&gt;) into &lt;code&gt;./clip_export_temp/model_files/&lt;/code&gt; and the processor files (like &lt;code&gt;preprocessor_config.json&lt;/code&gt;, &lt;code&gt;tokenizer.json&lt;/code&gt;, etc.) into &lt;code&gt;./clip_export_temp/processor_files/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now, convert the Keras model (&lt;code&gt;tf_model.h5&lt;/code&gt;) to the TensorFlow.js web-friendly format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make sure you have tensorflowjs_converter installed:&lt;/span&gt;
&lt;span class="c"&gt;# pip install tensorflowjs&lt;/span&gt;

tensorflowjs_converter &lt;span class="nt"&gt;--input_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;keras &lt;span class="se"&gt;\&lt;/span&gt;
                       ./clip_export_temp/model_files/tf_model.h5 &lt;span class="se"&gt;\&lt;/span&gt;
                       ./public/model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command takes the &lt;code&gt;tf_model.h5&lt;/code&gt; file and spits out a &lt;code&gt;model.json&lt;/code&gt; file (the model architecture) and one or more binary weight files (&lt;code&gt;.bin&lt;/code&gt;) into your PWA's &lt;code&gt;public/model&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crucial Step for &lt;code&gt;AutoModel&lt;/code&gt; and &lt;code&gt;AutoProcessor&lt;/code&gt;&lt;/strong&gt;:&lt;br&gt;
You need to manually copy some files into that same &lt;code&gt;public/model&lt;/code&gt; directory so &lt;code&gt;Transformers.js&lt;/code&gt; can find them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Copy &lt;code&gt;config.json&lt;/code&gt; from &lt;code&gt;./clip_export_temp/model_files/&lt;/code&gt; into &lt;code&gt;./public/model/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Copy all the processor configuration files (e.g., &lt;code&gt;preprocessor_config.json&lt;/code&gt;, &lt;code&gt;tokenizer.json&lt;/code&gt;, &lt;code&gt;vocab.json&lt;/code&gt;, &lt;code&gt;merges.txt&lt;/code&gt;) from &lt;code&gt;./clip_export_temp/processor_files/&lt;/code&gt; into &lt;code&gt;./public/model/&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After this, your &lt;code&gt;public/model&lt;/code&gt; directory should contain &lt;code&gt;model.json&lt;/code&gt;, the &lt;code&gt;*.bin&lt;/code&gt; weight files, &lt;code&gt;config.json&lt;/code&gt;, and all the processor files. This is what our React app will load.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building the Real-Time Video Component
&lt;/h2&gt;

&lt;p&gt;This is where the real magic happens! I created a React component (&lt;code&gt;VideoRecognizer.js&lt;/code&gt;) that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Accesses the user's webcam.&lt;/li&gt;
&lt;li&gt; Loads our Hugging Face model and processor using &lt;code&gt;Transformers.js&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Continuously grabs frames from the video.&lt;/li&gt;
&lt;li&gt; Preprocesses these frames.&lt;/li&gt;
&lt;li&gt; Runs them through the model for object recognition.&lt;/li&gt;
&lt;li&gt; Displays the prediction.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm using &lt;code&gt;@tensorflow/tfjs&lt;/code&gt; for the core ML operations and &lt;code&gt;@huggingface/transformers&lt;/code&gt; to easily work with the model. The browser's built-in &lt;code&gt;navigator.mediaDevices.getUserMedia&lt;/code&gt; API handles webcam access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/components/VideoRecognizer.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;alsot&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;tf&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@tensorflow/tfjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// Using AutoModel and AutoProcessor for flexibility with Hugging Face models&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;AutoModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@huggingface/transformers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Define the labels our CLIP model will try to match against.&lt;/span&gt;
&lt;span class="c1"&gt;// For CLIP, descriptive prompts usually work best!&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CANDIDATE_LABELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a cat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a dog&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a car&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a tree&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a coffee cup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a sneaker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a human face&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a laptop&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a keyboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a photo of a bottle of water&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;// Helper to get a cleaner display name from the label&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getDisplayName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;labelText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;labelText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;a photo of a &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;VideoRecognizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;videoRef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useRef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setModel&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setProcessor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setPrediction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// For displaying errors&lt;/span&gt;

  &lt;span class="c1"&gt;// Effect to load the model and processor&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loadModelAndProcessor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Setting TF.js backend to WebGL...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setBackend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;webgl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Use WebGL for GPU acceleration&lt;/span&gt;

        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Loading model and processor from /model...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// '/model' points to the public/model directory where we placed all our files&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loadedModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;AutoModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loadedProcessor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nf"&gt;setModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loadedModel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loadedProcessor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Model and processor loaded successfully!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to load model or processor:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Oops! Model load failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. Ensure all model files are in public/model/ and reachable.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nf"&gt;loadModelAndProcessor&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;

  &lt;span class="c1"&gt;// Effect to start the webcam video stream&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startVideo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Don't try to start video if the model is still loading or if there was an error&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mediaDevices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUserMedia&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;video&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;srcObject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error accessing webcam:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Webcam error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. Please grant permission and ensure no other app is using it.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nf"&gt;startVideo&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Cleanup: stop video tracks when component unmounts&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;srcObject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;srcObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTracks&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;track&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;track&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// Re-run if loading state changes or an error occurs&lt;/span&gt;

  &lt;span class="c1"&gt;// Effect for real-time frame processing and inference&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;srcObject&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;paused&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Exit if model/processor not ready, or video not playing&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="c1"&gt;// Ensure video is ready and has dimensions before processing&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;readyState&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HAVE_ENOUGH_DATA&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoWidth&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;requestAnimationFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processFrame&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Wait for next frame&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Create a temporary canvas to draw the current video frame&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;canvas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoWidth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoHeight&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2d&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drawImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Process the image (from canvas) and text labels with the CLIP processor&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="cm"&gt;/*text=*/&lt;/span&gt; &lt;span class="nx"&gt;CANDIDATE_LABELS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="cm"&gt;/*images=*/&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Pass the canvas directly&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;return_tensors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tf&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;truncation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;topLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// tf.tidy helps manage memory by auto-disposing intermediate tensors&lt;/span&gt;
        &lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tidy&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="c1"&gt;// Run inference&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// For some models, you might need model(**inputs)&lt;/span&gt;
          &lt;span class="c1"&gt;// CLIP outputs logits_per_image which indicate similarity between image and each text label&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logitsPerImage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logitsPerImage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Shape: [batch_size, num_labels]&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logitsPerImage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// Squeeze to [num_labels] then apply softmax&lt;/span&gt;

          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topProbIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;probabilities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argMax&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;dataSync&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c1"&gt;// Get index of highest probability&lt;/span&gt;
          &lt;span class="nx"&gt;topLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getDisplayName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;CANDIDATE_LABELS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;topProbIndex&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="nf"&gt;setPrediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`I see... a &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;topLabel&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Inference error:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// You could set an error state here for the user too&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="c1"&gt;// Request the next frame for continuous processing&lt;/span&gt;
      &lt;span class="nf"&gt;requestAnimationFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processFrame&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c1"&gt;// Start the processing loop&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;animationFrameId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;requestAnimationFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processFrame&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Cleanup: cancel animation frame when component unmounts or dependencies change&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;cancelAnimationFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;animationFrameId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// Re-run this effect if the model or processor changes&lt;/span&gt;

  &lt;span class="c1"&gt;// Render the UI&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;textAlign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;20px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;red&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;border&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1px solid red&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;margin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;10px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;VideoSnap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h1&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;strong&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="na"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/strong&amp;gt; {error}&amp;lt;/&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Please&lt;/span&gt; &lt;span class="nx"&gt;check&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;more&lt;/span&gt; &lt;span class="nx"&gt;details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nx"&gt;Try&lt;/span&gt; &lt;span class="nx"&gt;refreshing&lt;/span&gt; &lt;span class="nx"&gt;or&lt;/span&gt; &lt;span class="nx"&gt;ensuring&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt; &lt;span class="nx"&gt;are&lt;/span&gt; &lt;span class="nx"&gt;correctly&lt;/span&gt; &lt;span class="nx"&gt;placed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onLine&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;textAlign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;20px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;VideoSnap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h1&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;You&lt;/span&gt; &lt;span class="nx"&gt;seem&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;be&lt;/span&gt; &lt;span class="nx"&gt;offline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nx"&gt;Please&lt;/span&gt; &lt;span class="nx"&gt;connect&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;internet&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;download&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;AI&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt; &lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Once&lt;/span&gt; &lt;span class="nx"&gt;downloaded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt; &lt;span class="nx"&gt;will&lt;/span&gt; &lt;span class="nx"&gt;be&lt;/span&gt; &lt;span class="nx"&gt;available&lt;/span&gt; &lt;span class="nx"&gt;offline&lt;/span&gt; &lt;span class="nx"&gt;thanks&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;PWA&lt;/span&gt; &lt;span class="nx"&gt;magic&lt;/span&gt;&lt;span class="o"&gt;!&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;textAlign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;20px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;VideoSnap&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;What&lt;/span&gt; &lt;span class="nx"&gt;Do&lt;/span&gt; &lt;span class="nx"&gt;I&lt;/span&gt; &lt;span class="nx"&gt;See&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h1&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;🧠&lt;/span&gt; &lt;span class="nx"&gt;Loading&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;AI&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;please&lt;/span&gt; &lt;span class="nx"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;This&lt;/span&gt; &lt;span class="nx"&gt;might&lt;/span&gt; &lt;span class="nx"&gt;take&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;moment&lt;/span&gt; &lt;span class="nx"&gt;on&lt;/span&gt; &lt;span class="nx"&gt;your&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt; &lt;span class="nx"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;especially&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="nx"&gt;download&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt;
          &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt; 
            &lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;videoRef&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; 
            &lt;span class="nx"&gt;autoPlay&lt;/span&gt; 
            &lt;span class="nx"&gt;playsInline&lt;/span&gt; 
            &lt;span class="nx"&gt;muted&lt;/span&gt; &lt;span class="cm"&gt;/* Muting is often required for autoplay in browsers */&lt;/span&gt;
            &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxWidth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;640px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;border&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2px solid #007bff&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;borderRadius&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;8px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;none&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;block&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; 
          &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;          &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.8em&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fontWeight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bold&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;marginTop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;15px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#28a745&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;}
&lt;/span&gt;          &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Point&lt;/span&gt; &lt;span class="nx"&gt;your&lt;/span&gt; &lt;span class="nx"&gt;camera&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt; &lt;span class="nx"&gt;an&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="o"&gt;!&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/p&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;}
&lt;/span&gt;        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;VideoRecognizer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that I've switched &lt;code&gt;setInterval&lt;/code&gt; to &lt;code&gt;requestAnimationFrame&lt;/code&gt; for smoother video processing. This ties the processing to the browser's display refresh rate, which is generally better for animations and video.&lt;/p&gt;

&lt;p&gt;I then plugged this &lt;code&gt;VideoRecognizer&lt;/code&gt; component into my main &lt;code&gt;App.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/App.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;VideoRecognizer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./components/VideoRecognizer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./App.css&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// For any global styling&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;App&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;header&lt;/span&gt; &lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;App-header&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* You could add a title or nav bar here */&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/header&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;VideoRecognizer&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/main&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;footer&lt;/span&gt; &lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="na"&gt;textAlign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;10px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0.8em&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#777&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nx"&gt;Built&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Hugging&lt;/span&gt; &lt;span class="nx"&gt;Face&lt;/span&gt; &lt;span class="nx"&gt;Transformers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;js&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;and&lt;/span&gt; &lt;span class="nx"&gt;TensorFlow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;js&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/footer&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making It Offline-Ready
&lt;/h2&gt;

&lt;p&gt;To make sure VideoSnap truly shines as a PWA (and keeps rocking even when the internet flakes out), I updated the &lt;code&gt;service-worker.js&lt;/code&gt; file. The goal is to cache all our app's assets &lt;em&gt;and&lt;/em&gt; those crucial model files.&lt;/p&gt;

&lt;p&gt;The Create React App PWA template gives you a &lt;code&gt;service-worker.js&lt;/code&gt; (often &lt;code&gt;src/service-worker.js&lt;/code&gt; or &lt;code&gt;public/service-worker.js&lt;/code&gt; depending on your setup which gets built into &lt;code&gt;build/service-worker.js&lt;/code&gt;). We need to ensure it precaches our model and processor files from the &lt;code&gt;/model/&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;Here's how you can ensure your model files are part of the precache manifest, or add a custom caching strategy. If you're using CRA's Workbox setup, it typically uses &lt;code&gt;self.__WB_MANIFEST&lt;/code&gt; which includes files from the &lt;code&gt;public&lt;/code&gt; folder. You might need to explicitly list them or use a runtime caching strategy if they are numerous or large.&lt;/p&gt;

&lt;p&gt;A good approach for model files within a Workbox-powered service worker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/service-worker.js (Modify the one generated by Create React App)&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;clientsClaim&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workbox-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ExpirationPlugin&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workbox-expiration&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;precacheAndRoute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createHandlerBoundToURL&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workbox-precaching&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;registerRoute&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workbox-routing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CacheFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StaleWhileRevalidate&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;workbox-strategies&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;clientsClaim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Precache all of the assets generated by your build process.&lt;/span&gt;
&lt;span class="c1"&gt;// Their URLs are injected into the manifest variable below.&lt;/span&gt;
&lt;span class="c1"&gt;// This variable must be present somewhere in your service worker file,&lt;/span&gt;
&lt;span class="c1"&gt;// even if you decide not to use precaching. See https://cra.link/PWA&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;manifestEntries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;__WB_MANIFEST&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="c1"&gt;// Files from the public folder are typically included in __WB_MANIFEST by CRA's build process.&lt;/span&gt;
&lt;span class="c1"&gt;// Double-check if your model files in `public/model` are automatically added.&lt;/span&gt;
&lt;span class="c1"&gt;// If not, or for more control, you can add them manually or use runtime caching.&lt;/span&gt;

&lt;span class="c1"&gt;// Example: Add model files to precache if not automatically included.&lt;/span&gt;
&lt;span class="c1"&gt;// Best practice is to let the build process hash these files for revision control.&lt;/span&gt;
&lt;span class="c1"&gt;// If CRA includes `public` folder contents, this might be redundant.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modelFilesToPrecache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="c1"&gt;// Ensure these paths match exactly how they are in your public/model folder&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/model.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// 'null' means don't version based on content hash here, if already versioned by filename or build process.&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/config.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/preprocessor_config.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/tokenizer.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/vocab.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/merges.txt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// IMPORTANT: List ALL your .bin files (TensorFlow.js weight shards)&lt;/span&gt;
  &lt;span class="c1"&gt;// For example, if you have one shard:&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/group1-shard1of1.bin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// If you have multiple, list them all:&lt;/span&gt;
  &lt;span class="c1"&gt;// { url: '/model/group1-shard1ofN.bin', revision: null },&lt;/span&gt;
  &lt;span class="c1"&gt;// { url: '/model/group1-shard2ofN.bin', revision: null },&lt;/span&gt;
  &lt;span class="c1"&gt;// ... etc.&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// Combine CRA's manifest with our custom model files&lt;/span&gt;
&lt;span class="c1"&gt;// Ensure no duplicates if CRA already includes them.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allFilesToPrecache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;manifestEntries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;modelFilesToPrecache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;modelFile&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;manifestEntries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;modelFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;modelFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)];&lt;/span&gt;

&lt;span class="nf"&gt;precacheAndRoute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allFilesToPrecache&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// You can also use a runtime caching strategy for model files, especially if they are very large&lt;/span&gt;
&lt;span class="c1"&gt;// or you want to fetch them on demand and cache them with specific rules.&lt;/span&gt;
&lt;span class="c1"&gt;// Example: Cache model files with a CacheFirst strategy if not precached.&lt;/span&gt;
&lt;span class="nf"&gt;registerRoute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CacheFirst&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;cacheName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ml-model-cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ExpirationPlugin&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;maxEntries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Cache up to 20 model-related files&lt;/span&gt;
        &lt;span class="na"&gt;maxAgeSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Cache for 30 Days&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;


&lt;span class="c1"&gt;// The rest of CRA's default service worker (routing for index.html, etc.) usually follows...&lt;/span&gt;
&lt;span class="c1"&gt;// This allows the PWA to function as a single-page application.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileExtensionRegexp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RegExp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/[^/?]+&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;.[^/]+$&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nf"&gt;registerRoute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;navigate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/_&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileExtensionRegexp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nf"&gt;createHandlerBoundToURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PUBLIC_URL&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this service worker in place, after the first visit, the app loads blazing fast, and the model is ready to go even if you're on a desert island (as long as your device has power!). My component already shows a gentle message if you're offline &lt;em&gt;before&lt;/em&gt; the model's first download.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Hacks &amp;amp; Tips
&lt;/h2&gt;

&lt;p&gt;Running ML in the browser needs a bit of care to keep things snappy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;WebGL is Your Friend&lt;/strong&gt;: &lt;code&gt;tf.setBackend('webgl')&lt;/code&gt; is key. It lets TensorFlow.js use the GPU, which is way faster for these kinds_of tasks than the CPU.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Smooth Processing with &lt;code&gt;requestAnimationFrame&lt;/code&gt;&lt;/strong&gt;: Instead of a fixed &lt;code&gt;setInterval&lt;/code&gt;, using &lt;code&gt;requestAnimationFrame(processFrame)&lt;/code&gt; syncs processing with the browser's refresh rate. This generally leads to smoother visuals and better resource management, as the browser can optimize when to run the frame processing.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Model Size&lt;/strong&gt;: I aimed for a model around ~100MB. Smaller models load faster and run quicker. If your chosen model is hefty, look into quantization techniques (like &lt;code&gt;tf.quantization.quantize_weights&lt;/code&gt;) which can shrink model size, often with minimal impact on accuracy.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Memory Management with &lt;code&gt;tf.tidy()&lt;/code&gt;&lt;/strong&gt;: In the &lt;code&gt;processFrame&lt;/code&gt; function, wrapping the TensorFlow.js operations within &lt;code&gt;tf.tidy(() =&amp;gt; { ... })&lt;/code&gt; is a lifesaver. It automatically cleans up (disposes) any intermediate tensors created during the model inference, preventing memory leaks that could crash your app over time, especially with continuous video processing.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Webcam &amp;amp; Component Cleanup&lt;/strong&gt;: Always clean up! In &lt;code&gt;useEffect&lt;/code&gt; hooks, the return function is perfect for stopping the webcam stream (&lt;code&gt;track.stop()&lt;/code&gt;) and canceling animation frames (&lt;code&gt;cancelAnimationFrame&lt;/code&gt;) when the component unmounts. This prevents resource leaks and weird background activity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Testing and Deploying
&lt;/h2&gt;

&lt;p&gt;Testing is super important! I spent a good amount of time in Chrome DevTools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Lighthouse Audits&lt;/strong&gt;: To check PWA compliance (installability, offline support, performance).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Network Tab&lt;/strong&gt;: Throttling to simulate slower connections and using the "Offline" checkbox to rigorously test the service worker and caching.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Performance Tab&lt;/strong&gt;: To profile JavaScript execution and identify any bottlenecks in the frame processing loop.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Console&lt;/strong&gt;: Watching for any errors from TensorFlow.js or Transformers.js.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On my laptop, inference was taking a fraction of a second per frame with &lt;code&gt;requestAnimationFrame&lt;/code&gt;, making it feel very responsive. The initial app load (including model download) was a bit longer on mobile (5-10 seconds depending on network), but subsequent loads were near-instant thanks to the service worker.&lt;/p&gt;

&lt;p&gt;For deployment, I'm a fan of Vercel for its simplicity with frontend projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run build
vercel &lt;span class="nt"&gt;--prod&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And just like that, VideoSnap was live! I could install it on my phone from the browser and show off its real-time, offline object recognition skills. It really feels like magic having a mini AI sidekick in your pocket.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;This project was a ton of fun, and it's amazing what you can do in the browser these days. But my mind is already buzzing with ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;More Labels&lt;/strong&gt;: Expand the &lt;code&gt;CANDIDATE_LABELS&lt;/code&gt; list to recognize an even wider array of objects.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Fine-tuning Performance&lt;/strong&gt;: Experiment more with model quantization or different small model architectures for even faster inference or lower resource usage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;User-Provided Labels&lt;/strong&gt;: Allow users to type in what they're looking for, turning it into a "visual search" tool.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accessibility&lt;/strong&gt;: Ensuring the app is usable for everyone, including providing feedback via ARIA attributes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're inspired to build something similar, I highly recommend diving into the documentation for &lt;a href="https://huggingface.co/docs/transformers.js" rel="noopener noreferrer"&gt;Hugging Face Transformers.js&lt;/a&gt; and &lt;a href="https://www.tensorflow.org/js" rel="noopener noreferrer"&gt;TensorFlow.js&lt;/a&gt;. The possibilities are incredible.&lt;/p&gt;

&lt;p&gt;Got questions, cool ideas, or hit a snag trying this out? Drop a comment below – I’d love to hear from you!&lt;/p&gt;

&lt;p&gt;Happy coding, and let's keep building PWAs that can see and understand the world in real time! &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Crafted with curiosity and a love for tech that runs in the browser&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
