<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ruben Ghafadaryan</title>
    <description>The latest articles on Forem by Ruben Ghafadaryan (@ruben_ghafadaryan).</description>
    <link>https://forem.com/ruben_ghafadaryan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3423216%2F2f2dc881-fd01-4139-81ea-4ae1d91965af.jpg</url>
      <title>Forem: Ruben Ghafadaryan</title>
      <link>https://forem.com/ruben_ghafadaryan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ruben_ghafadaryan"/>
    <language>en</language>
    <item>
      <title>Detecting Logo Similarity: Combining AI Embeddings with Fourier Descriptors</title>
      <dc:creator>Ruben Ghafadaryan</dc:creator>
      <pubDate>Sun, 09 Nov 2025 16:45:58 +0000</pubDate>
      <link>https://forem.com/ruben_ghafadaryan/detecting-logo-similarity-combining-ai-embeddings-with-fourier-descriptors-5eoc</link>
      <guid>https://forem.com/ruben_ghafadaryan/detecting-logo-similarity-combining-ai-embeddings-with-fourier-descriptors-5eoc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article started from a conversation in our &lt;a href="//v-mobile.am"&gt;V-Mobile&lt;/a&gt; office. We were discussing cases where new company logos suspiciously resembled famous brands. In many instances, these similarities seemed intentional—designed to confuse customers and boost sales, especially in smaller markets.&lt;br&gt;
This got me thinking: Could we build a system to automatically detect when a new logo copies an existing one?&lt;br&gt;
At first glance, this looks like a straightforward image similarity problem. Many tools handle this well. However, logos are special. They're not like regular photos or illustrations, and as I discovered, detecting logo similarities is far more challenging than expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge with AI-Based Tools
&lt;/h2&gt;

&lt;p&gt;As AI enthusiasts, we naturally started with popular AI models.&lt;/p&gt;

&lt;h3&gt;
  
  
  DINO: Great, But Not Perfect
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ai.meta.com/blog/dino-v2-computer-vision-self-supervised-learning/" rel="noopener noreferrer"&gt;DINO&lt;/a&gt; is excellent for image similarity detection. However, it can be easily confused by background changes or gradient fills.&lt;br&gt;
Example: Here are Image 1 and Image 2, a slightly modified version of Image 1. When I tested them with DINO (specifically dinov2-small), it showed a cosine distance of &lt;strong&gt;0.56&lt;/strong&gt; between their embeddings. (&lt;em&gt;Note:&lt;/em&gt; Throughout this article, "distance" means cosine distance unless specified otherwise.)&lt;/p&gt;

&lt;p&gt;This high distance means DINO thinks they're quite different, even though they're clearly similar to human eyes. This creates false negatives—we might miss real similarities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3ya0anmxg66kd7izeu5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3ya0anmxg66kd7izeu5.png" alt="Image 1. A simple logo" width="400" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Image 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjul1osda0a4xhnf8nhlf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjul1osda0a4xhnf8nhlf.png" alt="Image 2. Modified version of previous image" width="400" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Image 2.&lt;/p&gt;
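&lt;p&gt;Cosine distance itself is simple to compute once you have the embeddings. A minimal sketch of the metric (the vectors below are toy placeholders; in the real pipeline they come from a model such as dinov2-small):&lt;/p&gt;

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus cosine similarity: 0 means identical direction."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(1.0 - np.dot(a, b))

# Toy vectors standing in for model embeddings
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, 0.2, 0.9])
print(round(cosine_distance(v1, v2), 3))  # 0.012
```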

&lt;h3&gt;
  
  
  CLIP: Another Piece of the Puzzle
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/clip/" rel="noopener noreferrer"&gt;CLIP&lt;/a&gt; is another powerful similarity tool. It builds embeddings based on what the image represents semantically—in other words, it tries to describe the picture's content.&lt;br&gt;
This works great for most images, but logos often contain abstract curves and shapes that don't have clear semantic meaning. When I compared two visually different images, Image 1 and Image 3, CLIP gave them a distance of 0.80, suggesting they're quite similar just because they share some semantic elements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wfbqsnlbk0zcv05ac14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wfbqsnlbk0zcv05ac14.png" alt="Image 3. A different image producing false similarity" width="240" height="240"&gt;&lt;/a&gt;&lt;br&gt;
Image 3.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Verdict
&lt;/h3&gt;

&lt;p&gt;Relying solely on CLIP or DINO won't give us reliable results. We need additional tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bringing Vectors into the Mix
&lt;/h2&gt;

&lt;p&gt;We needed something to help re-rank results from CLIP and DINO. Ideally, this tool should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invariant to colors&lt;/li&gt;
&lt;li&gt;Optionally invariant to rotations or scaling (in case someone tries to trick the system)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I decided to explore vector representations. What if we convert raster images to vectors and analyze the vector data? This could give us more flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Converting Images to Vectors
&lt;/h3&gt;

&lt;p&gt;First, I converted PNG logos to &lt;a href="https://en.wikipedia.org/wiki/SVG" rel="noopener noreferrer"&gt;SVG&lt;/a&gt; vector files. But before conversion, I preprocessed each image:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Remove the alpha channel to eliminate transparency&lt;/li&gt;
&lt;li&gt;Remove background using &lt;a href="https://github.com/danielgatis/rembg" rel="noopener noreferrer"&gt;rembg&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Crop near-white colors to avoid confusing the tracer with minor elements&lt;/li&gt;
&lt;li&gt;Limit the maximum dimension to 1024 pixels&lt;/li&gt;
&lt;li&gt;Remove noise using a median filter&lt;/li&gt;
&lt;li&gt;Increase contrast for clearer edges&lt;/li&gt;
&lt;/ol&gt;
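&lt;p&gt;The preprocessing steps above can be sketched with Pillow (the rembg background-removal and near-white cropping steps are omitted here, and the filter size and contrast factor are illustrative assumptions, not the exact values used):&lt;/p&gt;

```python
from PIL import Image, ImageEnhance, ImageFilter

def preprocess(img: Image.Image, max_dim: int = 1024) -> Image.Image:
    # 1. Drop the alpha channel by compositing onto white
    if img.mode in ("RGBA", "LA", "P"):
        img = img.convert("RGBA")
        background = Image.new("RGBA", img.size, (255, 255, 255, 255))
        img = Image.alpha_composite(background, img).convert("RGB")
    else:
        img = img.convert("RGB")
    # (background removal with rembg would go here)
    # 2. Limit the maximum dimension to max_dim pixels
    img.thumbnail((max_dim, max_dim))
    # 3. Remove noise with a median filter
    img = img.filter(ImageFilter.MedianFilter(size=3))
    # 4. Increase contrast for clearer edges
    img = ImageEnhance.Contrast(img).enhance(1.5)
    return img
```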

&lt;p&gt;After preprocessing, I fed the images to &lt;a href="https://www.visioncortex.org/vtracer/" rel="noopener noreferrer"&gt;vtracer&lt;/a&gt;. To keep things consistent, I limited the output to cubic &lt;a href="https://en.wikipedia.org/wiki/B%C3%A9zier_curve" rel="noopener noreferrer"&gt;Bézier curves&lt;/a&gt;: parametric curves defined by 4 control points.&lt;br&gt;
The results were promising! The vectorized versions captured the essential shapes while eliminating noise.&lt;/p&gt;
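&lt;p&gt;A cubic Bézier curve is fully determined by its 4 control points, which makes sampling it trivial. A small sketch of the evaluation we rely on later (the sample count of 32 is an illustrative choice):&lt;/p&gt;

```python
import numpy as np

def sample_cubic_bezier(p0, p1, p2, p3, n=32):
    """Evaluate a cubic Bézier curve, defined by 4 control points,
    at n evenly spaced parameter values t in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

pts = sample_cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0))
assert pts.shape == (32, 2)           # n sampled 2D points
assert np.allclose(pts[0], (0, 0))    # curve starts at p0
assert np.allclose(pts[-1], (1, 0))   # and ends at p3
```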

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto7ugy454gsa81bcnm53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fto7ugy454gsa81bcnm53.png" alt="Image 4. Original" width="740" height="740"&gt;&lt;/a&gt;&lt;br&gt;
Image 4. Original PNG Logo File&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79dx2gvsvhhrtkoycc4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79dx2gvsvhhrtkoycc4v.png" alt=" " width="800" height="796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image 5. Pre-processed image after tracing (a screenshot, as the article editor does not allow uploading SVG files).&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyzing Bézier Curves with Fourier Descriptors
&lt;/h2&gt;

&lt;p&gt;Now we have SVG files, but we can't compare text files directly. Instead, we need to compare their geometric components.&lt;br&gt;
vtracer gives us paths as cubic Bézier curves. Here's how we extract meaningful data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sample the curves&lt;/strong&gt;: Since Bézier curves are easy to evaluate at any point, we sample each curve into a fixed number of 2D points&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Apply the Fourier Transform&lt;/strong&gt;: We treat this sequence of points as a signal and apply a &lt;a href="https://en.wikipedia.org/wiki/Discrete_Fourier_transform" rel="noopener noreferrer"&gt;Discrete Fourier Transform&lt;/a&gt; (DFT)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extract Fourier descriptors&lt;/strong&gt;: The low-frequency Fourier coefficients become our shape descriptor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Normalize&lt;/strong&gt;: We normalize the sampled points to make them comparable:
&lt;ul&gt;
&lt;li&gt;Subtract the centroid (translation invariance)&lt;/li&gt;
&lt;li&gt;Divide by scale (scale invariance)&lt;/li&gt;
&lt;li&gt;Optionally fix the starting point (rotation invariance)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now each curve is represented by a fixed-length vector that we can store and compare, just like other embeddings.&lt;/p&gt;
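&lt;p&gt;A compact sketch of the descriptor computation with NumPy (the number of coefficients is illustrative; taking coefficient magnitudes is one common way to get rotation and starting-point invariance):&lt;/p&gt;

```python
import numpy as np

def fourier_descriptor(points, n_coeffs=10):
    """points: (N, 2) array sampled along one curve or contour."""
    z = points[:, 0] + 1j * points[:, 1]   # treat (x, y) as a complex signal
    z = z - z.mean()                       # translation invariance (centroid)
    coeffs = np.fft.fft(z)                 # Discrete Fourier Transform
    coeffs = coeffs / (np.abs(coeffs[1]) + 1e-12)   # scale invariance
    # Magnitudes of the low-frequency coefficients form the descriptor;
    # dropping phase also removes rotation/starting-point dependence
    return np.abs(coeffs[1:n_coeffs + 1])

# The same shape at a different position and scale gives the same vector
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
shifted = 3.0 * circle + np.array([10.0, -5.0])
print(np.allclose(fourier_descriptor(circle),
                  fourier_descriptor(shifted), atol=1e-6))  # True
```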

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xnc84n4onyfz9j8dwce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xnc84n4onyfz9j8dwce.png" alt="Image 6. Descriptor extraction" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image 6. An AI-generated image illustrating extraction of Fourier descriptors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key advantage:&lt;/strong&gt; Unlike CLIP and DINO, these descriptors capture &lt;em&gt;pure geometry&lt;/em&gt; rather than semantics, making them better for fine-grained shape comparison.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Catch:&lt;/strong&gt; &lt;em&gt;False Positives&lt;/em&gt;. Unfortunately, this approach has its own problem: completely different images may contain similar curves, producing misleadingly high similarity scores.&lt;/p&gt;

&lt;p&gt;For example, when comparing two clearly similar images, Image 1 and Image 2, the Fourier descriptor distance was &lt;strong&gt;0.63&lt;/strong&gt;—moderately similar. But when comparing one of them to a completely different image, Image 3, the distance was &lt;strong&gt;0.89&lt;/strong&gt;—only slightly more different.&lt;/p&gt;

&lt;p&gt;I also tried calculating &lt;a href="https://medium.com/@sim30217/chamfer-distance-4207955e8612" rel="noopener noreferrer"&gt;Chamfer distance&lt;/a&gt; between individual Bézier curves for point-to-point matching, but this made things worse. The problem remained: too many false positives.&lt;br&gt;
At this point, I needed to step back and rethink the approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: A Combined Approach
&lt;/h2&gt;

&lt;p&gt;After all this experimentation, I reached these conclusions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DINO is powerful but can produce false negatives&lt;/li&gt;
&lt;li&gt;CLIP is powerful but can produce false positives&lt;/li&gt;
&lt;li&gt;Fourier Descriptors are relatively unstable with false positives, but can still help filter noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each method has strengths and weaknesses. The solution? Combine them all.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Weighted Formula
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Similarity = (DINO × 0.7) + (CLIP × 0.2) + (Fourier × 0.1)&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I assigned the highest weight to DINO since it's generally most reliable. CLIP gets a moderate weight, and Fourier descriptors get a small weight just to help filter edge cases.&lt;br&gt;
These weights came from empirical testing and produced much more reliable results.&lt;/p&gt;
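&lt;p&gt;In code, the blend is a one-liner (the input scores below are placeholder similarity values in [0, 1]):&lt;/p&gt;

```python
def combined_similarity(dino, clip, fourier):
    """Weighted blend of the three similarity scores; the weights
    come from the empirical testing described above."""
    return 0.7 * dino + 0.2 * clip + 0.1 * fourier

print(round(combined_similarity(0.9, 0.5, 0.4), 2))  # 0.77
```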

&lt;h2&gt;
  
  
  The Optimized Search Strategy
&lt;/h2&gt;

&lt;p&gt;When searching through a database of logos, we don't need to calculate everything for every image. Here's an efficient multi-stage approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stage 1&lt;/strong&gt;: Use DINO to retrieve initial candidates, then filter them with CLIP. Use thresholds to stop the search early when a very high similarity is found or when no candidate is similar at all&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 2&lt;/strong&gt;: Use Fourier descriptors to re-rank found similarities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 3&lt;/strong&gt; (optional): Re-rank the top results using Chamfer distance with per-path Fourier descriptors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optionally, before starting the multi-stage approach we can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;search for the &lt;a href="https://en.wikipedia.org/wiki/SHA-2" rel="noopener noreferrer"&gt;SHA256&lt;/a&gt; hash, to find exact copies of the image&lt;/li&gt;
&lt;li&gt;search for a &lt;a href="https://en.wikipedia.org/wiki/Perceptual_hashing" rel="noopener noreferrer"&gt;perceptual hash&lt;/a&gt;, to find copies with minor modifications&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This staged approach gives us accurate results while avoiding unnecessary calculations.&lt;/p&gt;
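&lt;p&gt;Putting the stages together, a toy sketch of the search flow (the dictionary layout and threshold values are hypothetical, assuming per-model similarity scores against the query are already computed; the perceptual-hash stage is omitted for brevity):&lt;/p&gt;

```python
def staged_search(query, database, high=0.95, low=0.30):
    """Sketch of the staged flow. Each database entry is a dict with a
    'sha256' field plus precomputed 'dino', 'clip' and 'fourier'
    similarity scores against the query (a hypothetical layout)."""
    # Stage 0: exact-copy short-circuit via the SHA256 hash
    exact = [e for e in database if e["sha256"] == query["sha256"]]
    if exact:
        return exact
    # Stage 1: keep DINO candidates that survive the CLIP filter;
    # stop early if one candidate is already a near-certain match
    candidates = [e for e in database if e["clip"] > low]
    if not candidates:
        return []
    best = max(candidates, key=lambda e: e["dino"])
    if best["dino"] > high:
        return [best]
    # Stage 2: re-rank survivors with the weighted blend
    candidates.sort(
        key=lambda e: 0.7 * e["dino"] + 0.2 * e["clip"] + 0.1 * e["fourier"],
        reverse=True,
    )
    return candidates
```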

&lt;h2&gt;
  
  
  The Implementation
&lt;/h2&gt;

&lt;p&gt;I've built a proof of concept system that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A combined storage solution using SQLite3 and FAISS&lt;/li&gt;
&lt;li&gt;Storage for DINO embeddings, CLIP embeddings, and Fourier descriptors (both combined and per-path)&lt;/li&gt;
&lt;li&gt;SHA256 hash and perceptual hashes for each image&lt;/li&gt;
&lt;li&gt;Scripts to populate the database with PNG images&lt;/li&gt;
&lt;li&gt;A search script to find similar logos in the database&lt;/li&gt;
&lt;li&gt;A direct comparison script for two specific logos&lt;/li&gt;
&lt;li&gt;Support for both GPU and CPU processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code is still under development and is not guaranteed to be stable, but it illustrates the approaches and techniques used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rghafadaryan/logo-similarity" rel="noopener noreferrer"&gt;https://github.com/rghafadaryan/logo-similarity&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing Data
&lt;/h3&gt;

&lt;p&gt;For this work, I used a subset of 500 logo images from the &lt;a href="https://data.vision.ee.ethz.ch/sagea/lld/" rel="noopener noreferrer"&gt;Large Logo Dataset&lt;/a&gt;. &lt;br&gt;
Direct download: &lt;a href="https://data.vision.ee.ethz.ch/sagea/lld/data/LLD-logo_sample.zip" rel="noopener noreferrer"&gt;https://data.vision.ee.ethz.ch/sagea/lld/data/LLD-logo_sample.zip&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next?
&lt;/h3&gt;

&lt;p&gt;This project is ongoing. The combined approach shows promising results, but there's always room for improvement. I'm continuing to refine the weights, explore additional geometric features, and test on larger datasets.&lt;/p&gt;

&lt;p&gt;I'll be back with more results as this work progresses. If you're working on similar problems or have suggestions, I'd love to hear from you in the comments!&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Use Disclaimer
&lt;/h2&gt;

&lt;p&gt;AI assistance was used in preparing this article to help with grammar, wording, and clarity, since English is not my native language.&lt;/p&gt;

&lt;p&gt;For the coding part of the project, AI-based copilots were used mostly in calculation-heavy sections.&lt;br&gt;
However, every line of code was personally reviewed and verified by me before use.&lt;/p&gt;

&lt;p&gt;All technical decisions, conclusions, and interpretations described here represent my own work.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>multimodal</category>
      <category>fft</category>
    </item>
    <item>
      <title>The 64 KB Challenge: Teaching a Tiny Neural Network to Play Pong</title>
      <dc:creator>Ruben Ghafadaryan</dc:creator>
      <pubDate>Sun, 12 Oct 2025 15:48:06 +0000</pubDate>
      <link>https://forem.com/ruben_ghafadaryan/the-64-kb-challenge-teaching-a-tiny-net-to-play-pong-1m70</link>
      <guid>https://forem.com/ruben_ghafadaryan/the-64-kb-challenge-teaching-a-tiny-net-to-play-pong-1m70</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As someone who started hacking in the mid’80s, I’m still a shameless fan of retro computers. Sure, they were hilariously limited, but those limits made us crafty. My first machine had 16 KB of RAM (about 2 KB reserved for video). Apps came from a cassette recorder, and somehow that was… fine. &lt;/p&gt;

&lt;p&gt;When the &lt;a href="https://www.atari65xe.com/atari-65xe-computer/" rel="noopener noreferrer"&gt;Atari 65XE&lt;/a&gt; with its majestic 64 KB arrived, we were sure nothing could stop us. Fast-forward to today: I’m on a 64 GB RAM box with a GPU and a terabyte of storage - and I still catch myself thinking, “eh, I could use more.”&lt;/p&gt;

&lt;p&gt;Meanwhile, the resources we casually throw at neural nets are a little terrifying. A standard PyTorch + CUDA install eats gigabytes of disk; “toy” experiments can heat a room and run for hours.&lt;/p&gt;

&lt;p&gt;Unlike today’s parameter-hungry models, the earliest &lt;a href="https://en.wikipedia.org/wiki/Perceptron" rel="noopener noreferrer"&gt;perceptron&lt;/a&gt; experiments ran on vacuum-tube mainframes like the &lt;a href="https://foldoc.org/IBM+704" rel="noopener noreferrer"&gt;IBM 704&lt;/a&gt;, which topped out at 32K 36-bit words (roughly 144 KB of storage). And yet, within that tiny footprint, the perceptron showed something revolutionary: you could learn a decision rule from examples instead of hand-coding logic.&lt;/p&gt;

&lt;p&gt;So here’s the challenge I set for myself: build and train a tiny neural network that can play a simplified Pong as a partner/opponent against a rule-based bot - and keep the entire model plus its training data under &lt;strong&gt;64 KB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A few ground rules so the purists don’t sharpen their pitchforks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I’m not writing this on an actual 8-bit machine. We’ll use modern Python, but we’ll measure and enforce memory like it’s 1987.&lt;/li&gt;
&lt;li&gt;"&lt;em&gt;Under 64 KB&lt;/em&gt;" means: the serialized model parameters and the model itself together consume less than 64 KB of memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll compare with a "don’t-hold-back" variant (PyTorch + CUDA), suggested by a large model - because contrast is fun.&lt;/p&gt;

&lt;p&gt;And, surely, we’re not doing this for nostalgia points only. We’re doing it because on-device neural nets for IoT are &lt;a href="https://www.sciencedirect.com/science/article/pii/S0925231225014183" rel="noopener noreferrer"&gt;useful right now&lt;/a&gt;: they run without a network, keep data private, cut latency to near-zero, and sip power. Many teams building compact devices need models that are small, trainable on their own data, and autonomous at the edge. This project is a concrete example - and a spark - for deeper work on tiny, task-specific models that actually ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;AI Usage Disclaimer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Did I use AI while building the tiny NN or writing this piece? Yes - selectively. Like most engineers, I use assistants for rough drafts, typo-hunting, and smoothing awkward sentences (helpful since English isn’t my first language). That doesn’t mean the article is auto-generated.&lt;br&gt;
None of the code is AI-generated, though AI assistance was used extensively while working on it.&lt;/p&gt;

&lt;p&gt;Guardrails I followed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every line of code and every equation was reviewed by me.&lt;/li&gt;
&lt;li&gt;Titles, section breaks, and tone got light AI polish.&lt;/li&gt;
&lt;li&gt;Constraints, numbers, and trade-offs come from hands-on experiments - not copy-paste.&lt;/li&gt;
&lt;li&gt;If any AI-generated code goes in verbatim, I’ll say so explicitly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Model Constraints and Shape&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The whole point is to live under 64 KB - not just the network, but the serialized weights as well. To make that possible, we don’t feed pixels. We feed the game state: paddle and ball positions, their velocities, a small hint about where the ball is heading, and a rough "time-to-impact" estimate. Once normalized, you’re looking at roughly a dozen scalars. It’s signal, not scenery.&lt;/p&gt;

&lt;p&gt;The network’s job is simple: choose one of three actions - up, hold, or down. No diagonals, no sideways drift. The architecture matches the task: inputs go into a small hidden layer, and out come three logits. At inference time we just pick the largest logit and move on to the next frame. The model implemented is &lt;strong&gt;[12] → [16] → [3]&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv40kehjrqdy27wtqinra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv40kehjrqdy27wtqinra.png" alt="NN Structure for Tiny Model" width="800" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To save space, weights are stored as signed 4-bit values - two per byte. Activations, however, stay int8 with a fixed scale that covers about [-1, 1). That mix matters. On a network this small, pushing activations down to 4-bit as well makes collapse far more likely - you start seeing the model "stick" on one action because there just isn’t enough dynamic range to separate situations cleanly. Keeping activations at int8 buys stability for a few extra bytes, which is a great trade.&lt;/p&gt;
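&lt;p&gt;Packing two signed 4-bit weights per byte looks roughly like this (a standalone sketch, not the project’s exact serialization code):&lt;/p&gt;

```python
def pack_nibbles(weights):
    """Pack signed 4-bit integers (range -8..7) two per byte."""
    if len(weights) % 2:
        weights = list(weights) + [0]     # pad to an even count
    packed = bytearray()
    for i in range(0, len(weights), 2):
        lo = weights[i] % 16              # two's-complement low nibble
        hi = weights[i + 1] % 16          # two's-complement high nibble
        packed.append(lo + hi * 16)
    return bytes(packed)

def unpack_nibbles(data, count):
    out = []
    for byte in data:
        for nib in (byte % 16, byte // 16):
            out.append(nib - 16 if nib > 7 else nib)  # sign-extend
    return out[:count]

ws = [-8, -1, 0, 3, 7]                    # 5 weights fit in 3 bytes
assert unpack_nibbles(pack_nibbles(ws), len(ws)) == ws
```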

&lt;p&gt;Nonlinearity is a simple, saturating clamp. It’s cheap, keeps values in range, and doesn’t require lookup tables or trig functions. The final layer leaves us with three integer logits; we take the &lt;a href="https://machinelearningmastery.com/argmax-in-machine-learning/" rel="noopener noreferrer"&gt;argmax&lt;/a&gt; and return the chosen action.&lt;/p&gt;
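&lt;p&gt;The whole inference path, clamp nonlinearity, integer logits, argmax, fits in a few lines. A sketch with toy random weights (the shift amounts and weight layout are assumptions; only the [12] → [16] → [3] shape comes from the text above):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def clamp_int8(x):
    # Saturating clamp: the network's cheap nonlinearity
    return np.clip(x, -128, 127)

def forward(x, w1, b1, s1, w2, b2, s2):
    h = clamp_int8((w1 @ x + b1) >> s1)    # hidden layer; right-shift rescales
    logits = (w2 @ h + b2) >> s2           # three integer logits
    return int(np.argmax(logits))          # 0 = UP, 1 = HOLD, 2 = DOWN

# Toy weights drawn from the signed 4-bit range -8..7
w1 = rng.integers(-8, 8, size=(16, 12))
b1 = rng.integers(-8, 8, size=16)
w2 = rng.integers(-8, 8, size=(3, 16))
b2 = rng.integers(-8, 8, size=3)
x = rng.integers(-127, 128, size=12)       # int8-style input features
action = forward(x, w1, b1, 4, w2, b2, 4)
assert action in (0, 1, 2)
```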

&lt;h2&gt;
  
  
  &lt;strong&gt;Training Process&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We train a small student network to mimic a calm, predictable teacher. The teacher is a simple "physics-intercept" bot: when the ball is coming toward our paddle, it projects the path forward—including wall bounces - until the paddle’s x-line, then heads to meet it; when the ball is leaving, it slides back toward center. A tiny dead-zone around the paddle’s middle prevents jitter. It’s not flashy, but it’s consistent, which gives us labels we can trust. We refer the teacher as a &lt;em&gt;"Rule Based Bot"&lt;/em&gt;.&lt;/p&gt;
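&lt;p&gt;The teacher’s core trick, projecting the ball to the paddle’s x-line with wall bounces, can be sketched like this (coordinates live in [0, 1]; this illustrates the rule, not the repo’s exact code):&lt;/p&gt;

```python
def intercept_y(ball_x, ball_y, vx, vy, paddle_x=1.0):
    """Project the ball to the paddle's x-line with top/bottom wall
    bounces; the court lives in [0, 1] on both axes."""
    if (paddle_x - ball_x) * vx > 0:       # ball moving toward our paddle
        t = (paddle_x - ball_x) / vx       # time to reach the x-line
        y = ball_y + vy * t                # straight-line projection
        y = y % 2.0                        # fold wall reflections (period 2)
        return 2.0 - y if y > 1.0 else y   # mirror the upper half-period
    return 0.5                             # ball leaving: drift to center

print(round(intercept_y(0.5, 0.5, 0.1, 0.2), 3))  # 0.5
```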

&lt;p&gt;The inputs are the same compact signals we’ll use at runtime: paddle and ball positions, their velocities, a predicted intercept and its delta to the paddle, rough timing/speed hints, plus a couple of direction signs. Everything is normalized to [-1, +1] and stored as int8. Each example carries a single-byte label - &lt;em&gt;UP&lt;/em&gt;, &lt;em&gt;HOLD&lt;/em&gt;, or &lt;em&gt;DOWN&lt;/em&gt; - so one sample is only a handful of bytes.&lt;/p&gt;

&lt;p&gt;Because deployment uses 4-bit weights and 8-bit activations, we train with quantization in the loop: parameters are discrete, activations are clamped, and each layer can apply a small right-shift to keep values in range. This avoids the classic trap of "looks great in float32, collapses after quantization."&lt;/p&gt;

&lt;p&gt;Optimization stays deliberately simple: &lt;a href="https://www.geeksforgeeks.org/artificial-intelligence/introduction-hill-climbing-artificial-intelligence/" rel="noopener noreferrer"&gt;hill climbing&lt;/a&gt;. Start with small, varied integers; nudge one weight by ±1 (and occasionally a bias); score against the teacher; keep the change if accuracy doesn’t get worse. With only a few hundred parameters, that’s enough - and it matches the discrete space we actually ship.&lt;/p&gt;
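&lt;p&gt;The hill-climbing loop is almost as simple as it sounds. A self-contained sketch with a toy objective standing in for teacher accuracy (in the real trainer the score is imitation accuracy against the bot):&lt;/p&gt;

```python
import random

def hill_climb(weights, score, iters=5000, seed=0):
    """Nudge one weight by ±1 at a time; keep the change if the
    score doesn't get worse. Weights stay in the 4-bit range -8..7."""
    rng = random.Random(seed)
    best = score(weights)
    for _ in range(iters):
        i = rng.randrange(len(weights))
        old = weights[i]
        weights[i] = max(-8, min(7, old + rng.choice((-1, 1))))
        new = score(weights)
        if new >= best:
            best = new          # accept: not worse than before
        else:
            weights[i] = old    # revert the nudge
    return weights, best

# Toy objective standing in for "accuracy vs. the teacher":
# negative distance to a fixed target, so 0 is a perfect score
target = [3, -2, 7, 0, -8, 1]
score = lambda ws: -sum(abs(a - b) for a, b in zip(ws, target))
w, acc = hill_climb([0] * 6, score)
```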

&lt;p&gt;What do we watch while training? Accuracy, obviously, but also saturation. If too many activations are pegged at the rails, we bump a layer’s right-shift by one bit or trim fan-in. We also do short rollouts against the teacher to catch late reactions, camping, or oscillation. When accuracy plateaus and behavior looks clean, we serialize the tiny parameters, log the byte counts, and confirm we’re still under 64 KB.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Alternate Path for Comparison&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For contrast, we also built a no-limits version in &lt;a href="https://pytorch.org/" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt;, using &lt;a href="https://developer.nvidia.com/cuda-toolkit" rel="noopener noreferrer"&gt;CUDA&lt;/a&gt; when it’s available. The network is straightforward - 12 inputs, two hidden layers of 128 and 64 with &lt;a href="https://www.geeksforgeeks.org/deep-learning/relu-activation-function-in-deep-learning/" rel="noopener noreferrer"&gt;ReLU&lt;/a&gt;, and 3 outputs for &lt;em&gt;UP&lt;/em&gt;, &lt;em&gt;HOLD&lt;/em&gt;, &lt;em&gt;DOWN&lt;/em&gt; - so: &lt;br&gt;
&lt;strong&gt;[12] → [128] → [64] → [3]&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3qli3a5p3sd1vrbv9vg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3qli3a5p3sd1vrbv9vg.png" alt="NN Structure for PyTorch Model" width="800" height="691"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It trains against the same rule-based bot, sees the same normalized features, and makes decisions by taking the &lt;a href="https://machinelearningmastery.com/argmax-in-machine-learning/" rel="noopener noreferrer"&gt;argmax&lt;/a&gt; of its logits. No quantization here; it’s float all the way.&lt;/p&gt;

&lt;p&gt;There’s also a distillation option: train the tiny integer model using the big PyTorch model as the teacher instead of the rule-based one. That gives us an apples-to-apples comparison and a clean way to see what extra capacity buys—and what careful quantization can keep.&lt;/p&gt;

&lt;p&gt;The alternate path was created with &lt;strong&gt;AI assistance&lt;/strong&gt; and then manually reviewed.&lt;br&gt;
The network architecture was &lt;strong&gt;suggested by AI&lt;/strong&gt;, and then negotiated down to the agreed minimum.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Visualizer and CLI Player&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We test the model in a small, deterministic arena: logic lives in [0, 1], rendering goes to pixels, the model plays on the right, and the rule-based bot plays on the left. Each frame builds the same 12-feature vector used in training, queries the model for three logits, turns that into an action, updates both paddles at a fixed speed, steps the ball with clean top/bottom bounces, checks paddle hits at their x-lines, and nudges ball speed slightly after successful returns (capped so rallies stay readable). A miss updates the score and triggers a fresh serve.&lt;/p&gt;

&lt;p&gt;Controls exist to help us observe, not to get in the way: pause, slow-motion, and a quick reset to reproduce openings. A small overlay shows actions, ball and paddle positions/velocities, the model’s integer logits, FPS, and slow-mo status. The loop stays flat and predictable - two decisions, one physics step, one draw.&lt;/p&gt;

&lt;p&gt;The model bundle loads at startup. Randomness is seeded so interesting rallies are reproducible, and long runs can stop after a target score for clean comparisons. The UI is intentionally minimal - light &lt;strong&gt;AI-assisted&lt;/strong&gt; scaffolding, fully reviewed - so the focus stays on how the tiny net thinks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frl19g4eb9jd24nkojt6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frl19g4eb9jd24nkojt6c.png" alt="Screenshot of Game Visualizer" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The visualizer can pit the tiny model or the no-limits model against the bot—or against each other. For longer experiments, a CLI mode runs series of games up to a chosen point total and reports rally lengths and basic match statistics.&lt;/p&gt;

&lt;p&gt;For historical reasons, each model also has its own game visualizer with the same UI, limited to that particular model.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Project&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The project is available on GitHub: &lt;a href="https://github.com/rghafadaryan/neuro-pong" rel="noopener noreferrer"&gt;https://github.com/rghafadaryan/neuro-pong&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tiny model (&lt;a href="https://github.com/rghafadaryan/neuro-pong/tree/main/tiny" rel="noopener noreferrer"&gt;tiny/&lt;/a&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/tiny/tiny_nn.py" rel="noopener noreferrer"&gt;tiny_nn.py&lt;/a&gt; - the compact MLP and its quantization routines.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/tiny/tiny_train_vs_rl.py" rel="noopener noreferrer"&gt;tiny_trainer_vs_rl.py&lt;/a&gt; - trains the tiny model by imitating the rule-based bot.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/tiny/tiny_train_vs_torch.py" rel="noopener noreferrer"&gt;tiny_trainer_vs_torch.py&lt;/a&gt; - optional: distill the tiny model from the PyTorch teacher.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/tiny/tiny_game.py" rel="noopener noreferrer"&gt;tiny_game.py&lt;/a&gt; - real-time visualizer for the tiny model vs. the rule-based bot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PyTorch no-limits model (&lt;a href="https://github.com/rghafadaryan/neuro-pong/tree/main/torch_based" rel="noopener noreferrer"&gt;torch_based/&lt;/a&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/torch_based/torch_pong_model.py" rel="noopener noreferrer"&gt;torch_pong_model.py&lt;/a&gt; - PyTorch MLP implementation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/torch_based/torch_pong_model.py" rel="noopener noreferrer"&gt;torch_based_trainer.py&lt;/a&gt; - trains the PyTorch model against the same rule-based bot.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/torch_based/torch_pong_game.py" rel="noopener noreferrer"&gt;torch_pong_game.py&lt;/a&gt; - visualizer for the PyTorch model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Top-level utilities&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/versus_game.py" rel="noopener noreferrer"&gt;versus_game.py&lt;/a&gt; - pits any two models (tiny or PyTorch) against each other or the rule-based bot.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rghafadaryan/neuro-pong/blob/main/versus_cli.py" rel="noopener noreferrer"&gt;versus_cli.py&lt;/a&gt; - CLI runner for a series of games (no UI); outputs rally lengths and match stats.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All scripts expose an exhaustive set of command-line options via --help.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The source code is &lt;strong&gt;free to download and use&lt;/strong&gt;. This is an active work in progress and is provided as is, without warranties; &lt;strong&gt;use at your own risk&lt;/strong&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We trained two players on the same normalized 12-feature inputs and the same rule-based teacher: a no-limits PyTorch model (CUDA if available) and the tiny quantized model.&lt;/p&gt;
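&lt;p&gt;Both players are trained by behavior cloning: record the rule-based teacher's action for each game state, then fit the network to reproduce it. A minimal sketch of the data-collection step, assuming a hypothetical teacher and feature layout (the real 12 features and teacher logic live in the repository):&lt;/p&gt;

```python
import random

def teacher_action(state):
    """Stand-in rule-based teacher: move toward the ball's y position."""
    ball_y, paddle_y = state[1], state[4]   # illustrative feature indices
    if ball_y > paddle_y:
        return 1      # move down
    return -1         # move up

def collect_dataset(n_samples):
    """Sample random normalized states and label each with the teacher's move."""
    data = []
    for _ in range(n_samples):
        state = [random.random() for _ in range(12)]   # 12 normalized features
        data.append((state, teacher_action(state)))
    return data

dataset = collect_dataset(1_000)   # list of (state, action) training pairs
```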

&lt;p&gt;&lt;strong&gt;Training setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;PyTorch model&lt;/em&gt;: 100,000 samples · 8 epochs&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Tiny model&lt;/em&gt;: 12,000 hill-climb iterations&lt;/li&gt;
&lt;/ul&gt;
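&lt;p&gt;"Hill-climb iterations" here means a simple black-box search rather than gradient descent: perturb one parameter at a time and keep the change only when the loss improves. A minimal sketch of the idea with a toy loss function (not the actual training code from tiny_train_vs_rl.py):&lt;/p&gt;

```python
import random

def hill_climb(params, loss_fn, iters=12_000, step=0.05):
    """Randomly perturb one parameter; keep the change only if loss improves."""
    best = loss_fn(params)
    for _ in range(iters):
        i = random.randrange(len(params))
        old = params[i]
        params[i] = old + random.uniform(-step, step)
        new = loss_fn(params)
        if new >= best:          # no improvement: revert the perturbation
            params[i] = old
        else:
            best = new
    return params, best

# Toy loss: squared distance of the parameters from a hidden target vector.
target = [0.3, -0.2, 0.8]
loss = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
final, best = hill_climb([0.0, 0.0, 0.0], loss)
```

&lt;p&gt;Because the search only ever needs forward passes, it works directly on the quantized network, with no gradients and no float backprop state to store.&lt;/p&gt;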

&lt;p&gt;&lt;strong&gt;Memory Budget&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Model bytes&lt;/th&gt;
&lt;th&gt;Features bytes&lt;/th&gt;
&lt;th&gt;Labels bytes&lt;/th&gt;
&lt;th&gt;Total bytes&lt;/th&gt;
&lt;th&gt;Approx size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PyTorch (no-limits)&lt;/td&gt;
&lt;td&gt;10,499&lt;/td&gt;
&lt;td&gt;41,996&lt;/td&gt;
&lt;td&gt;5,760,000&lt;/td&gt;
&lt;td&gt;120,000&lt;/td&gt;
&lt;td&gt;5,921,996&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ 5.65 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiny (4-bit / int8)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;td&gt;36,864&lt;/td&gt;
&lt;td&gt;3,072&lt;/td&gt;
&lt;td&gt;40,077&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ 39.14 KB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
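&lt;p&gt;The byte counts follow directly from the storage layout: 4 bytes per float32 parameter or feature on the PyTorch side, and single-byte features and labels on the tiny side. A quick check of the arithmetic, with the sample counts inferred from the table (so treat them as assumptions):&lt;/p&gt;

```python
# PyTorch model: 10,499 float32 parameters, 4 bytes each
torch_model = 10_499 * 4                  # 41,996 bytes
# Dataset: 120,000 samples x 12 float32 features, plus 1 byte per label
torch_feats = 120_000 * 12 * 4            # 5,760,000 bytes
torch_labels = 120_000
torch_total = torch_model + torch_feats + torch_labels
print(torch_total, round(torch_total / 2**20, 2))   # 5921996 5.65 (MB)

# Tiny model: 141 bytes of packed weights; 3,072 samples stored as int8
tiny_total = 141 + 3_072 * 12 + 3_072
print(tiny_total, round(tiny_total / 2**10, 2))     # 40077 39.14 (KB)
```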

&lt;p&gt;&lt;strong&gt;Game Results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We played 100 games per matchup; the first player to win 3 balls wins the game.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Matchup&lt;/th&gt;
&lt;th&gt;LEFT&lt;/th&gt;
&lt;th&gt;RIGHT&lt;/th&gt;
&lt;th&gt;LEFT wins %&lt;/th&gt;
&lt;th&gt;GAMES&lt;/th&gt;
&lt;th&gt;Rally min&lt;/th&gt;
&lt;th&gt;Rally avg&lt;/th&gt;
&lt;th&gt;Rally max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;torch-based vs rule-based&lt;/td&gt;
&lt;td&gt;torch-based&lt;/td&gt;
&lt;td&gt;rule-based&lt;/td&gt;
&lt;td&gt;68.0%&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;458.04&lt;/td&gt;
&lt;td&gt;2269&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tiny (trained on rule-based) vs rule-based&lt;/td&gt;
&lt;td&gt;tiny&lt;/td&gt;
&lt;td&gt;rule-based&lt;/td&gt;
&lt;td&gt;43.0%&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;390.14&lt;/td&gt;
&lt;td&gt;2267&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tiny vs torch-based&lt;/td&gt;
&lt;td&gt;tiny&lt;/td&gt;
&lt;td&gt;torch-based&lt;/td&gt;
&lt;td&gt;13.0%&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;1088.24&lt;/td&gt;
&lt;td&gt;9127&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Against the rule-based baseline, both learners perform competitively; the PyTorch model wins more often, but the tiny model is not far behind and even wins some runs outright, depending on seeds and game lengths.&lt;/li&gt;
&lt;li&gt;Head-to-head, the PyTorch model clearly outplays the tiny model—no surprise given its capacity and float precision.&lt;/li&gt;
&lt;li&gt;Long rallies show there are no one-shot games, and the tiny network can hold its ground for a while.&lt;/li&gt;
&lt;li&gt;The tiny pipeline still delivers playable, stable behavior inside a ~39 KB bundle, which was the primary goal. Results will shift with game length, sampling, and training settings, so there’s room to tune and explore.&lt;/li&gt;
&lt;/ul&gt;
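&lt;p&gt;One caveat on reading the win rates: 100 games is a small sample, so the percentages carry real statistical noise. A rough 95% confidence interval for the 68% torch-vs-rule-based result, using the normal approximation:&lt;/p&gt;

```python
import math

def win_rate_ci(wins, games, z=1.96):
    """Normal-approximation 95% confidence interval for a win rate."""
    p = wins / games
    half = z * math.sqrt(p * (1 - p) / games)
    return p - half, p + half

lo, hi = win_rate_ci(68, 100)
print(f"{lo:.3f} .. {hi:.3f}")   # roughly 0.589 .. 0.771
```

&lt;p&gt;The interval spans roughly 59% to 77%: comfortably above 50%, but wide enough to explain run-to-run variation.&lt;/p&gt;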

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You don’t need a datacenter to teach a machine a good habit. A tiny, quantized network - with a few hundred bytes of parameters and a few tens of kilobytes of data - can learn a useful policy and hold its own against a solid rule-based player. The big PyTorch model wins, of course, but the small one shows up, plays real rallies, and does it inside a 39 KB envelope.&lt;/p&gt;

&lt;p&gt;Why it matters: the world is full of little devices that don’t want a cloud—sensors, toys, tools, quiet boxes on factory floors. They need models that wake up fast, think locally, and sip power. This project shows those models aren’t just possible - they’re practical.&lt;/p&gt;

&lt;p&gt;With the right constraints, small models stay focused: just enough to do the job, nothing more. This 64 KB challenge is a spark for further work on tiny, task-specific neural nets.&lt;/p&gt;

</description>
      <category>python</category>
      <category>neural</category>
      <category>pytorch</category>
      <category>iot</category>
    </item>
  </channel>
</rss>
