<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Stat Phantom</title>
    <description>The latest articles on Forem by Stat Phantom (@stat_phantom).</description>
    <link>https://forem.com/stat_phantom</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896851%2F7aa0dd7a-fa35-4366-843f-b692329de6cf.png</url>
      <title>Forem: Stat Phantom</title>
      <link>https://forem.com/stat_phantom</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/stat_phantom"/>
    <language>en</language>
    <item>
      <title>2 Lines of Code Saved 6.4x Memory on My Snake AI</title>
      <dc:creator>Stat Phantom</dc:creator>
      <pubDate>Fri, 01 May 2026 06:36:43 +0000</pubDate>
      <link>https://forem.com/stat_phantom/2-lines-of-code-saved-64-memory-on-my-snake-ai-3dhh</link>
      <guid>https://forem.com/stat_phantom/2-lines-of-code-saved-64-memory-on-my-snake-ai-3dhh</guid>
      <description>&lt;p&gt;Greetings all! In my &lt;a href="https://dev.to/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p"&gt;previous post&lt;/a&gt; I covered Binary Plane Encoding, a 3-channel grid representation for Snake that doubled the best published score. Three binary channels: head, body, apple. For details check my previous post.&lt;/p&gt;

&lt;p&gt;But there was a fourth channel I left out. Direction. The snake's current heading, encoded as a uint8 (0 = up, 1 = right, 2 = down, 3 = left), is painted uniformly across a 20×20 plane due to matrix shape requirements. That's 400 elements carrying exactly 2 bits of information. A 1,600× overhead at the channel level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb8ay2ai377xpuazjvy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb8ay2ai377xpuazjvy5.png" alt="Grid of all 2's" width="523" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Worse, that one integer channel with its 2 bits was blocking the entire state from being bit-packed. The other three grid channels are binary, meaning they &lt;em&gt;could&lt;/em&gt; be packed at 1 bit per element. But the direction channel, with its &lt;em&gt;scoffs&lt;/em&gt; 2 bits, can't. So the replay buffer stores the state as uint8 instead of packed bits. One channel, 2 bits, holding back one more step of memory optimisation, forcing 1,600 bytes per state instead of 250 (4 channels × 20 × 20 × 1 byte = 1,600, versus 5 binary channels × 20 × 20 = 2,000 bits, packed 8 per byte = 250).&lt;/p&gt;

&lt;p&gt;This follow-up post is about fixing that, and the pitfalls along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Attempt
&lt;/h2&gt;

&lt;p&gt;Four cardinal directions. Two bits encode four states. So the intuitive replacement is two binary channels instead of one integer channel: one bit for North/South, one bit for East/West. Compact, geometric, obvious.&lt;/p&gt;

&lt;p&gt;Except it doesn't work. Walk through it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwr0amhykrkvdoqoqsdq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwr0amhykrkvdoqoqsdq4.png" alt="NESW Diagram" width="800" height="727"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;North and West both map to (0, 0). &lt;strong&gt;Collision&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The failure is subtle because the scheme &lt;em&gt;seems&lt;/em&gt; right. Four directions, four possible bit combinations, should be a clean fit. But the scheme tries to answer "is there a north/south component?" and "is there an east/west component?" Cardinal movement is strictly one-dimensional. The perpendicular component is always exactly zero. What does the E/W bit say when the snake is moving north? It's not moving east. It's also not moving west. Both map to 0. "Not moving east" is identical to "not moving west" in a single bit.&lt;/p&gt;

&lt;p&gt;Two bits should be enough for four directions. They are. Just not &lt;em&gt;those&lt;/em&gt; two bits.&lt;/p&gt;
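&lt;p&gt;To make the collision concrete, here is a minimal sketch. The specific bit assignment below is my own reading of the diagram ("moving south?" and "moving east?"); any assignment of N/S and E/W presence bits collides the same way:&lt;/p&gt;

```python
# Naive scheme: bit 0 = "moving south?", bit 1 = "moving east?"
# (a hypothetical but representative assignment of the N/S + E/W bits)
naive = {
    "north": (0, 0),  # not south, not east
    "south": (1, 0),
    "east":  (0, 1),
    "west":  (0, 0),  # not south, not east -- indistinguishable from north
}

assert naive["north"] == naive["west"]   # the collision
assert len(set(naive.values())) == 3     # four directions, only three codes
```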

&lt;h2&gt;
  
  
  Ask Better Questions
&lt;/h2&gt;

&lt;p&gt;The collision happens because the N/S + E/W scheme asks the wrong questions for cardinal movement. The fix isn't more bits. It's better questions.&lt;/p&gt;

&lt;p&gt;The correct encoding uses two bits derived geometrically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Axis bit:&lt;/strong&gt; which axis is the snake travelling along? (0 = vertical, 1 = horizontal)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sign bit:&lt;/strong&gt; which direction on that axis? (0 = negative, 1 = positive)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pkvnvgzxo3b1epiui0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pkvnvgzxo3b1epiui0a.png" alt="NESW Fixed" width="800" height="733"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All four directions get unique codes. The axis bit answers "which axis?" and the sign bit answers "which end?" Both questions always have exactly one answer for cardinal movement. No ambiguity, no collisions. The specific sign convention (whether north is positive or negative) doesn't matter as long as it's internally consistent. The CNN will learn whatever mapping you give it.&lt;/p&gt;

&lt;p&gt;The first attempt was asking the wrong questions. Once you ask the right ones, two bits is plenty.&lt;/p&gt;
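&lt;p&gt;Enumerating the scheme is a quick sanity check. This sketch uses the same 0=up, 1=right, 2=down, 3=left numbering as earlier, with right and down as the positive ends (an arbitrary but internally consistent convention):&lt;/p&gt;

```python
def encode_direction(d):
    """Map direction d (0=up, 1=right, 2=down, 3=left) to (axis, sign) bits."""
    axis = d % 2              # 0 = vertical, 1 = horizontal
    sign = int(d in (1, 2))   # right and down chosen as the positive directions
    return (axis, sign)

codes = [encode_direction(d) for d in range(4)]
assert len(set(codes)) == 4   # all four directions get unique 2-bit codes
```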

&lt;p&gt;For anyone wondering about diagonal games (8 directions), the axis + sign scheme breaks because a diagonal is on both axes simultaneously. The general solution there is a 4-channel one-hot: one binary plane per cardinal direction, with two planes active for a diagonal. But for Snake, cardinal-only, the 2-channel scheme is the right choice. Don't build the generality you don't need.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory Maths
&lt;/h2&gt;

&lt;p&gt;This is where the change pays off. The state goes from &lt;code&gt;(4, 20, 20)&lt;/code&gt; with one integer channel to &lt;code&gt;(5, 20, 20)&lt;/code&gt; with all binary channels. Yes, adding a channel saves memory. That sounds backwards but the maths checks out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (4-channel, uint8 storage):&lt;/strong&gt; 4 × 20 × 20 = 1,600 elements at 1 byte each = 1,600 bytes per state. A 1-million-transition replay buffer (storing both state and next state): &lt;strong&gt;3.2 GB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After (5-channel, binary bit-packed):&lt;/strong&gt; 5 × 20 × 20 = 2,000 elements. Every value is now 0 or 1, so each element can be packed at 1 bit, 8 elements per byte. ⌈2,000 / 8⌉ = 250 bytes per state. The same buffer: &lt;strong&gt;500 MB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6.4× reduction.&lt;/strong&gt; Adding one channel, removing 2.7 GB.&lt;/p&gt;

&lt;p&gt;To put this in perspective: the grid encoding stored naively as float32 (before any compression) would be 6,400 bytes per state, or 12.8 GB for a 1M-transition buffer. The first post's uint8 storage cut that to 3.2 GB (4× reduction). This post's binary bit-packing cuts it again to 500 MB. Across both changes, that's a &lt;strong&gt;25.6× total reduction&lt;/strong&gt; from the uncompressed float32 starting point.&lt;/p&gt;
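&lt;p&gt;The figures above can be checked directly:&lt;/p&gt;

```python
import math

grid = 20 * 20
before = 4 * grid                  # 4 channels, uint8: 1,600 bytes per state
after = math.ceil(5 * grid / 8)    # 5 binary channels, bit-packed: 250 bytes
naive_f32 = 4 * grid * 4           # 4 channels at float32: 6,400 bytes

transitions = 1_000_000
stored = 2 * transitions           # state and next state per transition

assert (before, after, naive_f32) == (1600, 250, 6400)
assert before / after == 6.4              # uint8 vs bit-packed
assert naive_f32 / after == 25.6          # float32 vs bit-packed
assert before * stored == 3_200_000_000   # 3.2 GB buffer
assert after * stored == 500_000_000      # 500 MB buffer
```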

&lt;p&gt;And compared to the pixel-based approaches from the first post? Wei et al.'s RGB inputs would need approximately 49 GB for the same buffer. Binary Plane Encoding with binary cardinal directions brings that to 500 MB. Nearly a &lt;strong&gt;98× difference&lt;/strong&gt;. A 1-million-transition replay buffer now fits comfortably in the VRAM of a gaming laptop, hell, it fits in some EPYC CPU caches (AMD's Genoa-X packs up to 1,152 MB of L3). With pixel inputs, it wouldn't fit on most workstations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Lines of Code
&lt;/h2&gt;

&lt;p&gt;The implementation change is in &lt;code&gt;snake_cnn_env.py&lt;/code&gt;. Replace the single integer direction plane with two binary planes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: one integer channel
# grid[3] = self._direction  # 0, 1, 2, or 3
&lt;/span&gt;
  &lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_direction&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# axis: 0=vertical, 1=horizontal
&lt;/span&gt;  &lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_direction&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# sign: 0=negative, 1=positive
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Update &lt;code&gt;input_channels&lt;/code&gt; from 4 to 5 in the model config. Done. We now store 5 channels instead of 4, but each channel is 1 bit instead of 8. One extra channel, massively less storage.&lt;/p&gt;

&lt;p&gt;One real cost: changing &lt;code&gt;input_channels&lt;/code&gt; changes the shape of the first convolutional weight tensor. Existing checkpoints can't be loaded into a 5-channel model. This requires a fresh training run, so schedule the change at a natural break point, not mid-experiment.&lt;/p&gt;
&lt;h2&gt;
  
  
  torch.unpackbits Doesn't Exist
&lt;/h2&gt;

&lt;p&gt;The CPU side of bit-packing is trivial. &lt;code&gt;np.packbits&lt;/code&gt; and &lt;code&gt;np.unpackbits&lt;/code&gt; have existed in NumPy since 2010. Pack on write, unpack on read. Done.&lt;/p&gt;
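&lt;p&gt;For the Snake state this is a handful of lines (a minimal sketch with a random 5-channel binary state standing in for the real encoding):&lt;/p&gt;

```python
import numpy as np

# A stand-in 5x20x20 binary state (the real one comes from the environment)
state = np.random.default_rng(0).integers(0, 2, size=(5, 20, 20), dtype=np.uint8)

packed = np.packbits(state)      # flatten, pack 8 elements per byte
assert packed.nbytes == 250      # 2,000 bits in 250 bytes

restored = np.unpackbits(packed)[:state.size].reshape(state.shape)
assert np.array_equal(restored, state)   # lossless round trip
```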

&lt;p&gt;So just implement it on the GPU side, right? WRONG. The natural PyTorch equivalent would be &lt;code&gt;torch.unpackbits&lt;/code&gt;, which... doesn't exist. The function is absent from the stable API entirely, and calling it raises an &lt;code&gt;AttributeError&lt;/code&gt;. This is a genuine gap in PyTorch that anyone implementing binary storage on CUDA will hit.&lt;/p&gt;

&lt;p&gt;The community workaround I found uses bitmasks:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;unpacked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;flip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This works for a flat 1D input. It preserves the original mask values in the intermediate tensor, converts them to 0/1 via &lt;code&gt;.bool().int()&lt;/code&gt;, and flips the bit order to match the MSB-first convention of &lt;code&gt;np.unpackbits&lt;/code&gt;. Four operations, correct output.&lt;/p&gt;

&lt;p&gt;But I don't need to preserve the original mask values; I just need 0s and 1s. I thought I could do better, and I wouldn't be a programmer if I didn't try, for no better reason than... &lt;em&gt;shrugs&lt;/em&gt; I wanted to.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;shifts&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;packed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;unpacked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;packed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;shifts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# (B, packed_size, 8)
&lt;/span&gt;&lt;span class="n"&gt;unpacked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;unpacked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;n_elems&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="c1"&gt;# drop padding bits
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each packed byte is broadcast against 8 shift values &lt;code&gt;[7, 6, 5, 4, 3, 2, 1, 0]&lt;/code&gt;, right-shifting to move each successive bit into the least significant position. Bitwise &amp;amp; with 1 isolates it. Two operations instead of four. No &lt;code&gt;.bool().int()&lt;/code&gt; needed because &lt;code&gt;&amp;gt;&amp;gt; shift &amp;amp; 1&lt;/code&gt; always yields binary output directly. No &lt;code&gt;.flip()&lt;/code&gt; needed because the descending shift range already produces MSB-first order. Fewer intermediate tensors in VRAM during sampling.&lt;/p&gt;

&lt;p&gt;The mask approach also has a shape bug: it's written for a 1D input (flat array of bytes) and breaks on a batched 2D input &lt;code&gt;(B, packed_size)&lt;/code&gt;. The shift approach handles batched GPU sampling correctly from the start.&lt;/p&gt;
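&lt;p&gt;The shift trick is easy to verify against NumPy's reference implementation. Here is a NumPy sketch of the same logic on a batched input (using &lt;code&gt;np.bitwise_and&lt;/code&gt; in place of the infix mask operator):&lt;/p&gt;

```python
import numpy as np

B, n_elems = 4, 2000
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(B, n_elems), dtype=np.uint8)
packed = np.packbits(bits, axis=1)              # (B, 250)

shifts = np.arange(7, -1, -1, dtype=np.uint8)   # MSB-first, so no flip needed
shifted = packed[..., None] >> shifts           # broadcast to (B, 250, 8)
unpacked = np.bitwise_and(shifted, 1).reshape(B, -1)[:, :n_elems]

assert np.array_equal(unpacked, bits)   # matches np.unpackbits bit order
```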

&lt;p&gt;Both are fully device-resident with no CPU-GPU transfer. But two operations beats four, and not allocating intermediate tensors matters when batch size and state shape are large. Will reducing two ops make a difference? Probably not, but I saw the OPportunity and took it. And yes, I said that just for the joke.&lt;/p&gt;

&lt;p&gt;So, two lines of code changed the state representation to allow bit-packing and saved a lot of storage with no loss of data.&lt;/p&gt;
&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This is part of an ongoing series building Rainbow DQN incrementally and measuring each component on Snake. The state representation work runs in parallel to the algorithm comparison. It doesn't change which Rainbow components help or hurt, but a 6.4× memory reduction means larger buffers, more parallel environments, or training on hardware that previously couldn't fit the buffer.&lt;/p&gt;

&lt;p&gt;The algorithm results are the next post.&lt;/p&gt;

&lt;p&gt;If you've hit the &lt;code&gt;torch.unpackbits&lt;/code&gt; gap yourself, or found a cleaner solution than bitwise shifts for GPU-side bit unpacking, I'd like to hear about it in the comments.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;This work is part of ongoing research and the findings are planned to be submitted as a peer-reviewed paper.&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;If you missed the first post in this series:&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p" class="crayons-story__hidden-navigation-link"&gt;A CNN Grid Encoding for Snake AI That DOUBLES! the Best Published Score&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/stat_phantom" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896851%2F7aa0dd7a-fa35-4366-843f-b692329de6cf.png" alt="stat_phantom profile" class="crayons-avatar__image" width="800" height="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/stat_phantom" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Stat Phantom
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Stat Phantom
                
              
              &lt;div id="story-author-preview-content-3548283" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/stat_phantom" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896851%2F7aa0dd7a-fa35-4366-843f-b692329de6cf.png" class="crayons-avatar__image" alt="" width="800" height="800"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Stat Phantom&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p" id="article-link-3548283"&gt;
          A CNN Grid Encoding for Snake AI That DOUBLES! the Best Published Score
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/machinelearning"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;machinelearning&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/deeplearning"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;deeplearning&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/cnn"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;cnn&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;5&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              2&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;


</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>A CNN Grid Encoding for Snake AI That DOUBLES! the Best Published Score</title>
      <dc:creator>Stat Phantom</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:39:23 +0000</pubDate>
      <link>https://forem.com/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p</link>
      <guid>https://forem.com/stat_phantom/a-cnn-grid-encoding-for-snake-ai-that-doubles-the-best-published-score-245p</guid>
      <description>&lt;p&gt;A traditional Snake game grid has only 4 states each grid point can be in: empty, head, body, or apple. And for some reason every published Snake AI paper either throws away spatial information by condensing the game state into a handful of hand-picked numbers, or buries entity identity under layers of raw pixel data that the network has to untangle. Incredibly wasteful.&lt;/p&gt;

&lt;p&gt;The solution? Binary Plane Encoding. Using it, a CNN-based model reached a record score of 125 on a 20×20 grid in 2.5 hours on a single RTX 2070, doubling the best published result of 62 (even the model's average score consistently beats that previous best). This post explains the encoding, why it works, and explores why nobody in the Snake DRL space has tried it before.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Camps
&lt;/h2&gt;

&lt;p&gt;The published literature on deep reinforcement learning for Snake spans 2018 to 2025 and splits into two approaches to state representation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp one: hand-crafted feature vectors.&lt;/strong&gt; Sebastianelli et al. (2021) and Kommalapati et al. (2025) both use 11 binary features fed to a fully-connected network. Three danger flags (is there a wall or body segment directly ahead, to the left, to the right), four direction flags (which way is the snake currently heading), and four food-relative flags (is the apple above, below, left, right of the head). The network receives a pre-digested summary of the game state. It never sees the grid. It never learns spatial relationships. A human decided what matters and encoded that decision directly into the input.&lt;/p&gt;

&lt;p&gt;This works well, at least initially. Sebastianelli achieved a best score of 62 on a 20×20 grid with vanilla DQN and this 11-feature representation, using very few resources... but a hard ceiling is quickly reached. The network cannot discover and learn spatial patterns because it never sees the spatial layout. And the features themselves are Snake-specific. Those 11 binary values encode what a Snake expert thinks matters. They would be meaningless for any other game. If you want an agent that can generalise beyond a single environment, this is a dead end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp two: raw pixels.&lt;/strong&gt; Wei et al. (2018) and Tushar &amp;amp; Siddique (2022) both train from screenshots. Wei uses 64×64 RGB frames stacked four deep, giving 64×64×12 input. Tushar converts to binary (any non-zero pixel becomes 1) at 84×84, also four frames stacked, giving 84×84×4.&lt;/p&gt;

&lt;p&gt;The pixel approach is game-agnostic, which is its strength. But the cost is significant. Tushar's binary encoding collapses head, body, and apple into a single value. In any individual frame, every occupied cell looks identical. The agent can only figure out what's what by watching how things move across four stacked frames: food stays still, the snake moves. A single frame on its own contains zero identity information. Wei's RGB encoding preserves colour and therefore identity, but at the cost of massive input dimensionality and redundant spatial resolution (64×64 pixels to represent a 20×20 logical grid).&lt;/p&gt;

&lt;p&gt;Both pixel approaches were tested on 12×12 grids, reaching best scores of 17 (Wei) and 20 (Tushar). Neither has been applied to 20×20.&lt;/p&gt;

&lt;p&gt;Beyond the peer-reviewed literature, informal projects show similar patterns. A supervised learning approach on GitHub (Huynh, 2020) uses 7 hand-crafted features with a Keras network and reaches a best of 46, average 22 on 20×20. A Medium article (Schoberg, 2020) compares deterministic algorithms rather than learned policies, reaching 67 on 20×20 with a collision-avoiding shortest-path algorithm (no neural network involved at all).&lt;/p&gt;

&lt;p&gt;Across all of it, every neural network approach uses either compressed feature vectors or raw pixel grids.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap
&lt;/h2&gt;

&lt;p&gt;Here is the part that surprised me. Multi-channel grid encoding is not a new idea. It is the standard state representation in board game AI.&lt;/p&gt;

&lt;p&gt;AlphaZero (Silver et al., 2018) represents chess, Go, and Shogi as multi-channel binary planes. Each piece type, colour, and game-state feature gets its own channel. The network receives a spatial tensor where every channel encodes a different semantic category of information about the board. MuZero extends this. The representation is well-established, well-understood, and has been proven at the highest levels of game AI.&lt;/p&gt;

&lt;p&gt;Snake fundamentally runs on a grid with a fixed set of positions entities can occupy. It mirrors the exact class of problem where channel-per-entity encoding has proven effective, yet no published Snake DRL paper, and no self-published project I have found, attempts this representation. (Its absence from published papers isn't surprising to me, though. Having gone through over 2,100 papers this month, I can say most papers simply follow pre-existing trends.)&lt;/p&gt;

&lt;p&gt;All of the pre-existing Snake DRL literature either pre-computes features and discards spatial representation, or captures raw pixels and forces the network to spend capacity on visual processing before it can even begin to learn the game.&lt;/p&gt;

&lt;p&gt;This is the gap. Not a novel encoding technique, but an established one applied to a domain that has ignored it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Encoding
&lt;/h2&gt;

&lt;p&gt;The state representation is a 3×20×20 binary tensor (channels first, matching the code below). Three channels, each covering the full grid:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Channel 0 (head):&lt;/strong&gt; 1 at the head position, 0 everywhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Channel 1 (body):&lt;/strong&gt; 1 at each body segment position, 0 elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Channel 2 (apple):&lt;/strong&gt; 1 at the apple position, 0 everywhere else.&lt;/p&gt;

&lt;p&gt;Every value is exactly 0 or 1. A single frame provides complete, unambiguous game state. Where is the head, where is the body, where is the food. No temporal stacking required. No entity disambiguation through motion inference. No feature engineering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9aay7yiqtp8qa9vzz7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9aay7yiqtp8qa9vzz7r.png" alt="Visual Representation of Encoding Layers" width="567" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The construction from game state is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;encode_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grid_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head_pos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body_positions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;apple_pos&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grid_size&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Channel 0: head
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;head_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Channel 1: body
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;body_positions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Channel 2: apple
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;apple_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;apple_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That produces 20×20×3 = 1,200 values per state. Compare that to the pixel approaches: Tushar's binary encoding produces 84×84×4 = 28,224 values (23× larger), and Wei's RGB produces 64×64×12 = 49,152 values (41× larger). The grid encoding captures strictly more semantic information in a fraction of the space.&lt;/p&gt;
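&lt;p&gt;The arithmetic behind those ratios is worth making explicit (a standalone sketch; the shapes come straight from the papers compared above):&lt;/p&gt;

```python
# Values per state, stored as uint8 (one byte each).
bpe_vals    = 20 * 20 * 3    # Binary Plane Encoding
tushar_vals = 84 * 84 * 4    # Tushar's stacked 84x84 binary frames
wei_vals    = 64 * 64 * 12   # Wei's 64x64x12 RGB input

print(f"BPE: {bpe_vals} values")
print(f"Binary pixels: {tushar_vals} values ({tushar_vals / bpe_vals:.1f}x)")
print(f"RGB pixels: {wei_vals} values ({wei_vals / bpe_vals:.1f}x)")
```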

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjsxg3bak0moekn59c3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjsxg3bak0moekn59c3u.png" alt="Memory Usage Comparison" width="713" height="276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The information hierarchy makes this concrete:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Entity identity per frame&lt;/th&gt;
&lt;th&gt;Full spatial layout&lt;/th&gt;
&lt;th&gt;Game-agnostic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Binary Plane Encoding (this model)&lt;/td&gt;
&lt;td&gt;Yes, perfect&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial (any grid game)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RGB pixels (Wei et al.)&lt;/td&gt;
&lt;td&gt;Yes, via colour&lt;/td&gt;
&lt;td&gt;Approximate&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary pixels (Tushar)&lt;/td&gt;
&lt;td&gt;No (needs 4 frames)&lt;/td&gt;
&lt;td&gt;Approximate&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature vectors (Sebastianelli)&lt;/td&gt;
&lt;td&gt;Yes, pre-computed&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No (Snake-specific)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Binary Plane Encoding is the only representation in the reviewed literature that provides perfect entity identity, full spatial layout, and game-agnostic structure without additional processing.&lt;/p&gt;

&lt;h2&gt;The CNN Architecture&lt;/h2&gt;

&lt;p&gt;The model processing this encoding is deliberately compact:&lt;/p&gt;

&lt;p&gt;Two convolutional layers with 32 and 64 channels respectively, 3×3 kernels with same padding, followed by a single MaxPool2d that halves the spatial dimensions from 20×20 to 10×10. Two dense layers of 512 and 256 units. Mish activation throughout.&lt;/p&gt;

&lt;p&gt;The network also uses a dueling architecture (separate value and advantage streams) and NoisyLinear layers replacing standard linear layers in the fully-connected head, providing learned exploration noise instead of epsilon-greedy.&lt;/p&gt;

&lt;p&gt;This is not a large network. It doesn't need to be. The compact input representation means the convolutional backbone doesn't need depth. Two 3×3 layers with a single pooling stage are sufficient to capture the spatial relationships that matter in a 20×20 grid: proximity to walls, body segment density in nearby regions, and relative food position. The encoding has already done the hard work of structuring the information. The CNN just needs to read it.&lt;/p&gt;
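&lt;p&gt;A rough PyTorch sketch of the backbone described above (the dueling streams and NoisyLinear layers are omitted for brevity, and the four-action output head is an assumption, one logit per direction):&lt;/p&gt;

```python
import torch
import torch.nn as nn

class SnakeCNN(nn.Module):
    """Minimal sketch of the compact backbone described above.

    Omits the dueling value/advantage streams and the NoisyLinear
    layers; a plain 4-action Q-value head stands in for them.
    """
    def __init__(self, grid_size=20, n_actions=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3x20x20 -> 32x20x20
            nn.Mish(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> 64x20x20
            nn.Mish(),
            nn.MaxPool2d(2),                              # -> 64x10x10
        )
        flat = 64 * (grid_size // 2) ** 2                 # 6,400 for a 20x20 grid
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 512), nn.Mish(),
            nn.Linear(512, 256), nn.Mish(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        # x: (batch, 3, grid, grid) uint8 planes; cast to float for the convs
        return self.head(self.features(x.float()))
```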

&lt;h2&gt;Previous Records&lt;/h2&gt;

&lt;p&gt;The meaningful comparisons are grouped by grid size, since raw scores are not directly comparable across different board dimensions.&lt;/p&gt;

&lt;h3&gt;20×20 Grid&lt;/h3&gt;

&lt;p&gt;The only published peer-reviewed result on a 20×20 Snake grid is Sebastianelli et al. (2021). They used an MLP with 11 hand-crafted binary features and vanilla DQN, testing 13 hyperparameter configurations across evaluation runs. Their best single score was &lt;strong&gt;62&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This work, using Binary Plane Encoding with a CNN and Rainbow DQN (incorporating C51 distributional output, dueling architecture, noisy exploration, prioritised replay, and 3-step returns), achieved a record of &lt;strong&gt;125&lt;/strong&gt; on the same grid, more than double.&lt;/p&gt;

&lt;p&gt;This isn't a cherry-picked peak. Across 55,000 episodes of sustained training, the rolling average holds between 60 and 70, and the median between 64 and 74. Sebastianelli's best single game of 62 sits below this model's average. The p10 floor (the score that 90% of episodes exceed) holds around 30, meaning even the worst games routinely outperform most published baselines. The p90 reaches into the high 90s, with individual episodes regularly breaking 100. Training to this point took approximately 2.5 hours on a single RTX 2070.&lt;/p&gt;

&lt;p&gt;An important caveat: this is not an encoding-only comparison. The improvement comes from changes across multiple axes simultaneously. State representation (grid encoding vs feature vector), architecture (CNN vs MLP), algorithm (Rainbow DQN vs vanilla DQN), and training scale (2048 parallel environments vs a smaller setup). The encoding is the enabling change that made the architecture and training scale feasible on consumer hardware, but the doubling should not be attributed to the encoding alone.&lt;/p&gt;

&lt;h3&gt;12×12 Grid&lt;/h3&gt;

&lt;p&gt;Direct score comparison across grid sizes doesn't work because a 12×12 grid has a maximum possible score of approximately 141 food items versus approximately 399 for 20×20. Board coverage (score divided by maximum possible) provides a normalised metric:&lt;/p&gt;
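&lt;p&gt;The normalisation is simple to compute directly (a sketch using the approximate maxima quoted above, ~141 for 12×12 and ~399 for 20×20):&lt;/p&gt;

```python
# Board coverage = best score / approximate maximum score for the grid.
results = [
    ("Wei et al., 12x12",            17, 141),
    ("Tushar and Siddique, 12x12",   20, 141),
    ("Sebastianelli et al., 20x20",  62, 399),
    ("This model, 20x20",           125, 399),
]
for name, score, max_score in results:
    print(f"{name}: {100 * score / max_score:.0f}% coverage")
```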

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Work&lt;/th&gt;
&lt;th&gt;Grid&lt;/th&gt;
&lt;th&gt;Best Score&lt;/th&gt;
&lt;th&gt;Board Coverage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Wei et al. (2018)&lt;/td&gt;
&lt;td&gt;12×12&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;~12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tushar &amp;amp; Siddique (2022)&lt;/td&gt;
&lt;td&gt;12×12&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;~14%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sebastianelli et al. (2021)&lt;/td&gt;
&lt;td&gt;20×20&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;~16%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;This model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20×20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;125&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~31%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap persists across normalisation. At 31% board coverage, this approach covers roughly double the grid fraction of the nearest published result and more than double the pixel-based CNN approaches.&lt;/p&gt;

&lt;h3&gt;Informal results (not peer-reviewed)&lt;/h3&gt;

&lt;p&gt;For completeness: a supervised learning project (Huynh, 2020) on 20×20 achieved a best of 46, and a deterministic shortest-path algorithm (Schoberg, 2020) reached 67 on 20×20. The latter is not a learned policy. Neither is peer-reviewed.&lt;/p&gt;

&lt;h2&gt;Why It Works&lt;/h2&gt;

&lt;p&gt;The encoding's advantage operates on two levels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Information quality.&lt;/strong&gt; The network receives exactly the information it needs to play Snake, in a spatial format that CNNs are designed to process, with zero noise or redundancy. Each channel answers one question: where is the head, where is the body, where is the food. There is no ambiguity to resolve, no motion to infer, no irrelevant visual detail to filter out.&lt;/p&gt;

&lt;p&gt;Pixel inputs force the network to first learn to segment the image (which cells are the snake's body and which are background) before it can learn the spatial relationships between the segments. With Binary Plane Encoding, that segmentation is pre-constructed, leaving the network free to devote its entire capacity to learning the actual game instead of learning how to see in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Information density.&lt;/strong&gt; At 1,200 values per state stored as uint8, a replay buffer holding 1,000,000 transitions fits comfortably in approximately 1.6GB of VRAM. This made a GPU-resident replay buffer and 2048 parallel environments possible on a single RTX 2070 with 8GB of VRAM.&lt;/p&gt;

&lt;p&gt;For comparison, storing Tushar's 84×84×4 binary inputs at the same buffer capacity would need approximately 28GB. Wei's 64×64×12 RGB inputs would need approximately 49GB. Neither fits on consumer hardware. You would need multiple high-end GPUs or cloud infrastructure to achieve the same training scale with pixel-based inputs.&lt;/p&gt;
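&lt;p&gt;A back-of-the-envelope check on those buffer sizes (raw state arrays only; actions, rewards, next-state copies, and the prioritised-replay tree add bookkeeping on top of the raw figures below):&lt;/p&gt;

```python
# Raw state storage for a 1,000,000-transition replay buffer, one byte per value.
# States only: the surrounding bookkeeping pushes real usage somewhat higher.
BUFFER = 1_000_000
encodings = [
    ("Binary Plane Encoding (20x20x3)", 20 * 20 * 3),
    ("Binary pixels (84x84x4)",         84 * 84 * 4),
    ("RGB pixels (64x64x12)",           64 * 64 * 12),
]
for name, values_per_state in encodings:
    gb = values_per_state * BUFFER / 1e9
    print(f"{name}: {gb:.1f} GB")
```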

&lt;p&gt;The compact encoding didn't just improve information quality. It made the training infrastructure possible. 2048 parallel environments with a GPU-resident buffer meant the replay buffer reached useful diversity faster, the distributional RL gradient signal had richer data to work with, and the agent surpassed all previous records before reaching 100,000 training episodes.&lt;/p&gt;

&lt;h2&gt;Honest Caveats&lt;/h2&gt;

&lt;p&gt;This encoding is a &lt;strong&gt;privileged state representation&lt;/strong&gt;. The agent receives information extracted directly from the game's internal data structures: exact head position, exact body segment positions, exact apple position. A human player has access to the same logical information through visual perception, but this agent receives it pre-structured without any perceptual processing.&lt;/p&gt;

&lt;p&gt;The model plateaued at 125 (the record held across more than 50,000 further episodes), but a subsequent run using a variant algorithm has already broken it, so this isn't the ceiling for the encoding. The more interesting question is whether pixel-based approaches could ever reach these scores given enough compute. Theoretically yes, but whether it's achievable in practice is unknown. Imperfections in the visual pipeline may compound through training, but that hypothesis hasn't been tested, and the performance cost of segmentation quality on Snake hasn't been quantified. Whether the gap is recoverable or structural is an open question, and one worth testing properly. If you take this on, I'd love to see what you find.&lt;/p&gt;

&lt;p&gt;Cross-paper comparisons to Sebastianelli et al. and the pixel-based approaches should be read with the privileged state in mind. The improvement reflects the combined effect of encoding quality, architecture, algorithm, and training scale. Isolating each factor's individual contribution is the purpose of the ablation study this encoding supports.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;Binary Plane Encoding is the foundation for a systematic ablation study on Rainbow DQN applied to Snake. The study adds one component at a time (Double DQN, noisy exploration, dueling architecture, prioritised experience replay, C51 distributional output), measuring each component's individual contribution in a dense-reward, vectorised-environment setting.&lt;/p&gt;

&lt;p&gt;Early results have already produced some surprises about which Rainbow components help and which ones hurt on a task like Snake. That is the next post.&lt;/p&gt;

&lt;p&gt;If you have experience with alternative state representations for grid-based game AI, or if you have seen Binary Plane Encoding applied to Snake in work I haven't found, I'd genuinely like to hear about it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This work is part of ongoing research; I plan to submit the findings as a peer-reviewed paper.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;References&lt;/h2&gt;

&lt;h3&gt;Peer-Reviewed&lt;/h3&gt;

&lt;p&gt;Sebastianelli et al. (2021) - "A Deep Q-Learning based approach applied to the Snake game" - 29th Mediterranean Conference on Control and Automation (MED). &lt;a href="https://doi.org/10.1109/MED51440.2021.9480232" rel="noopener noreferrer"&gt;DOI: 10.1109/MED51440.2021.9480232&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kommalapati et al. (2025) - "Building an AI Snake Powered by Deep Reinforcement Learning and Deep Q-Learning" - IEEE 7th International Symposium on Advanced Electrical and Communication Technologies (ISAECT). &lt;a href="https://doi.org/10.1109/ISAECT68904.2025.11318716" rel="noopener noreferrer"&gt;DOI: 10.1109/ISAECT68904.2025.11318716&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wei et al. (2018) - "Autonomous Agents in Snake Game via Deep Reinforcement Learning" - IEEE International Conference on Agents (ICA), Singapore. &lt;a href="https://doi.org/10.1109/AGENTS.2018.8460004" rel="noopener noreferrer"&gt;DOI: 10.1109/AGENTS.2018.8460004&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tushar &amp;amp; Siddique (2022) - "A Memory Efficient Deep Reinforcement Learning Approach For Snake Game Autonomous Agents" - IEEE 16th International Conference on Application of Information and Communication Technologies (AICT). &lt;a href="https://doi.org/10.1109/AICT55583.2022.10013603" rel="noopener noreferrer"&gt;DOI: 10.1109/AICT55583.2022.10013603&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Silver et al. (2018) - "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play" - Science 362, 1140-1144. &lt;a href="https://doi.org/10.1126/science.aar6404" rel="noopener noreferrer"&gt;DOI: 10.1126/science.aar6404&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Informal / Community Work&lt;/h3&gt;

&lt;p&gt;Huynh (2020) - Supervised learning Snake AI. &lt;a href="https://github.com/TimHuynh0905/snake-ai" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Schoberg (2020) - Deterministic algorithms for Snake. &lt;a href="https://medium.com/analytics-vidhya/playing-snake-with-ai-2ea68f0e914a" rel="noopener noreferrer"&gt;Medium Article&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>cnn</category>
    </item>
  </channel>
</rss>
