<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jason Peterson</title>
    <description>The latest articles on Forem by Jason Peterson (@jason_peterson_607e54abf5).</description>
    <link>https://forem.com/jason_peterson_607e54abf5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3576555%2Fa9a16049-e112-4caa-a7a2-a21afbbbdde2.png</url>
      <title>Forem: Jason Peterson</title>
      <link>https://forem.com/jason_peterson_607e54abf5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jason_peterson_607e54abf5"/>
    <language>en</language>
    <item>
      <title>I Built a 120-Image AI Influencer Pipeline for $4.80</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Fri, 13 Feb 2026 18:36:24 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/i-built-a-120-image-ai-influencer-pipeline-for-480-117p</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/i-built-a-120-image-ai-influencer-pipeline-for-480-117p</guid>
      <description>&lt;p&gt;&lt;em&gt;Erewhon Smoothie — Lilly and her Cavalier on a Brooklyn sidewalk. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhoxn4mbwuaarxpvr7g0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhoxn4mbwuaarxpvr7g0.png" alt="Erewhon winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Rimowa Classic Cabin — airport terminal, golden hour. The suitcase sits beside her like it belongs. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyvcv1t1tqlu3zqxofh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyvcv1t1tqlu3zqxofh0.png" alt="Rimowa winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Hennessy VSOP — candlelit speakeasy, bottle on the table, dog on her lap. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh8tmi5dvp27iva1v90e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh8tmi5dvp27iva1v90e.png" alt="Hennessy winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Bang &amp;amp; Olufsen Beoplay H95 — Montauk beach at golden hour. The hardest shot in the set: only 2 of 30 attempts got the headphones right. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2rgej2llv3vazxmd66s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2rgej2llv3vazxmd66s.png" alt="Bang &amp;amp; Olufsen winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four brand partnerships. 120 AI-generated photos. One fictional influencer, her dog, and a $4.80 fal.ai bill. This is the entire pipeline — what worked, what didn't, and why it matters.&lt;/p&gt;


&lt;h2&gt;
  
  
  From Tilly to Lilly
&lt;/h2&gt;

&lt;p&gt;Last September, Eline van der Velden announced at the Zurich Summit that her AI-generated actress "Tilly Norwood" was in talks with a talent agency. Emily Blunt, Melissa Barrera, and Whoopi Goldberg publicly condemned it. Van der Velden received death threats. The backlash proved something important: synthetic people are now consistent and believable enough to genuinely threaten livelihoods. But Tilly is an actress in controlled contexts. What about &lt;strong&gt;influencers&lt;/strong&gt; — where the content IS the product?&lt;/p&gt;

&lt;p&gt;Meet Lilly Sorghum. Late 20s, Afro-Caribbean heritage, effortlessly stylish, never without her Cavalier King Charles Spaniel. She has brand partnerships with Erewhon, Rimowa, Hennessy, and Bang &amp;amp; Olufsen. She doesn't exist.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;25 candidates, $0.63. Pick one, lock it, never regenerate her again.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcomz7suz4fgf558j33p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcomz7suz4fgf558j33p.png" alt="5x5 grid of 25 audition candidates" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I generated 25 candidates with Flux Dev ($0.63), picked one, and locked her as the reference anchor. Every shot that follows preserves her face, her dog, her vibe.&lt;/p&gt;
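
&lt;p&gt;For reference, the audition step is just the same &lt;code&gt;fal_client&lt;/code&gt; call in a loop. A minimal sketch, assuming the standard Flux Dev endpoint and arguments (the prompt text here is illustrative, not the actual brief):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import fal_client

# Illustrative character brief -- not the exact prompt from the project
CHARACTER = (
    "Candid street-style photo of a stylish Afro-Caribbean woman in her "
    "late 20s with a Cavalier King Charles Spaniel, natural light"
)

candidates = []
for seed in range(25):
    result = fal_client.subscribe("fal-ai/flux/dev", arguments={
        "prompt": CHARACTER,
        "seed": seed,  # vary the seed, keep the prompt fixed
        "image_size": "portrait_4_3",
    })
    candidates.append(result["images"][0]["url"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;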

&lt;p&gt;The goal isn't to deceive — it's to make the pipeline transparent so you can see exactly how trivial this has become.&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Fan out, fan in. 120 shots generated in parallel, 4 winners selected.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o5dt8ptkk9ht5tlj0vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o5dt8ptkk9ht5tlj0vj.png" alt="Fan out, fan in. 120 shots generated in parallel, 4 winners selected." width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code is the orchestrator. I give it a creative brief and it spawns a swarm of sub-agents — one per product, each running in parallel, each spawning 30 shot agents of its own. The lead agent writes the briefs, the product agents write the prompts, the shot agents call fal.ai, and the judge agents evaluate the results with vision. No Python threading, no job queue — just Claude Code talking to itself in parallel and writing the code to make the API calls.&lt;/p&gt;

&lt;p&gt;Each shot agent's core is about 10 lines of Python that Claude wrote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fal_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fal-ai/flux-pro/kontext/multi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_urls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ref_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_url&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aspect_ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3:4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pass &lt;a href="https://fal.ai" rel="noopener noreferrer"&gt;fal.ai&lt;/a&gt;'s Kontext Pro Multi two reference images — Lilly+dog and the product photo — describe the scene, get back an integrated shot. $0.04 each. 9 minutes wall clock for all 120.&lt;/p&gt;

&lt;p&gt;fal.ai made this project possible in a way that local inference couldn't. One API key, no GPU provisioning, no model downloads — and critically, their infrastructure handled 120 concurrent requests without breaking a sweat. The developer experience is remarkably clean: one &lt;code&gt;fal_client.subscribe()&lt;/code&gt; call per image, results back in seconds. When you're building a parallelized pipeline, that simplicity compounds.&lt;/p&gt;

&lt;p&gt;The star of the show is Kontext Multi's scene understanding. It doesn't just paste objects — it rotates a suitcase upright, places a bottle on a table, wraps headphones around a neck. All from flat product photos.&lt;/p&gt;

&lt;p&gt;A year ago, character consistency was the hard problem. You'd fine-tune a LoRA or run DreamBooth for hours, and still get drift by image 20. Now it's one reference image passed as an API parameter. Lilly is recognizably herself in all 120 shots — same face, same dog, different scenes, different outfits. Consistency is table stakes. Product integration is the new wild card:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COHERENCE — All 3 Elements (Lilly + Dog + Product)
────────────────────────────────────────────────────
Rimowa (suitcase beside her):     19/30  63%
Hennessy (bottle on table):       16/30  53%
Erewhon (cup in hand):            11/30  37%
B&amp;amp;O (headphones around neck):      2/30   7%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern: &lt;strong&gt;objects that sit beside the subject&lt;/strong&gt; are easy. &lt;strong&gt;Held objects&lt;/strong&gt; are harder. &lt;strong&gt;Wearables&lt;/strong&gt; are nearly impossible — only 2 of 30 B&amp;amp;O shots got headphones right. The brief matters more than prompt engineering.&lt;/p&gt;

&lt;p&gt;The solve: generate many, pick few. 30 candidates at $0.04 each gives you enough even at 7%. That 30:1 ratio is how real creative production works — AI just makes it $1.20 instead of $15,000.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Human picks vs. AI vision judge — 4/4 agreement on winners.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7mi318al29ciyg5miad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7mi318al29ciyg5miad.png" alt="Winners grid" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After generating 120 images, I picked winners two ways: by hand, and by having Claude evaluate every image with vision, scoring on 8 criteria — character consistency, product visibility, composition, scroll-stop factor, and more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HUMAN vs. AI JUDGE — Winner Picks
──────────────────────────────────
Rimowa:     Human #4    AI #4    ✓
Erewhon:    Human #3    AI #3    ✓
Hennessy:   Human #20   AI #20   ✓
B&amp;amp;O:        Human #14   AI #14   ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4/4 agreement on winners. They diverged on runners-up (2 of 3 picks differed) — the obvious best stands out, the second-best is subjective. This suggests the judging step could be fully automated. The entire pipeline — brief to finished post — could run unattended.&lt;/p&gt;
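
&lt;p&gt;Automating the judge would be one vision call per image. A rough sketch with the Anthropic Python SDK, where the model name, prompt wording, and output format are my assumptions (the actual judge scored 8 criteria):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()

def judge(image_url: str) -&gt; str:
    # Score one candidate with a vision-capable model; the criteria are
    # paraphrased from the article, not the exact judging prompt.
    message = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model choice
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "url", "url": image_url}},
                {"type": "text", "text": (
                    "Score this sponsored-content shot 1-10 on character "
                    "consistency, product visibility, composition, and "
                    "scroll-stop factor. Reply as JSON."
                )},
            ],
        }],
    )
    return message.content[0].text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;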

&lt;p&gt;&lt;em&gt;The failure modes: doubled products, extra dogs, ignored briefs, wrong scale.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl1xgylasx2ecto7rqdb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl1xgylasx2ecto7rqdb.png" alt="Rejects gallery" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not every shot works. These are the failure modes you design around by generating 30 candidates, not 3.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cost and the Point
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOTAL PIPELINE COST
────────────────────────────────────────
Audition:        $0.63   (25 Flux Dev)
Product prep:    $0.08   (4x background removal)
Production:      $4.80   (120 Kontext Multi)
────────────────────────────────────────
Total:           $5.51
Images:          120 generated → 4 delivered
Time:            9 minutes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This replaces real work done by real people. Photographers, stylists, location scouts, content managers, the influencers themselves. A week of content that used to involve a team and a budget now costs $5 and a laptop.&lt;/p&gt;

&lt;p&gt;The economics make it inevitable. When something costs $5 and takes 9 minutes, companies will do it. Many already are.&lt;/p&gt;

&lt;p&gt;These aren't portfolio-grade images. I'm a nerd, not an art director. But that's the point — if a solo dev with Claude Code and a fal.ai key can produce this in an afternoon, imagine what a professional creative team could do with the same tools.&lt;/p&gt;

&lt;p&gt;I'm not going to wrap this in a bow. The technology is here, it works, and it's only getting cheaper. What happens next is a policy question, not a technical one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bonus: They Move Now
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;B&amp;amp;O — ocean breeze, the cardigan moves, the dog turns to camera. $0.53 total.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz5i32ke2pojtltifesq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz5i32ke2pojtltifesq.gif" alt="B&amp;amp;O video winner" width="480" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm only showing one here, and only as an animated GIF (that's all Dev.to allows), but four still images became four video clips for $1.96. The influencer breathes now.&lt;/p&gt;
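
&lt;p&gt;The stills-to-video step rides on the same &lt;code&gt;fal_client.subscribe()&lt;/code&gt; mechanism. A hedged sketch; the image-to-video endpoint and its arguments are my assumption, since the post doesn't name the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import fal_client

# ASSUMPTION: endpoint name and arguments are illustrative -- the article
# doesn't say which image-to-video model was used.
clip = fal_client.subscribe("fal-ai/kling-video/v1/standard/image-to-video", arguments={
    "image_url": winner_url,  # one of the four winning stills
    "prompt": "ocean breeze, cardigan moves, dog turns to camera",
})
video_url = clip["video"]["url"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;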

</description>
      <category>ai</category>
      <category>python</category>
      <category>falai</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Visual UIs Are Now Possible in MCP Servers</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Mon, 02 Feb 2026 15:18:51 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/visual-uis-are-now-possible-in-mcp-servers-369a</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/visual-uis-are-now-possible-in-mcp-servers-369a</guid>
      <description>&lt;p&gt;MCP servers can now render interactive UIs directly in Claude Desktop's chat window. Not just text responses—actual HTML with JavaScript, maps, charts, anything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ce66f7bk54ns7k1whx1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ce66f7bk54ns7k1whx1.gif" alt="ISS Tracker demo" width="760" height="893"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/modelcontextprotocol/ext-apps" rel="noopener noreferrer"&gt;&lt;code&gt;@modelcontextprotocol/ext-apps&lt;/code&gt;&lt;/a&gt; library lets MCP tools return visual UIs. When you call a tool, instead of just getting text back, you get an interactive iframe rendered inline in the conversation.&lt;/p&gt;

&lt;p&gt;This means your AI assistant can show you things, not just tell you about them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/extensions/apps" rel="noopener noreferrer"&gt;Official MCP Apps docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.github.io/ext-apps/" rel="noopener noreferrer"&gt;API reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/modelcontextprotocol/ext-apps/tree/main/examples" rel="noopener noreferrer"&gt;Example implementations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The architecture has two parts: a server that fetches data and declares the UI, and a client-side app that renders it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server Side
&lt;/h3&gt;

&lt;p&gt;Register a tool with UI metadata pointing to an HTML resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;registerAppTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;registerAppResource&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/ext-apps/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resourceUri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ui://iss-tracker/mcp-app.html&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Register the UI resource (bundled HTML)&lt;/span&gt;
&lt;span class="nf"&gt;registerAppResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resourceUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/html&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;APP_HTML&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Register the tool with UI metadata&lt;/span&gt;
&lt;span class="nf"&gt;registerAppTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;where_is_iss&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Show ISS location on a live map&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;uiResourceUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;resourceUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;csp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;connectDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://*.openstreetmap.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;resourceDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://*.openstreetmap.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.wheretheiss.at/v1/satellites/25544&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://api.wheretheiss.at/v1/satellites/25544/positions?timestamps=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://ip-api.com/json/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;longitude&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lon&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;csp&lt;/code&gt; field is important—it declares which external domains your UI needs to access. Without this, Leaflet tiles and scripts would be blocked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Client Side
&lt;/h3&gt;

&lt;p&gt;The UI receives tool results and renders them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/ext-apps&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ISS Tracker&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ontoolresult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;structuredContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// Update your UI with the data&lt;/span&gt;
  &lt;span class="nf"&gt;updateMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Gotcha: Dynamic Script Loading
&lt;/h3&gt;

&lt;p&gt;Static &lt;code&gt;&amp;lt;script src=""&amp;gt;&lt;/code&gt; tags don't work in srcdoc iframes. You have to load external libraries dynamically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadLeaflet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;L&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;undefined&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Load CSS&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cssLink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;link&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;cssLink&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stylesheet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;cssLink&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com/leaflet@1.9.4/dist/leaflet.css&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;head&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cssLink&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Load JS&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com/leaflet@1.9.4/dist/leaflet.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onerror&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Failed to load Leaflet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;head&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caught me off guard—it took a while to figure out why Leaflet wasn't loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Clone: &lt;code&gt;git clone https://github.com/JasonMakes801/iss-tracker-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Build: &lt;code&gt;bun install &amp;amp;&amp;amp; bun run build&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add to Claude Desktop config (&lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"iss-tracker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/bun"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/iss-tracker/dist/index.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--stdio"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Restart Claude Desktop&lt;/li&gt;
&lt;li&gt;Ask: "Where is the ISS?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Maps are just the start. Dashboards, charts, forms, data visualizations—anything you can build in HTML can now live inside your AI conversation.&lt;/p&gt;

&lt;p&gt;What would you build with this?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>typescript</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Did You Know CLIP Works as an AI Image Detector?</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Thu, 15 Jan 2026 11:08:25 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/did-you-know-clip-works-as-an-ai-image-detector-1e6i</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/did-you-know-clip-works-as-an-ai-image-detector-1e6i</guid>
      <description>&lt;p&gt;OpenAI's CLIP model was trained to match images with text descriptions. But here's something surprising: it also works remarkably well at detecting AI-generated images. No fine-tuning required—just extract embeddings and add a simple classifier.&lt;/p&gt;

&lt;p&gt;I built one, with some help from Claude Code, to see how well this actually works. Here's what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dataset
&lt;/h2&gt;

&lt;p&gt;I collected 1,050 portrait-style images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;525 AI images&lt;/strong&gt; from CivitAI (various Stable Diffusion models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;525 real photos&lt;/strong&gt; from Unsplash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both sets were curated to look similar—street photography, portraits, natural lighting. The goal was to make this hard, not easy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI-generated portraits from CivitAI&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva0xuyekax07z6s41l87.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva0xuyekax07z6s41l87.jpg" alt="AI Images" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Real photos from Unsplash&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvojbudnedwcvcia4i1ag.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvojbudnedwcvcia4i1ag.jpg" alt="Real Images" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Can you tell which is which? Deep into curating the AI images, I'd occasionally think "wow, that looks real." But the moment I switched to Unsplash, I realized none of them actually did. Real photos, to my eye and for now anyway, have a texture, a messiness that resets your expectations entirely.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Traditional Approach: FFT Analysis
&lt;/h2&gt;

&lt;p&gt;Before trying CLIP, I tested a traditional forensics technique: analyzing the frequency spectrum.&lt;/p&gt;

&lt;p&gt;The intuition is simple: real cameras introduce high-frequency sensor noise. AI generators don't simulate this noise, so AI images should have less energy in the high frequencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_high_freq_energy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;L&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fft2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fft_shifted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fftshift&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;power&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fft_shifted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="c1"&gt;# Measure energy in outer ring (high frequencies)
&lt;/span&gt;    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;high_freq_energy&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_energy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: 50.4% accuracy.&lt;/strong&gt; Basically random.&lt;/p&gt;

&lt;p&gt;The problem? JPEG compression destroys high-frequency information anyway. On compressed web images, this technique is useless.&lt;/p&gt;
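
&lt;p&gt;You can check this claim yourself: round-trip an image through lossy JPEG and re-measure with the &lt;code&gt;compute_high_freq_energy&lt;/code&gt; function above (the quality setting is arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from io import BytesIO

from PIL import Image

# Re-save an image with lossy JPEG compression, then compare the
# high-frequency energy ratio before and after.
img = Image.open("sample.png").convert("RGB")
buf = BytesIO()
img.save(buf, format="JPEG", quality=70)

with open("sample_jpeg.jpg", "wb") as f:
    f.write(buf.getvalue())

print(compute_high_freq_energy("sample.png"))
print(compute_high_freq_energy("sample_jpeg.jpg"))  # typically lower
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;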

&lt;h2&gt;
  
  
  The CLIP Approach
&lt;/h2&gt;

&lt;p&gt;CLIP (Contrastive Language-Image Pre-training) was trained on 400 million image-text pairs. It learned rich visual features that transfer surprisingly well to other tasks—including AI detection.&lt;/p&gt;

&lt;p&gt;The approach is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CLIPModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/clip-vit-base-patch32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/clip-vit-base-patch32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_image_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Normalize to unit vector
&lt;/span&gt;    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each image becomes a 512-dimensional vector. Then train a simple logistic regression:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: 88.5% accuracy on held-out test images.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a massive jump from 50% (FFT) to 88.5% (CLIP + LogReg).&lt;/p&gt;
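
&lt;p&gt;The held-out evaluation is the standard scikit-learn pattern. A sketch assuming &lt;code&gt;X&lt;/code&gt; is the stacked embedding matrix and &lt;code&gt;y&lt;/code&gt; the labels; the 80/20 split is my assumption, the post only says 'held-out':&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stratified split keeps the 50/50 AI/real balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
print(accuracy_score(y_test, classifier.predict(X_test)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;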

&lt;h2&gt;
  
  
  Why Does This Work?
&lt;/h2&gt;

&lt;p&gt;CLIP (&lt;a href="https://arxiv.org/abs/2103.00020" rel="noopener noreferrer"&gt;Radford et al., 2021&lt;/a&gt;) learned rich visual features from 400 million image-text pairs. These features transfer well to tasks it was never trained for.&lt;/p&gt;

&lt;p&gt;I used the smallest CLIP variant (ViT-B/32, ~150M parameters). Larger models like ViT-L/14 would likely do even better, but the small one already works surprisingly well.&lt;/p&gt;

&lt;p&gt;When we project the 512-dimensional embeddings down to 2D using UMAP, we can see the separation:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;UMAP projection of CLIP embeddings. Real images (blue) and AI images (red) cluster separately.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sp3kjg9s71pyxccdxh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sp3kjg9s71pyxccdxh1.png" alt="UMAP Visualization" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The two classes naturally separate in embedding space. The logistic regression just draws a line between them.&lt;/p&gt;
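
&lt;p&gt;If you want to reproduce the plot, here's a minimal sketch with the &lt;code&gt;umap-learn&lt;/code&gt; package, assuming &lt;code&gt;X&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; from the classifier step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import umap

# Project the 512-d CLIP embeddings down to 2D
coords = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

plt.scatter(coords[y == 0, 0], coords[y == 0, 1], s=8, label="Real")
plt.scatter(coords[y == 1, 0], coords[y == 1, 1], s=8, label="AI")
plt.legend()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
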
&lt;h2&gt;
  
  
  You Don't Even Need a Classifier
&lt;/h2&gt;

&lt;p&gt;Here's the surprising part: you can detect AI images without training anything.&lt;/p&gt;

&lt;p&gt;Just compute the centroid (mean) of each class and classify by nearest neighbor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Compute class centroids
&lt;/span&gt;&lt;span class="n"&gt;ai_centroid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;real_centroid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Classify a new image
&lt;/span&gt;&lt;span class="n"&gt;dist_to_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ai_centroid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dist_to_real&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;real_centroid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dist_to_ai&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;dist_to_real&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Real&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: 74.8% accuracy with zero training.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The logistic regression adds ~14 percentage points, but CLIP embeddings alone get you most of the way there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does the Model See?
&lt;/h2&gt;

&lt;p&gt;Here's the honest answer: I don't know.&lt;/p&gt;

&lt;p&gt;I tried probing the CLIP dimensions to understand what features matter. The results were messy and inconclusive. These are learned representations, not human-interpretable features.&lt;/p&gt;

&lt;p&gt;Looking at the AI images ranked by confidence, there's no obvious pattern:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI images ranked from "fooled the detector" (top-left) to "obviously AI" (bottom-right). The visual pattern isn't clear—the model detects something we can't see.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12b7purasntlpbhy7g30.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12b7purasntlpbhy7g30.png" alt="Ranked AI Images" width="800" height="1263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image at 12% confidence (the one that fooled the detector) does look like a real photo at a glance, but does the 98% confidence image of the woman sitting at dusk in a sidewalk cafe really scream AI? CLIP is detecting subtle statistical signatures that aren't visible to human eyes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;This is an exploration of the technique, not a production AI detector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It won't generalize well.&lt;/strong&gt; I trained on portrait photography. It won't work reliably on landscapes, illustrations, or other styles. A real detector would need a much more diverse training set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI generators are improving.&lt;/strong&gt; The patterns CLIP detects today may disappear as generators get better at mimicking real image statistics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model isn't interpretable.&lt;/strong&gt; We can measure that it works, but we can't explain &lt;em&gt;why&lt;/em&gt; it works. That makes it hard to trust for high-stakes decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;CLIP embeddings are surprisingly effective for AI image detection:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FFT (traditional)&lt;/td&gt;
&lt;td&gt;50.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Centroid distance (no training)&lt;/td&gt;
&lt;td&gt;74.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logistic Regression on CLIP&lt;/td&gt;
&lt;td&gt;88.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: CLIP has learned features that capture something fundamental about how real and AI images differ—even though we can't see or explain what that something is.&lt;/p&gt;

&lt;p&gt;For a quick-and-dirty AI detector on a specific image domain, this approach works remarkably well. Just don't expect it to generalize to everything.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>computervision</category>
    </item>
    <item>
      <title>From Prototype to Production: Building a Multimodal Video Search Engine</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Tue, 06 Jan 2026 10:46:21 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/from-prototype-to-production-building-a-multimodal-video-search-engine-o1g</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/from-prototype-to-production-building-a-multimodal-video-search-engine-o1g</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/jason_peterson_607e54abf5/i-stacked-3-small-ml-models-and-got-video-search-that-feels-like-magic-2i63"&gt;last post&lt;/a&gt;, I wrote about the unreasonable effectiveness of model stacking for media search—combining CLIP, Whisper, and ArcFace to find video content through visual descriptions, dialog, and faces. Over the holidays I expanded that afternoon hack into something more production-like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://fennec.jasongpeterson.com" rel="noopener noreferrer"&gt;fennec.jasongpeterson.com&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Starter code:&lt;/strong&gt; &lt;a href="https://github.com/JasonMakes801/fennec-search" rel="noopener noreferrer"&gt;github.com/JasonMakes801/fennec-search&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try This
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://fennec.jasongpeterson.com" rel="noopener noreferrer"&gt;fennec.jasongpeterson.com&lt;/a&gt; (desktop browser)&lt;/li&gt;
&lt;li&gt;Enter &lt;code&gt;older man on phone, harbor background&lt;/code&gt; in Visual Content → click +&lt;/li&gt;
&lt;li&gt;Click the face of the older guy with glasses sitting with the harbor at his back&lt;/li&gt;
&lt;li&gt;Enter &lt;code&gt;the Americans had launched their missiles&lt;/code&gt; in Dialog (Semantic mode) → click +&lt;/li&gt;
&lt;li&gt;Play the clip&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You've drilled down to an exact shot without metadata, timecodes, or remembering exact words. The semantic search is fuzzy—he actually says "What it was telling him was that the US had launched their ICBMs," but that's close enough.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxticbq7ydkmisiip39d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxticbq7ydkmisiip39d.png" alt="Search result showing the scene" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Under the Hood
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Containerized architecture&lt;/strong&gt;: Vue/Nginx frontend, FastAPI backend, standalone ingest worker, Postgres+pgvector—all via docker-compose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background enrichment&lt;/strong&gt;: Polling-based worker that handles drive mounting/unmounting gracefully (Watchdog doesn't work reliably with NFS/network shares)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic dialog search&lt;/strong&gt;: Sentence-transformer embeddings so "Americans launched missiles" finds "US fired rockets"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame-accurate playback&lt;/strong&gt;: HTML5 video decode to canvas using &lt;code&gt;requestVideoFrameCallback()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EDL export&lt;/strong&gt;: Queue scenes and export CMX 3600 for NLE roundtrip&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Postgres + pgvector setup turned out cleaner than expected—vector similarity combined with metadata filtering in a single query just works.&lt;/p&gt;
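
&lt;p&gt;To make "semantic" concrete, here's roughly what the dialog matching looks like with sentence-transformers. The checkpoint name is an assumption; the repo may use a different model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer, util

# Checkpoint is illustrative; swap in whatever the ingest worker uses.
model = SentenceTransformer("all-MiniLM-L6-v2")
query = model.encode("the Americans had launched their missiles")
line = model.encode("What it was telling him was that the US had launched their ICBMs")
print(util.cos_sim(query, line))  # high score despite different wording
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
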

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Demo&lt;/strong&gt;: &lt;a href="https://fennec.jasongpeterson.com" rel="noopener noreferrer"&gt;fennec.jasongpeterson.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt;: &lt;a href="https://github.com/JasonMakes801/fennec-search" rel="noopener noreferrer"&gt;github.com/JasonMakes801/fennec-search&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Demo footage from &lt;a href="https://en.wikipedia.org/wiki/Pioneer_One" rel="noopener noreferrer"&gt;Pioneer One&lt;/a&gt;, a Creative Commons-licensed Canadian drama. Built with significant help from &lt;a href="https://claude.com/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>python</category>
      <category>docker</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Stacked 3 Small ML Models and Got Video Search That Feels Like Magic</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Thu, 18 Dec 2025 03:13:56 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/i-stacked-3-small-ml-models-and-got-video-search-that-feels-like-magic-2i63</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/i-stacked-3-small-ml-models-and-got-video-search-that-feels-like-magic-2i63</guid>
      <description>&lt;p&gt;&lt;a href="https://youtu.be/aD2gBAPxjak" rel="noopener noreferrer"&gt;Video Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I spent a day building a video search prototype and came away genuinely surprised. Not by any single model — they're all "pretty good" on their own — but by what happens when you stack them together.&lt;/p&gt;

&lt;p&gt;The constraints compound. A so-so visual match plus a so-so transcript hit often surfaces &lt;em&gt;exactly&lt;/em&gt; the right shot. It's unreasonably effective.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I wanted to see how far open-source models could take intelligent video search. The kind of thing where you type "outdoor scene with two people talking about robots" and get useful results.&lt;/p&gt;

&lt;p&gt;Test footage: "Tears of Steel" — a 12-minute CC-BY short film from Blender Foundation. VFX, dialog, multiple characters. Good variety.&lt;/p&gt;

&lt;p&gt;The goal: stack filters in real-time. Visual content → face → dialog → timecode. See how precisely you can drill down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shot Segmentation
&lt;/h2&gt;

&lt;p&gt;First step: break the video into shots. We're not embedding every frame — that would be wasteful and slow. Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Detect shot boundaries&lt;/strong&gt; using PySceneDetect's ContentDetector (analyzes frame-to-frame differences to find cuts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract a representative thumbnail&lt;/strong&gt; for each shot — just the center frame in this prototype&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run models on that single thumbnail&lt;/strong&gt; per shot&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For "Tears of Steel," this gave ~120 shots from 12 minutes. Each shot gets one CLIP embedding, one face detection pass, and the transcript segments that overlap its timecode range.&lt;/p&gt;

&lt;p&gt;This keeps compute reasonable and mirrors how editors actually think — in shots, not frames.&lt;/p&gt;
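
&lt;p&gt;A minimal sketch of that segmentation step with PySceneDetect (filename is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from scenedetect import detect, ContentDetector

# Detect cuts, then take the center frame of each shot as its thumbnail.
scenes = detect("tears_of_steel.mp4", ContentDetector())
for start, end in scenes:
    mid_frame = (start.get_frames() + end.get_frames()) // 2
    print(start.get_timecode(), end.get_timecode(), "thumbnail:", mid_frame)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
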

&lt;h2&gt;
  
  
  The Three Models
&lt;/h2&gt;

&lt;p&gt;All open-source, all running locally on a MacBook Air (M3). No cloud inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLIP (ViT-B-32)&lt;/strong&gt; — The Swiss Army knife. Embed images and text into the same vector space, then compare. "Street scene" finds street scenes. "Green tones" finds green-graded shots. "Credits" finds credits. One model, endless queries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enrichment: ~30ms per shot (single thumbnail)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Whisper (base)&lt;/strong&gt; — Speech to timestamped transcript. Runs on the full audio track, then segments are linked to shots by timestamp overlap.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enrichment: ~45s for the full 12-minute video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ArcFace (buffalo_l via InsightFace)&lt;/strong&gt; — Face detection and embedding on the representative thumbnail. Click a face, find all other shots with that person. No identification needed, just clustering by visual similarity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enrichment: ~100ms per shot (single thumbnail)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. Three models. The magic is in how they combine.&lt;/p&gt;
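
&lt;p&gt;To give a flavor of the CLIP half, here's a minimal sketch of scoring one thumbnail against a few text queries with the openai/CLIP package (path and queries are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
image = preprocess(Image.open("shot_042.png")).unsqueeze(0)
text = clip.tokenize(["street scene", "green tones", "credits"])

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    print(img_emb @ txt_emb.T)  # cosine similarity per query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
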

&lt;h2&gt;
  
  
  Why Stacking Works So Well
&lt;/h2&gt;

&lt;p&gt;Each filter alone returns "roughly right" results. But stack two or three and the precision jumps dramatically.&lt;/p&gt;

&lt;p&gt;Example workflow from the demo:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search "street scene, couple" → ~15 shots&lt;/li&gt;
&lt;li&gt;Click "Match" on an interesting frame → visually similar shots&lt;/li&gt;
&lt;li&gt;Click a face → only shots with that character&lt;/li&gt;
&lt;li&gt;Add "sorry" to dialog search → 2 results, both exactly right&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step cuts the noise. By the end, you're looking at exactly what you wanted.&lt;/p&gt;

&lt;p&gt;The same principle works for color. CLIP understands "green tones" or "warm sunset" without any separate color extraction. Add a face filter on top and you get "shots of this character in warm lighting." No custom code for that combination — it just falls out of the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Deliberately simple. ~500 lines of Python.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmvws35iwlguahybzry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmvws35iwlguahybzry.png" alt="Enrichment and Search Diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;PostgreSQL + pgvector stores everything in one place.&lt;/strong&gt; Embeddings, transcripts, face clusters, timestamps — all in the same database.&lt;/p&gt;

&lt;p&gt;This means a single SQL query can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter on metadata (timecode range, face cluster)&lt;/li&gt;
&lt;li&gt;Rank by vector similarity (CLIP embedding distance)&lt;/li&gt;
&lt;li&gt;Full-text search on transcripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No need to query multiple systems and merge results. One query, one round trip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;scene_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clip_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;scenes&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;face_cluster_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;clip_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's filtering by face, dialog, and timecode while ranking by visual similarity — in one query.&lt;/p&gt;
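
&lt;p&gt;For completeness, a hypothetical Python caller for that query, assuming asyncpg with the pgvector adapter registered (connection string and names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncpg
from pgvector.asyncpg import register_vector

async def search(sql, query_emb, face_id=None, dialog=None, t0=0.0, t1=720.0):
    # sql is the parameterized query above; $1..$5 map to these arguments.
    conn = await asyncpg.connect("postgresql://localhost/scenes")
    await register_vector(conn)  # send numpy arrays as pgvector values
    try:
        return await conn.fetch(sql, query_emb, face_id, dialog, t0, t1)
    finally:
        await conn.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
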

&lt;h2&gt;
  
  
  The Claude Code Part
&lt;/h2&gt;

&lt;p&gt;I'll be honest: this prototype exists because of Claude Code with Opus 4.5.&lt;/p&gt;

&lt;p&gt;It wasn't a "write me a video search app" one-shot. It was a day-long collaboration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I described the architecture and models I wanted&lt;/li&gt;
&lt;li&gt;Claude scaffolded the project structure&lt;/li&gt;
&lt;li&gt;I'd test, hit issues, describe what was wrong&lt;/li&gt;
&lt;li&gt;Claude would debug, refactor, improve&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The iteration speed is what made this feasible. In one day I went from "I wonder if this would work" to a working demo I'm proud to show people. That used to be a week of wrestling with documentation and Stack Overflow.&lt;/p&gt;

&lt;p&gt;The code isn't perfect. There are rough edges. But the core insight — stacking small models — is validated. That's what a prototype is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This was 12 minutes of footage. Not a scale test. But the results are promising enough that I want to push further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More models&lt;/strong&gt;: Object detection (YOLO/SAM), OCR for on-screen text, audio classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longer content&lt;/strong&gt;: Feature-length films, dailies from real productions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLP layer&lt;/strong&gt;: Parse "outdoor shots with the main character talking about technology" into structured filters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each additional model adds another dimension to filter on. The architecture supports it — just add another column and another filter clause.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're building search over any media type, don't sleep on model stacking. A 512-dim CLIP embedding, a transcript, and a face cluster ID — three simple signals — combine into search that feels intelligent.&lt;/p&gt;

&lt;p&gt;The models are all open-source. The infrastructure is Postgres with an extension. The frontend is vanilla HTML/JS. None of this is exotic.&lt;/p&gt;

&lt;p&gt;The magic is in the combination.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>postgres</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Pac-Man, Shakey the Robot, and Von Neumann Walk Into a Maze</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Sun, 30 Nov 2025 14:10:20 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/pac-man-shakey-the-robot-and-von-neumann-walk-into-a-maze-26f8</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/pac-man-shakey-the-robot-and-von-neumann-walk-into-a-maze-26f8</guid>
      <description>&lt;p&gt;Seeing Google's Doodle tribute to Pac-Man recently sent me spiraling down a nostalgia rabbit hole. It reminded me of the 80s and scrounging for quarters and the disappointment over how long I could make each last—and of a MOOC I took in what now feels like ancient history in AI terms. A Gemini search of my old Gmail archives confirms it was &lt;strong&gt;UC Berkeley CS 188: Introduction to Artificial Intelligence&lt;/strong&gt; via EdX, probably around 2012.&lt;/p&gt;

&lt;p&gt;The year 2012. The year Obama won re-election, the Mayan calendar "ended," and Honey Boo Boo was somehow a cultural phenomenon. Siri was barely a year old—a killer feature for iPhone sales despite rarely understanding what you asked. "Deep learning" was a phrase you'd find only in academic papers, not headlines. A different era entirely.&lt;/p&gt;

&lt;p&gt;That course required implementing classic algorithms like Greedy search and Minimax in a Pac-Man environment. It was, honestly, a bit over my head at the time. But I slogged through it, and somewhere between debugging Python at 2 AM and watching my AI-controlled Pac-Man successfully evade ghosts, my mind was &lt;em&gt;blown&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;These "simple" algorithms—Minimax especially—produced results that felt... &lt;em&gt;intelligent&lt;/em&gt;. Surprising. Alive. Before that course, I'd thought of code as scripts: wholly deterministic sequences that did exactly what you told them. But watching Minimax navigate a maze, weighing possibilities, anticipating ghost movements—that was something else entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Revisit This Now?
&lt;/h2&gt;

&lt;p&gt;We're deep into the post-deep-learning, post-generative AI era. Claude Code, Antigravity, and OpenAI Codex write and debug code autonomously. Suno-generated songs are topping Billboard's country charts. Google's Genie 2 conjures playable 3D worlds from a single image. So why bother with these dusty old algorithms from the 1940s and 1980s?&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;they're not obsolete&lt;/strong&gt;. Far from it.&lt;/p&gt;

&lt;p&gt;These algorithms are embedded everywhere: in your GPS routing, your game opponents, your thermostat, the trading bots on Wall Street. They run in parallel with the more advanced forms of AI now upon us—faster, smaller, and often more appropriate for the task at hand.&lt;/p&gt;

&lt;p&gt;And honestly? In Pac-Man—a game of &lt;em&gt;perfect information&lt;/em&gt;—we don't &lt;em&gt;need&lt;/em&gt; generative AI to move our yellow hero. An untuned LLM could probably play Pac-Man, sure, but with noticeable latency and probably not with better performance than good old Minimax.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Different Now
&lt;/h2&gt;

&lt;p&gt;Here's what &lt;em&gt;has&lt;/em&gt; changed: the barrier to experimentation has collapsed completely.&lt;/p&gt;

&lt;p&gt;Claude Opus 4.5 basically &lt;strong&gt;one-shotted&lt;/strong&gt; a modified Pac-Man clone that lets users experiment with different AI algorithms—from the hilariously dumb (Random, Greedy) to the surprisingly sophisticated (A*, Minimax). &lt;/p&gt;

&lt;p&gt;Seriously, we live in amazing times.&lt;/p&gt;

&lt;p&gt;Afterwards, I modified it using GitHub Copilot with more Opus 4.5 help—adding a benchmarking mode and various quality-of-life improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwxwdv6aptfgfd67y09x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwxwdv6aptfgfd67y09x.png" alt="Screenshot of the Pac-Man demo" width="800" height="727"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Pac-Man with various brains&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://jasonmakes801.github.io/PacManAI/" rel="noopener noreferrer"&gt;Try the demo here →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll explore each algorithm with animated GIFs as we go.&lt;/p&gt;


&lt;h2&gt;
  
  
  First, Let's Talk About the Ghosts
&lt;/h2&gt;

&lt;p&gt;Before we dive into Pac-Man's AI options, we need to understand his adversaries. Because here's the thing: &lt;strong&gt;the ghosts are living in the 1980s&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Blinky, Pinky, Inky, and Clyde don't use fancy pathfinding or machine learning. They use the exact same logic Toru Iwatani and Shigeo Funaki coded in 1980—simple targeting rules that, combined, create the illusion of intelligent pursuit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpkk9tr1kx19cs1o2k7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpkk9tr1kx19cs1o2k7l.png" alt="Pac-Man arcade cabinet at Hi-Score Gaming museum" width="800" height="1037"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo: &lt;a href="https://commons.wikimedia.org/wiki/File:Arcade-Automaten_im_Erlebnismuseum_Hi-Score_2023.jpg" rel="noopener noreferrer"&gt;Torben Friedrich&lt;/a&gt;, CC BY-SA 4.0&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Ghost Ensemble
&lt;/h3&gt;

&lt;p&gt;Each ghost has a distinct personality encoded in just a few lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Blinky (Red) - Direct chase&lt;/span&gt;
&lt;span class="nx"&gt;blinky&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Blinky (Red)&lt;/strong&gt; is a pure Greedy algorithm—he targets Pac-Man's current position. Simple, relentless, predictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pinky (Pink)&lt;/strong&gt; is the ambusher—she targets 4 tiles &lt;em&gt;ahead&lt;/em&gt; of Pac-Man, trying to cut him off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pinky targets 4 tiles ahead of Pac-Man's direction&lt;/span&gt;
&lt;span class="nx"&gt;pinky&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanDir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;deltas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;pacmanDir&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Inky (Blue)&lt;/strong&gt; uses vector math—he calculates his target as a reflection of Blinky across Pac-Man's position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inky - Target is reflection of Blinky across Pac-Man&lt;/span&gt;
&lt;span class="nx"&gt;inky&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;blinkyPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;blinkyPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;blinkyPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Clyde (Orange)&lt;/strong&gt; is the coward—he chases when far away but retreats to his corner when he gets within 8 tiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Clyde - Chase if far, scatter if close&lt;/span&gt;
&lt;span class="nx"&gt;clyde&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;manhattanDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;toGrid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;toGrid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Scatter to corner&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;210&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Math Glitch That Became Canon
&lt;/h3&gt;

&lt;p&gt;Fun fact: there's a bug in Pinky's original code. When Pac-Man faces UP, an overflow error adds an extra offset to the LEFT, making her targeting slightly... &lt;em&gt;twitchy&lt;/em&gt;. Namco discovered this but kept it—it made Pinky feel more unpredictable, more &lt;em&gt;alive&lt;/em&gt;. Sometimes bugs are features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters for Our AI
&lt;/h3&gt;

&lt;p&gt;The ensemble of ghost behaviors is better than any individual ghost. They flank, they scatter, they reconverge. But they're also &lt;strong&gt;predictable in their unpredictability&lt;/strong&gt;—which means our Minimax algorithm, which assumes &lt;em&gt;optimal&lt;/em&gt; adversarial play, is actually playing against a weaker opponent than it thinks. More on that irony later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarking the Algorithms
&lt;/h2&gt;

&lt;p&gt;Let's see how our AI options actually perform. Run the benchmark mode yourself—cuz it's fun to watch 400 game starts play out in a minute or so! But here's what my results looked like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Survival&lt;/th&gt;
&lt;th&gt;Avg Score&lt;/th&gt;
&lt;th&gt;Ghosts Eaten&lt;/th&gt;
&lt;th&gt;Decision Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;Minimax&lt;/td&gt;
&lt;td&gt;31.9s&lt;/td&gt;
&lt;td&gt;1687&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;td&gt;0.04ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;A* Pathfinding&lt;/td&gt;
&lt;td&gt;17.5s&lt;/td&gt;
&lt;td&gt;1114&lt;/td&gt;
&lt;td&gt;3.4&lt;/td&gt;
&lt;td&gt;0.01ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;Random&lt;/td&gt;
&lt;td&gt;7.1s&lt;/td&gt;
&lt;td&gt;368&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.00ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Greedy&lt;/td&gt;
&lt;td&gt;1.0s&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.01ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Wait—&lt;em&gt;Random beats Greedy&lt;/em&gt;? And by a lot?&lt;/p&gt;

&lt;p&gt;Let's dig into each algorithm, from worst to best. Fair warning: John von Neumann is going to pop up a lot. So is the Cold War.&lt;/p&gt;




&lt;h2&gt;
  
  
  4th Place: Greedy — The Algorithm That Should Know Better
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8i82sl0sp7c9wgwm76vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8i82sl0sp7c9wgwm76vm.png" alt="Diagram showing Greedy algorithm targeting nearest pellet while ignoring nearby ghost" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It's no good being greedy!&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The History
&lt;/h3&gt;

&lt;p&gt;The term "Greedy algorithm" emerged from the optimization boom of the 1950s and 60s, solidified by work on Matroids by Jack Edmonds in the 1970s. The concept is beautifully simple: &lt;strong&gt;always take the locally optimal choice and hope it leads to a globally optimal solution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For some problems (like Dijkstra's shortest path), greedy works perfectly. For others (like Pac-Man survival), it's a disaster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Fails Spectacularly
&lt;/h3&gt;

&lt;p&gt;Our Greedy implementation does exactly one thing: find the nearest pellet and move toward it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;greedy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastMove&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findNearestPellet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanGrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// ...find the move that gets us closest to that pellet&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;manhattanDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's missing? &lt;strong&gt;Any awareness of ghosts whatsoever&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Greedy Pac-Man will cheerfully walk straight into Blinky's open maw if there's a pellet on the other side. It's the algorithmic equivalent of texting while crossing the street.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futfonrggbpdl2it2nexg.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futfonrggbpdl2it2nexg.gif" alt="Greedy algorithm walking straight into a ghost" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Greedy's fatal flaw: pellet tunnel vision&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~1 second.&lt;/strong&gt; Ouch.&lt;/p&gt;


&lt;h2&gt;
  
  
  3rd Place: Random — The Drunken Master
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11rchgm0irg0s28jv6yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11rchgm0irg0s28jv6yo.png" alt="Diagram showing how random algo ignores rewards and threats" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Every direction is equally valid. That's the whole strategy.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Ancient History
&lt;/h3&gt;

&lt;p&gt;Random number generation is older than computing itself. Dice from 2400 BC. I Ching stalks from 1100 BC China. Heated tortoise shells whose cracks were interpreted as divine messages.&lt;/p&gt;

&lt;p&gt;For millennia, generating randomness required physical hardware: dice, coins, shuffled cards. But computers are deterministic machines—how do you generate chaos from clockwork?&lt;/p&gt;
&lt;h3&gt;
  
  
  Enter Von Neumann (First Appearance!)
&lt;/h3&gt;

&lt;p&gt;During the Manhattan Project, Stanislaw Ulam invented the Monte Carlo method while playing solitaire during recovery from brain surgery. (The best algorithms are born from boredom.) He and von Neumann needed random numbers—lots of them—to simulate neutron diffusion in fissile material.&lt;/p&gt;

&lt;p&gt;Von Neumann's solution was the &lt;strong&gt;Middle-Square Method&lt;/strong&gt; (c. 1946):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a seed number&lt;/li&gt;
&lt;li&gt;Square it&lt;/li&gt;
&lt;li&gt;Extract the middle digits&lt;/li&gt;
&lt;li&gt;That's your random number (and your new seed)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Von Neumann famously acknowledged the philosophical absurdity: &lt;em&gt;"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But sinful or not, it worked well enough to help design the hydrogen bomb.&lt;/p&gt;
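
&lt;p&gt;The whole method fits in a few lines. Here's a toy four-digit version in Python (the demo itself is JavaScript, but the idea is language-agnostic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def middle_square(seed, n):
    """Von Neumann's middle-square PRNG, toy 4-digit version."""
    out = []
    for _ in range(n):
        squared = seed * seed            # up to 8 digits
        seed = (squared // 100) % 10000  # keep the middle 4 digits
        out.append(seed)
    return out

print(middle_square(5731, 5))  # same seed, same "random" sequence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
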
&lt;h3&gt;
  
  
  Why Random Beats Greedy
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;random&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastMove&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getValidMoves&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;bestMove&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;bestScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;move&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;getOscillationPenalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;move&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastMove&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;bestScore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;bestScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nx"&gt;bestMove&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;move&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;bestMove&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Random has one crucial advantage over Greedy: &lt;strong&gt;it doesn't walk into the same trap twice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By moving unpredictably, Random occasionally stumbles away from danger. It picks up pellets by accident. It sometimes, by pure chance, threads through a gap between ghosts that Greedy would have sprinted directly into.&lt;/p&gt;

&lt;p&gt;It's the Drunken Master of algorithms—surviving not through skill but through the sheer improbability of its movements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtcxne1la9jywjthi5pu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtcxne1la9jywjthi5pu.gif" alt="Random algorithm stumbling through the maze" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Drunken Master in action&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~7 seconds.&lt;/strong&gt; Seven times better than Greedy!&lt;/p&gt;


&lt;h2&gt;
  
  
  2nd Place: A* — Shakey's Gift to Gaming
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v2y6bjhwjxrkkaz1z1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v2y6bjhwjxrkkaz1z1j.png" alt="A diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Robot That Started It All
&lt;/h3&gt;

&lt;p&gt;In the late 1960s, at Stanford Research Institute, a robot named &lt;strong&gt;Shakey&lt;/strong&gt; wobbled through rooms full of blocks and ramps—the first mobile robot capable of reasoning about its actions. It needed to navigate without crashing, but Dijkstra's algorithm (1956) explored in all directions like spilling water. Shakey needed to head &lt;em&gt;toward&lt;/em&gt; its goal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kj3q0js6zhcb940wf7q.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kj3q0js6zhcb940wf7q.jpg" alt="Shakey the robot at SRI" width="600" height="954"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo: &lt;a href="https://commons.wikimedia.org/wiki/File:SRI_Shakey_with_callouts.jpg" rel="noopener noreferrer"&gt;SRI International&lt;/a&gt;, CC BY-SA 3.0&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Peter Hart, Nils Nilsson, and Bertram Raphael solved this by adding a &lt;strong&gt;heuristic&lt;/strong&gt;—an estimate of remaining distance: &lt;em&gt;f(n) = g(n) + h(n)&lt;/em&gt;, balancing known cost with estimated future cost.&lt;/p&gt;
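
&lt;p&gt;In code, the whole idea is compact. A minimal grid A* in Python (the demo is JavaScript; this is just the textbook algorithm):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(grid, start, goal):
    """Grid cells: 0 = open, 1 = wall. Returns a list of (x, y) steps."""
    g_cost = {start: 0}
    parent = {start: None}
    frontier = [(manhattan(start, goal), start)]  # (f = g + h, node)
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:  # reconstruct the path by walking parents back
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 &amp;lt;= ny &amp;lt; len(grid) and 0 &amp;lt;= nx &amp;lt; len(grid[0]) and grid[ny][nx] == 0:
                ng = g_cost[node] + 1
                if ng &amp;lt; g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    parent[nxt] = node
                    heapq.heappush(frontier, (ng + manhattan(nxt, goal), nxt))
    return None  # no path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
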
&lt;h3&gt;
  
  
  Smart A* in Pac-Man
&lt;/h3&gt;

&lt;p&gt;Our A* implementation isn't just vanilla pathfinding—it has strategic goal selection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Determine goal priority:&lt;/span&gt;
&lt;span class="c1"&gt;// 1. If ghosts are edible -&amp;gt; hunt nearest ghost&lt;/span&gt;
&lt;span class="c1"&gt;// 2. If dangerous ghost is close -&amp;gt; go for power pellet&lt;/span&gt;
&lt;span class="c1"&gt;// 3. Otherwise -&amp;gt; collect regular pellets&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;edibleGhosts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Hunt nearest edible ghost&lt;/span&gt;
    &lt;span class="nx"&gt;goalGrid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;nearestGhost&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;closestDangerDist&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Ghost is close! Get a power pellet&lt;/span&gt;
    &lt;span class="nx"&gt;goalGrid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findNearestPowerPellet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanGrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Safe - collect regular pellets&lt;/span&gt;
    &lt;span class="nx"&gt;goalGrid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findNearestRegularPellet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanGrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also incorporates danger costs—paths near ghosts have higher penalties. A* doesn't just find the shortest path—it finds the &lt;em&gt;safest&lt;/em&gt; short path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc1kjt7w6nujbq8wxrxi.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc1kjt7w6nujbq8wxrxi.gif" alt="A* algorithm efficiently collecting pellets while avoiding ghosts" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Methodical, efficient, but not paranoid enough&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~17.5 seconds.&lt;/strong&gt; Solid, reliable performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  1st Place: Minimax — The Cold War Logic of a Paranoid Yellow Circle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanloj7e6w2qy2fn6a1t7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanloj7e6w2qy2fn6a1t7.png" alt="Minimax diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Poker, Paranoia, and the Bomb
&lt;/h3&gt;

&lt;p&gt;To understand Minimax, you need to understand John von Neumann.&lt;/p&gt;

&lt;p&gt;Born in Budapest in 1903, von Neumann was a prodigy among prodigies—part of the legendary "Hungarian Martians" who reshaped American science. At Princeton's Institute for Advanced Study, he became fascinated with a question that standard probability couldn't answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you optimize against an opponent who's actively trying to beat you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Roulette wheels don't care if you lose. But poker players &lt;em&gt;bluff&lt;/em&gt;. They hide information. They maximize their gain at your expense.&lt;/p&gt;

&lt;p&gt;In 1928, von Neumann published "Theory of Parlor Games," proving the &lt;strong&gt;Minimax theorem&lt;/strong&gt;: in a zero-sum game with perfect information, there exists a strategy that minimizes your maximum possible loss.&lt;/p&gt;

&lt;p&gt;In other words: &lt;strong&gt;assume your opponent is a genius. Assume they'll always make the move that hurts you most. Then pick the move that leaves you least hurt in that worst case.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From Poker to Armageddon
&lt;/h3&gt;

&lt;p&gt;Von Neumann wasn't just an academic—he worked on the Manhattan Project and later served on the Atomic Energy Commission. His Minimax philosophy permeated Cold War strategy, underpinning &lt;strong&gt;Mutually Assured Destruction (MAD)&lt;/strong&gt;: create a situation where the opponent's "maximum loss" is so unacceptable they're forced into strategies that minimize it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimax in Pac-Man
&lt;/h3&gt;

&lt;p&gt;Our implementation builds a game tree to a depth of 6, alternating between MAX layers (Pac-Man maximizes) and MIN layers (ghosts minimize). The evaluation function weighs ghost proximity (heavily penalized), edible ghost hunting (heavily rewarded), and power pellet value (boosted when threatened). Alpha-beta pruning keeps it fast enough for real-time play.&lt;/p&gt;
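
&lt;p&gt;Stripped of the game specifics, the search looks like this: depth-limited minimax with alpha-beta pruning, sketched in Python (the demo is JavaScript, and the state interface here is assumed, not the demo's actual API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def minimax(state, depth, alpha, beta, maximizing):
    # is_terminal(), legal_moves(), apply(), and evaluate() are assumed
    # interfaces standing in for the real game state.
    if depth == 0 or state.is_terminal():
        return evaluate(state)
    if maximizing:  # MAX layer: Pac-Man picks the best worst case
        best = float("-inf")
        for move in state.legal_moves():
            best = max(best, minimax(state.apply(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta &amp;lt;= alpha:
                break  # prune: MIN would never allow this branch
        return best
    worst = float("inf")  # MIN layer: ghosts assumed to play optimally
    for move in state.legal_moves():
        worst = min(worst, minimax(state.apply(move), depth - 1, alpha, beta, True))
        beta = min(beta, worst)
        if beta &amp;lt;= alpha:
            break
    return worst
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
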

&lt;h3&gt;
  
  
  The Irony of Perfection
&lt;/h3&gt;

&lt;p&gt;Here's the beautiful irony: Minimax assumes the ghosts are &lt;em&gt;perfect adversaries&lt;/em&gt;. It prepares for opponents who will always make the optimal move against Pac-Man.&lt;/p&gt;

&lt;p&gt;But the ghosts aren't perfect. They're not even good. They're hard-wired, animatronic patterns from 1980—Blinky's relentless chase, Pinky's buggy ambush, Inky's confusing flanks, Clyde's cowardly retreats.&lt;/p&gt;

&lt;p&gt;Minimax is playing 4D chess against opponents who are playing checkers.&lt;/p&gt;

&lt;p&gt;The result? Minimax often seems &lt;em&gt;overly cautious&lt;/em&gt;—dodging threats that aren't really threats, optimizing against a genius adversary that doesn't exist. It's the Cold War logic of the bomb applied to a yellow circle eating dots: paranoid, calculating, and ultimately... over-prepared for an apocalypse that was never coming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydqaj490wgn3hp3f04a.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydqaj490wgn3hp3f04a.gif" alt="Minimax algorithm playing with Cold War paranoia" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Paranoid, calculating, victorious—assuming an optimal opponent!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~32 seconds.&lt;/strong&gt; Nearly double A*'s performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Genealogy of Code
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Origin Story&lt;/th&gt;
&lt;th&gt;Core Philosophy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Greedy&lt;/td&gt;
&lt;td&gt;1950s-60s&lt;/td&gt;
&lt;td&gt;Optimization research, Dijkstra, Kruskal, Prim&lt;/td&gt;
&lt;td&gt;"Take the best immediate option"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random&lt;/td&gt;
&lt;td&gt;Ancient / 1946&lt;/td&gt;
&lt;td&gt;Dice, I Ching → Monte Carlo, von Neumann's Middle-Square&lt;/td&gt;
&lt;td&gt;"When in doubt, roll the dice"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A*&lt;/td&gt;
&lt;td&gt;1968&lt;/td&gt;
&lt;td&gt;Shakey the Robot at SRI&lt;/td&gt;
&lt;td&gt;"Balance known cost with estimated future"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimax&lt;/td&gt;
&lt;td&gt;1928 / Cold War&lt;/td&gt;
&lt;td&gt;Von Neumann's poker games → nuclear strategy&lt;/td&gt;
&lt;td&gt;"Assume the worst, prepare accordingly"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These algorithms aren't just code—they're artifacts of human history. Von Neumann's paranoia about Soviet intentions lives on in every game tree search. Ulam's boredom during convalescence echoes in every random number generator. Shakey's wobbling navigation through a room of blocks enabled every pathfinding algorithm in every video game since.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://jasonmakes801.github.io/PacManAI/" rel="noopener noreferrer"&gt;Play the demo →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the benchmark. Watch Random stumble to third place. Watch Greedy die immediately. Watch A* navigate with mechanical efficiency. Watch Minimax dominate with Cold War paranoia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to go further?&lt;/strong&gt; The code is begging for improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepen the Minimax search beyond 6 plies—how much better can it get?&lt;/li&gt;
&lt;li&gt;Implement Expectimax, which models the ghosts as probabilistic rather than optimal adversaries (a minimal sketch follows this list)&lt;/li&gt;
&lt;li&gt;Encode &lt;em&gt;actual&lt;/em&gt; ghost behaviors into the evaluation function—if you know Clyde retreats at 8 tiles, exploit it&lt;/li&gt;
&lt;li&gt;A hundred lines of well-tuned algorithm could absolutely wipe the floor with these 1980s ghosts&lt;/li&gt;
&lt;/ul&gt;
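
&lt;p&gt;For the Expectimax idea, the change is small: replace the MIN layer with a chance node that averages over the ghosts' moves. A minimal Python sketch, reusing the placeholder hooks from the Minimax sketch above and crudely assuming ghosts move uniformly at random:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Expectimax: ghosts are modeled as random rather than adversarial.
# legal_moves, apply_move and evaluate are placeholders as before.
def expectimax(state, depth, maximizing):
    if depth == 0 or state.game_over:
        return evaluate(state)
    if maximizing:
        return max(expectimax(apply_move(state, m), depth - 1, False)
                   for m in legal_moves(state, "pacman"))
    moves = legal_moves(state, "ghosts")
    # Chance node: expected value under a uniform-random ghost policy.
    return sum(expectimax(apply_move(state, m), depth - 1, True)
               for m in moves) / len(moves)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
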

&lt;p&gt;Then think about this: I built this entire demo in an afternoon, mostly through conversation with an LLM. The barrier to experimenting with classic computer science has never been lower.&lt;/p&gt;

&lt;p&gt;We stand on the shoulders of giants—von Neumann, Ulam, Dijkstra, Hart, Nilsson, Raphael, Iwatani. Their algorithms, forged in the crucible of war and the whimsy of arcade entertainment, are now accessible to anyone with a browser.&lt;/p&gt;

&lt;p&gt;Use them. Learn from them. Build on them.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Final Note on Quarters
&lt;/h2&gt;

&lt;p&gt;Remember those quarters I mentioned scrounging for in the 80s? Here's where they went.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji50rpn1d4prft1ta41q.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji50rpn1d4prft1ta41q.jpg" alt="Toru Iwatani at GDC 2011" width="800" height="1241"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo: &lt;a href="https://www.flickr.com/photos/officialgdc/5493309930" rel="noopener noreferrer"&gt;Official GDC&lt;/a&gt;, CC BY 2.0&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Toru Iwatani, the designer who gave us Pac-Man, never received more than his regular Namco salary for creating one of the most successful games in history. No royalties, no bonus, no equity stake. Just a paycheck.&lt;/p&gt;

&lt;p&gt;And Namco's revenue model was gloriously simple: they sold arcade cabinets. Whole machines, turnkey systems. A Pac-Man cabinet cost around $2,500 in 1980 (about $9,500 today)—at a busy arcade, it could pay for itself in a month. The buyer paid Namco upfront, plugged the cabinet in, and kept every quarter that dropped. No subscriptions, no microtransactions, no licensing deals (those came later). Just hardware for cash.&lt;/p&gt;

&lt;p&gt;A simpler era—in algorithms &lt;em&gt;and&lt;/em&gt; in business.&lt;/p&gt;




&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Original Pac-Man game design:&lt;/strong&gt; Toru Iwatani, Namco (1980)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ghost AI logic:&lt;/strong&gt; Shigeo Funaki, based on Iwatani's personality concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base JavaScript Pac-Man engine:&lt;/strong&gt; Adapted from &lt;a href="https://github.com/daleharvey/pacman" rel="noopener noreferrer"&gt;daleharvey/pacman&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI implementations and modifications:&lt;/strong&gt; Built with Claude Opus 4.5 and GitHub Copilot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical research:&lt;/strong&gt; Google Deep Research, compiled from various sources on von Neumann, the Manhattan Project, SRI's Shakey project, and Namco's development history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagrams:&lt;/strong&gt; Generated with Gemini Nano Banana Pro&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>javascript</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>Building a Rubik's Cube Solver: A Primer on Claude Skills</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Fri, 14 Nov 2025 15:19:20 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/building-a-rubiks-cube-solver-a-primer-on-claude-skills-4m7k</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/building-a-rubiks-cube-solver-a-primer-on-claude-skills-4m7k</guid>
      <description>&lt;p&gt;Got a scrambled Rubik's Cube gathering dust on your desk? I did too, which prompted me to build a Claude Skill that solves it from six photos. Take pictures of each face, and Claude analyzes them, validates the cube state, and returns step-by-step solving instructions.&lt;/p&gt;

&lt;p&gt;This project became a perfect way to understand how Claude Skills work—and discover some surprising aspects of building AI-powered workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Claude Skills?
&lt;/h2&gt;

&lt;p&gt;Claude Skills are procedural workflows that extend Claude's capabilities by combining its natural strengths (vision, reasoning, conversation) with external tools like Python scripts, APIs, or command-line utilities.&lt;/p&gt;

&lt;p&gt;The heart of a Claude Skill is a &lt;code&gt;SKILL.md&lt;/code&gt; file—a markdown document containing step-by-step instructions that Claude follows. Think of it as a playbook that tells Claude how to orchestrate a complex task from start to finish.&lt;/p&gt;

&lt;p&gt;Why does this matter? Skills are repeatable, shareable, and specialized. You can build a workflow once, package it with supporting scripts, and anyone can use it conversationally through Claude. No API wrangling, no UI to build—just describe the procedure, and Claude handles the orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of a Claude Skill (Using the Rubik's Solver)
&lt;/h2&gt;

&lt;p&gt;Let's walk through how the Rubik's Cube solver works to see the components in action:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The SKILL.md file&lt;/strong&gt; contains procedural instructions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### Step 1: Request Photos&lt;/span&gt;
Request exactly 6 photos from the user, one of each cube face
IN THIS SPECIFIC ORDER with correct orientation.

&lt;span class="gs"&gt;**Photo 1: White face (center) - Hold cube with BLUE on top**&lt;/span&gt;
&lt;span class="gs"&gt;**Photo 2: Orange face (center) - Hold cube with WHITE on top**&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python scripts&lt;/strong&gt; handle the computational heavy lifting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;solve_cube.py&lt;/code&gt; validates color sequences, renders visualizations, and runs the Kociemba solving algorithm&lt;/li&gt;
&lt;li&gt;Each face gets cached after validation&lt;/li&gt;
&lt;li&gt;The solver concatenates all faces and computes the optimal solution&lt;/li&gt;
&lt;/ul&gt;
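
&lt;p&gt;The solving call itself is nearly a one-liner. A minimal sketch of that step (the real &lt;code&gt;solve_cube.py&lt;/code&gt; wraps it with validation and caching, and &lt;code&gt;faces&lt;/code&gt; here is a stand-in for the dict of validated 9-sticker strings built from the photos):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of the solving step. kociemba expects a 54-character
# facelet string: the six faces concatenated in U, R, F, D, L, B order.
import kociemba

# 'faces' is a stand-in: a dict mapping each face letter to the
# 9-sticker string extracted and validated from that face's photo.
cube_string = "".join(faces[f] for f in "URFDLB")
solution = kociemba.solve(cube_string)  # e.g. "R U R' U' F2 ..."
print(solution)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
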

&lt;p&gt;&lt;strong&gt;Claude's orchestration&lt;/strong&gt; ties it all together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes photos using vision capabilities to extract the 9-sticker color sequence for each face&lt;/li&gt;
&lt;li&gt;Calls validation scripts with the detected colors&lt;/li&gt;
&lt;li&gt;Renders an emoji-based cube visualization for user confirmation&lt;/li&gt;
&lt;li&gt;Handles errors, corrections, and edge cases conversationally&lt;/li&gt;
&lt;li&gt;Delivers the final step-by-step solving instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request 6 photos (with specific orientation instructions)&lt;/li&gt;
&lt;li&gt;Claude analyzes each photo → extracts 9-sticker color sequence&lt;/li&gt;
&lt;li&gt;Python validates and caches each face&lt;/li&gt;
&lt;li&gt;Renders emoji visualization for user confirmation&lt;/li&gt;
&lt;li&gt;Concatenates → Solves → Returns instructions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsxius5wqin47v9242e5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsxius5wqin47v9242e5.gif" alt="Claude uses the skill" width="600" height="331"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The skill in action: Claude guides the user through photo capture, validation, and solving&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Things That Might Surprise You as a Developer
&lt;/h2&gt;

&lt;p&gt;Building this skill revealed some interesting characteristics of the Claude Skills environment:&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependencies Are Casual
&lt;/h3&gt;

&lt;p&gt;You just mention &lt;code&gt;pip3 install kociemba&lt;/code&gt; in your SKILL.md file, and Claude installs it—every time you run the skill. There's no Docker image, no virtual environment, no persistent package state to manage. The configuration is completely stateless.&lt;/p&gt;

&lt;p&gt;This feels weird at first. Where's my &lt;code&gt;requirements.txt&lt;/code&gt;? What about version pinning? But it's also liberating: your skill is just instructions plus scripts. Claude handles the environment setup each time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Everything Runs Remotely
&lt;/h3&gt;

&lt;p&gt;Claude executes your scripts on a remote server, not your local machine. Your skill package is just a set of instructions and supporting files that Claude interprets and runs in its environment. You're not SSH-ing into a box or deploying to infrastructure—you're describing a workflow, and Claude makes it happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Iteration Takes Patience
&lt;/h3&gt;

&lt;p&gt;Here's the practical friction: Claude Code can't directly access image files uploaded in chat during skill development. My workaround was zipping the skill contents (SKILL.md, scripts, etc.) and uploading the zip to Claude Desktop for testing.&lt;/p&gt;

&lt;p&gt;Testing also means going through the full photo upload flow each iteration. It's more friction than local development with instant feedback, but manageable once you build a rhythm. Think of it as similar to testing a mobile app—you need to go through the actual user flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vision Challenge
&lt;/h2&gt;

&lt;p&gt;One surprise: converting photos into color strings isn't 100% reliable, even with clear, well-lit shots. Claude sometimes misreads sticker colors. That's why the skill includes a validation step—rendering an emoji visualization for the user to confirm.&lt;/p&gt;

&lt;p&gt;This is still &lt;em&gt;dramatically&lt;/em&gt; easier than the old way: coaxing OpenCV into reliable color detection with threshold tuning, color space conversions, lighting normalization, and endless edge case handling. I'll take "human confirms the visualization" over "debug CV pipeline for three hours" any day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills Are Non-Deterministic
&lt;/h3&gt;

&lt;p&gt;Here's what really impressed me: Skills aren't just rigid instruction-following. Claude doesn't execute your SKILL.md robotically.&lt;/p&gt;

&lt;p&gt;When things go wrong—a misoriented photo, unclear lighting, user uploads faces out of order—Claude adapts. It will valiantly work to reach a valid solver state, handling scenarios I didn't anticipate when designing the workflow. The conversational nature means graceful degradation, not hard failures.&lt;/p&gt;

&lt;p&gt;If a photo doesn't make sense, Claude asks clarifying questions. If the solver fails, it walks the user through corrections. This resilience comes "for free" from Claude's reasoning capabilities layered on top of your procedural instructions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh49i04sv8izipyl44du.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh49i04sv8izipyl44du.gif" alt="Solving the cube using Claude's instructions" width="600" height="338"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Following Claude's step-by-step instructions to solve the cube&lt;/em&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Why Solutions Are Always ~20 Moves or Fewer
&lt;/h3&gt;

&lt;p&gt;The Kociemba algorithm uses a two-phase approach that, in practice, solves any valid cube state in ~20 moves or fewer (often much less). Unlike beginner methods that solve layer-by-layer, or advanced methods like CFOP that optimize for speed, Kociemba finds mathematically near-optimal solutions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1&lt;/strong&gt;: Getting the cube into a specific subset of positions (G1 group)&lt;br&gt;
&lt;strong&gt;Phase 2&lt;/strong&gt;: Solving from that subset to completion&lt;/p&gt;

&lt;p&gt;This approach sacrifices execution speed (the moves aren't optimized for finger tricks) but guarantees remarkably short solutions. A beginner method might take 80-100 moves; CFOP averages 50-60. Kociemba typically delivers solutions in 18-22 moves, making it ideal for casual solving where you want minimal steps, not maximum speed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Try It Yourself &amp;amp; Help Me Improve
&lt;/h2&gt;

&lt;p&gt;The skill is &lt;a href="https://github.com/JasonMakes801/rubiks-cube-solver.git" rel="noopener noreferrer"&gt;open source on GitHub&lt;/a&gt;. It works with any standard 3×3 Rubik's Cube, and the only requirement is the kociemba library (which Claude auto-installs).&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Install:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone the repository:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/JasonMakes801/rubiks-cube-solver.git
   &lt;span class="nb"&gt;cd &lt;/span&gt;rubiks-cube-solver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Create a zip file of the skill:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   zip &lt;span class="nt"&gt;-r&lt;/span&gt; rubiks-cube-solver.zip SKILL.md scripts/ README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install in Claude Desktop:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open Claude Desktop (or Claude Code)&lt;/li&gt;
&lt;li&gt;Upload the &lt;code&gt;rubiks-cube-solver.zip&lt;/code&gt; file to chat&lt;/li&gt;
&lt;li&gt;Ask Claude to "extract this skill and help me solve my Rubik's Cube"&lt;/li&gt;
&lt;li&gt;Claude will unpack the skill and begin the photo-guided workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have your scrambled Rubik's Cube ready&lt;/li&gt;
&lt;li&gt;Prepare to take 6 clear photos (one per face)&lt;/li&gt;
&lt;li&gt;Follow Claude's orientation instructions carefully&lt;/li&gt;
&lt;li&gt;Review the emoji visualization to confirm colors&lt;/li&gt;
&lt;li&gt;Follow the step-by-step solving instructions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  One Idea I'm Considering:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Can we eliminate the human-in-the-loop validation step?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Is there a way to build error correction into the scripts themselves—maybe cross-referencing impossible color combinations, using multiple validation passes, or applying constraint satisfaction logic—so the vision-to-string conversion becomes reliable enough to skip user confirmation?&lt;/p&gt;
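
&lt;p&gt;Some impossible combinations are cheap to rule out before involving the user. A hedged Python sketch of the kind of pre-check I have in mind (&lt;code&gt;plausible_cube&lt;/code&gt; is hypothetical, not part of the current skill): every color must appear exactly nine times, and the six face centers must be distinct.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical pre-checks that could catch vision errors automatically:
# each color appears exactly 9 times, and the 6 face centers are distinct.
from collections import Counter

def plausible_cube(faces):
    """faces: dict mapping a face letter to its 9-sticker color string."""
    counts = Counter("".join(faces.values()))
    if sorted(counts.values()) != [9] * 6:
        return False  # some color was over- or under-detected
    centers = {stickers[4] for stickers in faces.values()}
    return len(centers) == 6  # six distinct center colors

# A solved cube passes the checks.
solved = {f: f * 9 for f in "URFDLB"}
print(plausible_cube(solved))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checks like these are necessary but not sufficient; fully automatic correction would also need permutation and parity constraints.&lt;/p&gt;
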

&lt;p&gt;If you have ideas on this or other improvements, I'd love to hear them. Open an issue or PR, or just share your thoughts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Claude Skills unlock a new pattern: conversational workflows orchestrating specialized tools. This approach works far beyond Rubik's Cubes—data analysis pipelines, image processing workflows, API integrations, code generation tasks.&lt;/p&gt;

&lt;p&gt;The barrier to building custom AI workflows just got a lot lower. You don't need to build UIs, manage API keys, or handle state management. Just describe the procedure in markdown, provide the tools, and Claude handles the orchestration conversationally.&lt;/p&gt;

&lt;p&gt;What will you build?&lt;/p&gt;

</description>
      <category>llm</category>
      <category>tooling</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>Testing AGENTS.md Across Three Agentic Coding Platforms: Universal Context Has Arrived</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Tue, 21 Oct 2025 03:38:48 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/testing-agentsmd-across-three-agentic-coding-platforms-universal-context-has-arrived-1lg0</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/testing-agentsmd-across-three-agentic-coding-platforms-universal-context-has-arrived-1lg0</guid>
      <description>&lt;p&gt;Most developers I know are loyal to their agentic coding platform. You're a Copilot person, a Claude Code person, etc. That made sense when each required its own special way of managing context.&lt;/p&gt;

&lt;p&gt;But AGENTS.md is quietly changing that equation. It's a universal context standard that works across GitHub Copilot, Claude Code, Gemini CLI, OpenAI Codex, and others. Write your project spec once, use any platform that fits the moment.&lt;/p&gt;

&lt;p&gt;I tested this by building the same application three different ways. Here's what I learned about the current state of agentic coding and why your workflow might benefit from a multi-platform approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AGENTS.md?
&lt;/h2&gt;

&lt;p&gt;AGENTS.md is a standardized markdown file that provides context to AI coding assistants. Think of it as a project brief that lives in your repository: requirements, technical specifications, coding preferences, architectural decisions, and context an AI needs to work effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it useful:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Universal standard&lt;/strong&gt;: Works across GitHub Copilot, Claude Code, Gemini CLI, OpenAI Codex, and other AI coding tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plain markdown&lt;/strong&gt;: No special syntax required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent context&lt;/strong&gt;: The AI reads it each time, so you're not re-explaining your project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What goes in it:&lt;/strong&gt;&lt;br&gt;
Project overview, technical requirements, file structure, coding standards, dependencies, and setup instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it lives:&lt;/strong&gt;&lt;br&gt;
Place it in your project root directory. Some platforms support AGENTS.md files at multiple levels for more granular context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform-specific notes:&lt;/strong&gt;&lt;br&gt;
GitHub Copilot also supports Instructions.md at various levels, but AGENTS.md works universally. Claude Code and Gemini CLI both use AGENTS.md as their primary context source.&lt;/p&gt;

&lt;p&gt;The key advantage: write your context once, and multiple AI coding tools can use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;For this experiment, I needed a project complex enough to stress-test these tools: Conway's Game of Life with real-time pattern recognition (to make the coding challenge a bit harder) and a retro arcade aesthetic. The AGENTS.md specification was 2,000 words covering the cellular automaton logic, visual effects (CRT scanlines, glow), and automatic detection and color-coding of emergent patterns like gliders, oscillators, and still lifes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdw6ctpvwy9tm8rzbflw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdw6ctpvwy9tm8rzbflw.png" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same spec. Three platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Copilot with GPT-5 (my daily driver, typically with Claude Sonnet 4.5)&lt;/li&gt;
&lt;li&gt;Claude Code (Anthropic's command-line coding agent)&lt;/li&gt;
&lt;li&gt;Gemini CLI (Google's terminal-based coding tool)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran each from a clean slate, pointing them at the same AGENTS.md file. No hand-holding, no iterative fixes, just one shot to see what each would build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Happened
&lt;/h2&gt;

&lt;p&gt;All three tools produced working implementations. But the approaches, results, and developer experience differed in revealing ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code: The Planner
&lt;/h3&gt;

&lt;p&gt;Claude Code paused before writing code. It read the specification, presented a detailed roadmap of what it intended to build (file structure, implementation approach, feature priorities), then required my approval before proceeding.&lt;/p&gt;

&lt;p&gt;This felt collaborative. Less "AI does the thing" and more "AI proposes a plan, human signs off."&lt;/p&gt;

&lt;p&gt;The result? The most polished one-shot implementation. Pattern recognition worked correctly, visual effects were solid, code was well-structured. It felt production-ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini CLI: The Honest Craftsman
&lt;/h3&gt;

&lt;p&gt;Gemini got close. The implementation was visually true to the requested aesthetic. But it was upfront about not being finished: "Next, I will focus on enhancing the pattern detection to recognize more complex patterns like gliders and other oscillators, as specified in the project requirements."&lt;/p&gt;

&lt;p&gt;I appreciated the honesty. It delivered something genuinely good while acknowledging where it fell short of the spec. The transparency felt valuable.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot + GPT-5: The Capable Generalist
&lt;/h3&gt;

&lt;p&gt;Copilot produced a solid implementation quickly. The game worked, the retro aesthetic was there, the code was clean. But pattern recognition (specifically the color-coding of oscillators) didn't quite work as specified.&lt;/p&gt;

&lt;p&gt;Not broken, just incomplete on one of the core features. Still impressive, just not as polished as Claude Code's output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Objective Analysis
&lt;/h2&gt;

&lt;p&gt;I didn't want this to just be my opinion. So I had Grok Code Fast 1 conduct a blind code review of all three implementations.&lt;/p&gt;

&lt;p&gt;I gave Grok the AGENTS.md specification and all three complete codebases. No context about which tool built which. Just: evaluate these against the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code: 9/10&lt;/strong&gt;&lt;br&gt;
• Pattern Recognition: Excellent (gliders in all 4 orientations, multiple still lifes, oscillators)&lt;br&gt;
• Advanced Features: Afterglow trails, extinction alerts, stable pattern detection ✓&lt;br&gt;
• Visual Polish: Full retro arcade UI with CRT scanlines and legend ✓&lt;br&gt;
• Weaknesses: Missing LWSS spaceship detection; potential performance lag in dense grids&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ddsse13culmvl685dm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ddsse13culmvl685dm.gif" alt=" " width="600" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot + GPT-5: 9/10&lt;/strong&gt;&lt;br&gt;
• Pattern Recognition: Strong (gliders in all 4 orientations, LWSS spaceship, oscillators, still lifes)&lt;br&gt;
• Advanced Features: Scanlines, vignette, vector-style glow, stability alerts ✓&lt;br&gt;
• Visual Polish: Balanced retro aesthetic with optional FPS display ✓&lt;br&gt;
• Weaknesses: Oscillator detection relies on state comparison, potentially missing edge cases&lt;/p&gt;

&lt;p&gt;(I'd give it an 8; that 9 is a bit generous, to my mind.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssq3ewxdkklh9xmrhv8a.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssq3ewxdkklh9xmrhv8a.gif" alt=" " width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI: 6/10&lt;/strong&gt;&lt;br&gt;
• Pattern Recognition: Limited (only block still life and horizontal blinker)&lt;br&gt;
• Advanced Features: Basic trail effects for dead cells&lt;br&gt;
• Visual Polish: Clean, functional UI with retro styling ✓&lt;br&gt;
• Weaknesses: Severely limited pattern detection (misses gliders, spaceships, most oscillators); no stability/extinction detection&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vaunngzhhvkcdle4bm1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vaunngzhhvkcdle4bm1.gif" alt=" " width="720" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Workflow Insight
&lt;/h2&gt;

&lt;p&gt;Beyond the scores, this experiment revealed something practical: &lt;strong&gt;using multiple AI coding tools on the same project is now genuinely viable&lt;/strong&gt; and maybe even optimal.&lt;/p&gt;

&lt;p&gt;Both Claude Code and Gemini CLI install via Homebrew on Mac (&lt;code&gt;brew install claude-code&lt;/code&gt; / &lt;code&gt;brew install gemini-cli&lt;/code&gt;), which makes experimentation trivially easy. Both also make your terminal look fantastic, which shouldn't matter but somehow does.&lt;/p&gt;

&lt;p&gt;The real insight: &lt;strong&gt;if you're already using Copilot in VSCode, you're missing an opportunity if you don't occasionally open a Terminal pane and run Claude Code or Gemini CLI for a second opinion.&lt;/strong&gt; Both tools will read your AGENTS.md file for context. You're not starting over. You're getting a different perspective on the same project.&lt;/p&gt;

&lt;p&gt;The AGENTS.md file makes this seamless. One specification, multiple tools that can execute against it, for those times when one agent gets stuck on a hard problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;We're at an interesting moment with AI-assisted development. These tools aren't experimental anymore. They're genuinely capable. Claude Code delivered something close to production-ready code in one shot. Copilot's implementation was solid and reliable. Even Gemini, despite its pattern recognition gaps, built something functional and visually appealing; I'm sure that, given a second shot, it would nail the pattern recognition.&lt;/p&gt;

&lt;p&gt;The AGENTS.md standard makes it practical to use multiple tools without rewriting context each time. This isn't about abandoning your preferred assistant. It's about recognizing that different tools have different strengths. Claude Code's planning phase caught edge cases. Copilot's spaceship detection was more complete. Gemini's aesthetic choices were compelling even where its pattern detection fell short.&lt;/p&gt;

&lt;p&gt;You don't need to pick one. The infrastructure for multi-tool workflows already exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;All three implementations are available to explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://codepen.io/Jason-Peterson-the-bold/pen/azdqzYJ" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codepen.io/Jason-Peterson-the-bold/pen/PwZQwBX" rel="noopener noreferrer"&gt;Github Copilot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codepen.io/Jason-Peterson-the-bold/pen/raxJavM" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AGENTS.md file that powered all three is &lt;a href="https://gist.github.com/brandnewpeterson/7e1f603201fa3a81735794b4f514327a#file-agents-md" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're already using one AI coding assistant, consider experimenting with another. The barrier to entry is lower than you think, and the insights from seeing different approaches to the same problem are worth the fifteen minutes it takes to try.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>agents</category>
      <category>githubcopilot</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
