<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jason Peterson</title>
    <description>The latest articles on Forem by Jason Peterson (@jason_peterson_607e54abf5).</description>
    <link>https://forem.com/jason_peterson_607e54abf5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3576555%2Fa9a16049-e112-4caa-a7a2-a21afbbbdde2.png</url>
      <title>Forem: Jason Peterson</title>
      <link>https://forem.com/jason_peterson_607e54abf5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jason_peterson_607e54abf5"/>
    <language>en</language>
    <item>
      <title>I Built a 120-Image AI Influencer Pipeline for $4.80</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Fri, 13 Feb 2026 18:36:24 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/i-built-a-120-image-ai-influencer-pipeline-for-480-117p</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/i-built-a-120-image-ai-influencer-pipeline-for-480-117p</guid>
      <description>&lt;p&gt;&lt;em&gt;Erewhon Smoothie — Lilly and her Cavalier on a Brooklyn sidewalk. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhoxn4mbwuaarxpvr7g0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhoxn4mbwuaarxpvr7g0.png" alt="Erewhon winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Rimowa Classic Cabin — airport terminal, golden hour. The suitcase sits beside her like it belongs. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyvcv1t1tqlu3zqxofh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyvcv1t1tqlu3zqxofh0.png" alt="Rimowa winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Hennessy VSOP — candlelit speakeasy, bottle on the table, dog on her lap. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh8tmi5dvp27iva1v90e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh8tmi5dvp27iva1v90e.png" alt="Hennessy winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Bang &amp;amp; Olufsen Beoplay H95 — Montauk beach at golden hour. The hardest shot in the set: only 2 of 30 attempts got the headphones right. $0.04.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2rgej2llv3vazxmd66s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2rgej2llv3vazxmd66s.png" alt="Bang &amp;amp; Olufsen winner" width="800" height="1076"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four brand partnerships. 120 AI-generated photos. One fictional influencer, her dog, and a $4.80 fal.ai bill. This is the entire pipeline — what worked, what didn't, and why it matters.&lt;/p&gt;


&lt;h2&gt;
  
  
  From Tilly to Lilly
&lt;/h2&gt;

&lt;p&gt;Last September, Eline van der Velden announced at the Zurich Summit that her AI-generated actress "Tilly Norwood" was in talks with a talent agency. Emily Blunt, Melissa Barrera, and Whoopi Goldberg publicly condemned it. Van der Velden received death threats. The backlash proved something important: synthetic people are now consistent and believable enough to genuinely threaten livelihoods. But Tilly is an actress in controlled contexts. What about &lt;strong&gt;influencers&lt;/strong&gt; — where the content IS the product?&lt;/p&gt;

&lt;p&gt;Meet Lilly Sorghum. Late 20s, Afro-Caribbean heritage, effortlessly stylish, never without her Cavalier King Charles Spaniel. She has brand partnerships with Erewhon, Rimowa, Hennessy, and Bang &amp;amp; Olufsen. She doesn't exist.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;25 candidates, $0.63. Pick one, lock it, never regenerate her again.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcomz7suz4fgf558j33p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcomz7suz4fgf558j33p.png" alt="5x5 grid of 25 audition candidates" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I generated 25 candidates with Flux Dev ($0.63), picked one, and locked her as the reference anchor. Every shot that follows preserves her face, her dog, her vibe.&lt;/p&gt;
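
&lt;p&gt;For reference, the audition step is just the same &lt;code&gt;fal_client&lt;/code&gt; call in a loop. A minimal sketch, assuming the standard Flux Dev endpoint and arguments (the prompt text here is illustrative, not the actual brief):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import fal_client

# Illustrative character brief -- not the exact prompt from the project
CHARACTER = (
    "Candid street-style photo of a stylish Afro-Caribbean woman in her "
    "late 20s with a Cavalier King Charles Spaniel, natural light"
)

candidates = []
for seed in range(25):
    result = fal_client.subscribe("fal-ai/flux/dev", arguments={
        "prompt": CHARACTER,
        "seed": seed,  # vary the seed, keep the prompt fixed
        "image_size": "portrait_4_3",
    })
    candidates.append(result["images"][0]["url"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;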

&lt;p&gt;The goal isn't to deceive — it's to make the pipeline transparent so you can see exactly how trivial this has become.&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Fan out, fan in. 120 shots generated in parallel, 4 winners selected.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o5dt8ptkk9ht5tlj0vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o5dt8ptkk9ht5tlj0vj.png" alt="Fan out, fan in. 120 shots generated in parallel, 4 winners selected." width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code is the orchestrator. I give it a creative brief and it spawns a swarm of sub-agents — one per product, each running in parallel, each spawning 30 shot agents of its own. The lead agent writes the briefs, the product agents write the prompts, the shot agents call fal.ai, and the judge agents evaluate the results with vision. No Python threading, no job queue — just Claude Code talking to itself in parallel and writing the code to make the API calls.&lt;/p&gt;

&lt;p&gt;Each shot agent's core is about 10 lines of Python that Claude wrote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fal_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fal-ai/flux-pro/kontext/multi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_urls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ref_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_url&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aspect_ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3:4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pass &lt;a href="https://fal.ai" rel="noopener noreferrer"&gt;fal.ai&lt;/a&gt;'s Kontext Pro Multi two reference images — Lilly+dog and the product photo — describe the scene, get back an integrated shot. $0.04 each. 9 minutes wall clock for all 120.&lt;/p&gt;

&lt;p&gt;fal.ai made this project possible in a way that local inference couldn't. One API key, no GPU provisioning, no model downloads — and critically, their infrastructure handled 120 concurrent requests without breaking a sweat. The developer experience is remarkably clean: one &lt;code&gt;fal_client.subscribe()&lt;/code&gt; call per image, results back in seconds. When you're building a parallelized pipeline, that simplicity compounds.&lt;/p&gt;

&lt;p&gt;The star of the show is Kontext Multi's scene understanding. It doesn't just paste objects — it rotates a suitcase upright, places a bottle on a table, wraps headphones around a neck. All from flat product photos.&lt;/p&gt;

&lt;p&gt;A year ago, character consistency was the hard problem. You'd fine-tune a LoRA or run DreamBooth for hours, and still get drift by image 20. Now it's one reference image passed as an API parameter. Lilly is recognizably herself in all 120 shots — same face, same dog, different scenes, different outfits. Consistency is table stakes. Product integration is the new wild card:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COHERENCE — All 3 Elements (Lilly + Dog + Product)
────────────────────────────────────────────────────
Rimowa (suitcase beside her):     19/30  63%
Hennessy (bottle on table):       16/30  53%
Erewhon (cup in hand):            11/30  37%
B&amp;amp;O (headphones around neck):      2/30   7%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern: &lt;strong&gt;objects that sit beside the subject&lt;/strong&gt; are easy. &lt;strong&gt;Held objects&lt;/strong&gt; are harder. &lt;strong&gt;Wearables&lt;/strong&gt; are nearly impossible — only 2 of 30 B&amp;amp;O shots got headphones right. The brief matters more than prompt engineering.&lt;/p&gt;

&lt;p&gt;The solve: generate many, pick few. 30 candidates at $0.04 each gives you enough even at 7%. That 30:1 ratio is how real creative production works — AI just makes it $1.20 instead of $15,000.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Human picks vs. AI vision judge — 4/4 agreement on winners.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7mi318al29ciyg5miad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7mi318al29ciyg5miad.png" alt="Winners grid" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After generating 120 images, I picked winners two ways: by hand, and by having Claude evaluate every image with vision, scoring on 8 criteria — character consistency, product visibility, composition, scroll-stop factor, and more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HUMAN vs. AI JUDGE — Winner Picks
──────────────────────────────────
Rimowa:     Human #4    AI #4    ✓
Erewhon:    Human #3    AI #3    ✓
Hennessy:   Human #20   AI #20   ✓
B&amp;amp;O:        Human #14   AI #14   ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4/4 agreement on winners. They diverged on runners-up (2 of 3 picks differed) — the obvious best stands out, the second-best is subjective. This suggests the judging step could be fully automated. The entire pipeline — brief to finished post — could run unattended.&lt;/p&gt;
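
&lt;p&gt;Automating the judge would be one vision call per image. A rough sketch with the Anthropic Python SDK, where the model name, prompt wording, and output format are my assumptions (the actual judge scored 8 criteria):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()

def judge(image_url: str) -&gt; str:
    # Score one candidate with a vision-capable model; the criteria are
    # paraphrased from the article, not the exact judging prompt.
    message = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model choice
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "url", "url": image_url}},
                {"type": "text", "text": (
                    "Score this sponsored-content shot 1-10 on character "
                    "consistency, product visibility, composition, and "
                    "scroll-stop factor. Reply as JSON."
                )},
            ],
        }],
    )
    return message.content[0].text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;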

&lt;p&gt;&lt;em&gt;The failure modes: doubled products, extra dogs, ignored briefs, wrong scale.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl1xgylasx2ecto7rqdb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl1xgylasx2ecto7rqdb.png" alt="Rejects gallery" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not every shot works. These are the failure modes you design around by generating 30 candidates, not 3.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cost and the Point
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOTAL PIPELINE COST
────────────────────────────────────────
Audition:        $0.63   (25 Flux Dev)
Product prep:    $0.08   (4x background removal)
Production:      $4.80   (120 Kontext Multi)
────────────────────────────────────────
Total:           $5.51
Images:          120 generated → 4 delivered
Time:            9 minutes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This replaces real work done by real people. Photographers, stylists, location scouts, content managers, the influencers themselves. A week of content that used to involve a team and a budget now costs $5 and a laptop.&lt;/p&gt;

&lt;p&gt;The economics make it inevitable. When something costs $5 and takes 9 minutes, companies will do it. Many already are.&lt;/p&gt;

&lt;p&gt;These aren't portfolio-grade images. I'm a nerd, not an art director. But that's the point — if a solo dev with Claude Code and a fal.ai key can produce this in an afternoon, imagine what a professional creative team could do with the same tools.&lt;/p&gt;

&lt;p&gt;I'm not going to wrap this in a bow. The technology is here, it works, and it's only getting cheaper. What happens next is a policy question, not a technical one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bonus: They Move Now
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;B&amp;amp;O — ocean breeze, the cardigan moves, the dog turns to camera. $0.53 total.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz5i32ke2pojtltifesq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz5i32ke2pojtltifesq.gif" alt="B&amp;amp;O video winner" width="480" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm only showing one here, and only as an animated GIF (that's all Dev.to allows), but four still images became four video clips for $1.96. The influencer breathes now.&lt;/p&gt;
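
&lt;p&gt;The stills-to-video step rides on the same &lt;code&gt;fal_client.subscribe()&lt;/code&gt; mechanism. A hedged sketch; the image-to-video endpoint and its arguments are my assumption, since the post doesn't name the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import fal_client

# ASSUMPTION: endpoint name and arguments are illustrative -- the article
# doesn't say which image-to-video model was used.
clip = fal_client.subscribe("fal-ai/kling-video/v1/standard/image-to-video", arguments={
    "image_url": winner_url,  # one of the four winning stills
    "prompt": "ocean breeze, cardigan moves, dog turns to camera",
})
video_url = clip["video"]["url"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;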

</description>
      <category>ai</category>
      <category>python</category>
      <category>falai</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Visual UIs Are Now Possible in MCP Servers</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Mon, 02 Feb 2026 15:18:51 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/visual-uis-are-now-possible-in-mcp-servers-369a</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/visual-uis-are-now-possible-in-mcp-servers-369a</guid>
      <description>&lt;p&gt;MCP servers can now render interactive UIs directly in Claude Desktop's chat window. Not just text responses—actual HTML with JavaScript, maps, charts, anything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ce66f7bk54ns7k1whx1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ce66f7bk54ns7k1whx1.gif" alt="ISS Tracker demo" width="760" height="893"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/modelcontextprotocol/ext-apps" rel="noopener noreferrer"&gt;&lt;code&gt;@modelcontextprotocol/ext-apps&lt;/code&gt;&lt;/a&gt; library lets MCP tools return visual UIs. When you call a tool, instead of just getting text back, you get an interactive iframe rendered inline in the conversation.&lt;/p&gt;

&lt;p&gt;This means your AI assistant can show you things, not just tell you about them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/extensions/apps" rel="noopener noreferrer"&gt;Official MCP Apps docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.github.io/ext-apps/" rel="noopener noreferrer"&gt;API reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/modelcontextprotocol/ext-apps/tree/main/examples" rel="noopener noreferrer"&gt;Example implementations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The architecture has two parts: a server that fetches data and declares the UI, and a client-side app that renders it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server Side
&lt;/h3&gt;

&lt;p&gt;Register a tool with UI metadata pointing to an HTML resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;registerAppTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;registerAppResource&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/ext-apps/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resourceUri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ui://iss-tracker/mcp-app.html&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Register the UI resource (bundled HTML)&lt;/span&gt;
&lt;span class="nf"&gt;registerAppResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resourceUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/html&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;APP_HTML&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Register the tool with UI metadata&lt;/span&gt;
&lt;span class="nf"&gt;registerAppTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;where_is_iss&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Show ISS location on a live map&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;uiResourceUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;resourceUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;csp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;connectDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://*.openstreetmap.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;resourceDomains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://*.openstreetmap.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.wheretheiss.at/v1/satellites/25544&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://api.wheretheiss.at/v1/satellites/25544/positions?timestamps=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://ip-api.com/json/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;longitude&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lon&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;geo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;csp&lt;/code&gt; field is important—it declares which external domains your UI needs to access. Without this, Leaflet tiles and scripts would be blocked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Client Side
&lt;/h3&gt;

&lt;p&gt;The UI receives tool results and renders them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/ext-apps&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ISS Tracker&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ontoolresult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;structuredContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// Update your UI with the data&lt;/span&gt;
  &lt;span class="nf"&gt;updateMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;iss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Gotcha: Dynamic Script Loading
&lt;/h3&gt;

&lt;p&gt;Static &lt;code&gt;&amp;lt;script src=""&amp;gt;&lt;/code&gt; tags don't work in srcdoc iframes. You have to load external libraries dynamically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadLeaflet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;L&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;undefined&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Load CSS&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cssLink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;link&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;cssLink&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stylesheet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;cssLink&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com/leaflet@1.9.4/dist/leaflet.css&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;head&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cssLink&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Load JS&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://unpkg.com/leaflet@1.9.4/dist/leaflet.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onerror&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Failed to load Leaflet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;head&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caught me off guard—it took a while to figure out why Leaflet wasn't loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Clone: &lt;code&gt;git clone https://github.com/JasonMakes801/iss-tracker-mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Build: &lt;code&gt;bun install &amp;amp;&amp;amp; bun run build&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add to Claude Desktop config (&lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"iss-tracker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/bun"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/iss-tracker/dist/index.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--stdio"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Restart Claude Desktop&lt;/li&gt;
&lt;li&gt;Ask: "Where is the ISS?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Maps are just the start. Dashboards, charts, forms, data visualizations—anything you can build in HTML can now live inside your AI conversation.&lt;/p&gt;

&lt;p&gt;What would you build with this?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>typescript</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Did You Know CLIP Works as an AI Image Detector?</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Thu, 15 Jan 2026 11:08:25 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/did-you-know-clip-works-as-an-ai-image-detector-1e6i</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/did-you-know-clip-works-as-an-ai-image-detector-1e6i</guid>
      <description>&lt;p&gt;OpenAI's CLIP model was trained to match images with text descriptions. But here's something surprising: it also works remarkably well at detecting AI-generated images. No fine-tuning required—just extract embeddings and add a simple classifier.&lt;/p&gt;

&lt;p&gt;I built one, with some help from Claude Code, to see how well this actually works. Here's what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dataset
&lt;/h2&gt;

&lt;p&gt;I collected 1,050 portrait-style images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;525 AI images&lt;/strong&gt; from CivitAI (various Stable Diffusion models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;525 real photos&lt;/strong&gt; from Unsplash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both sets were curated to look similar—street photography, portraits, natural lighting. The goal was to make this hard, not easy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI-generated portraits from CivitAI&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva0xuyekax07z6s41l87.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva0xuyekax07z6s41l87.jpg" alt="AI Images" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Real photos from Unsplash&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvojbudnedwcvcia4i1ag.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvojbudnedwcvcia4i1ag.jpg" alt="Real Images" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Can you tell which is which? Deep into curating the AI images, I'd occasionally think "wow, that looks real." But the moment I switched to Unsplash, I realized none of them actually did. Real photos, to my eye and for now anyway, have a texture, a messiness that resets your expectations entirely.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Traditional Approach: FFT Analysis
&lt;/h2&gt;

&lt;p&gt;Before trying CLIP, I tested a traditional forensics technique: analyzing the frequency spectrum.&lt;/p&gt;

&lt;p&gt;The intuition is simple: real cameras introduce high-frequency sensor noise. AI generators don't simulate this noise, so AI images should have less energy in the high frequencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_high_freq_energy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;L&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fft2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fft_shifted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fftshift&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;power&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fft_shifted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="c1"&gt;# Measure energy in outer ring (high frequencies)
&lt;/span&gt;    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;high_freq_energy&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_energy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: 50.4% accuracy.&lt;/strong&gt; Basically random.&lt;/p&gt;

&lt;p&gt;The problem? JPEG compression destroys high-frequency information anyway. On compressed web images, this technique is useless.&lt;/p&gt;
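
&lt;p&gt;You can check this claim yourself: round-trip an image through lossy JPEG and re-measure with the &lt;code&gt;compute_high_freq_energy&lt;/code&gt; function above (the quality setting is arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from io import BytesIO

from PIL import Image

# Re-save an image with lossy JPEG compression, then compare the
# high-frequency energy ratio before and after.
img = Image.open("sample.png").convert("RGB")
buf = BytesIO()
img.save(buf, format="JPEG", quality=70)

with open("sample_jpeg.jpg", "wb") as f:
    f.write(buf.getvalue())

print(compute_high_freq_energy("sample.png"))
print(compute_high_freq_energy("sample_jpeg.jpg"))  # typically lower
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;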

&lt;h2&gt;
  
  
  The CLIP Approach
&lt;/h2&gt;

&lt;p&gt;CLIP (Contrastive Language-Image Pre-training) was trained on 400 million image-text pairs. It learned rich visual features that transfer surprisingly well to other tasks—including AI detection.&lt;/p&gt;

&lt;p&gt;The approach is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CLIPModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/clip-vit-base-patch32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/clip-vit-base-patch32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_image_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Normalize to unit vector
&lt;/span&gt;    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each image becomes a 512-dimensional vector. Then train a simple logistic regression:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: 88.5% accuracy on held-out test images.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a massive jump from 50% (FFT) to 88.5% (CLIP + LogReg).&lt;/p&gt;
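
&lt;p&gt;The held-out evaluation is the standard scikit-learn pattern. A sketch assuming &lt;code&gt;X&lt;/code&gt; is the stacked embedding matrix and &lt;code&gt;y&lt;/code&gt; the labels; the 80/20 split is my assumption, the post only says 'held-out':&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stratified split keeps the 50/50 AI/real balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
print(accuracy_score(y_test, classifier.predict(X_test)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;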

&lt;h2&gt;
  
  
  Why Does This Work?
&lt;/h2&gt;

&lt;p&gt;CLIP (&lt;a href="https://arxiv.org/abs/2103.00020" rel="noopener noreferrer"&gt;Radford et al., 2021&lt;/a&gt;) learned rich visual features from 400 million image-text pairs. These features transfer well to tasks it was never trained for.&lt;/p&gt;

&lt;p&gt;I used the smallest CLIP variant (ViT-B/32, ~150M parameters). Larger models like ViT-L/14 would likely do even better, but the small one already works surprisingly well.&lt;/p&gt;

&lt;p&gt;When we project the 512-dimensional embeddings down to 2D using UMAP, we can see the separation:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;UMAP projection of CLIP embeddings. Real images (blue) and AI images (red) cluster separately.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sp3kjg9s71pyxccdxh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sp3kjg9s71pyxccdxh1.png" alt="UMAP Visualization" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The two classes naturally separate in embedding space. The logistic regression just draws a line between them.&lt;/p&gt;
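
&lt;p&gt;If you want to reproduce the plot, here's a minimal sketch with the &lt;code&gt;umap-learn&lt;/code&gt; package, assuming &lt;code&gt;X&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; from the classifier step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import umap

# Project the 512-d CLIP embeddings down to 2D
coords = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

plt.scatter(coords[y == 0, 0], coords[y == 0, 1], s=8, label="Real")
plt.scatter(coords[y == 1, 0], coords[y == 1, 1], s=8, label="AI")
plt.legend()
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
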
&lt;h2&gt;
  
  
  You Don't Even Need a Classifier
&lt;/h2&gt;

&lt;p&gt;Here's the surprising part: you can detect AI images without training anything.&lt;/p&gt;

&lt;p&gt;Just compute the centroid (mean) of each class and classify by nearest neighbor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Compute class centroids
&lt;/span&gt;&lt;span class="n"&gt;ai_centroid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;real_centroid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Classify a new image
&lt;/span&gt;&lt;span class="n"&gt;dist_to_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ai_centroid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dist_to_real&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;real_centroid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dist_to_ai&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;dist_to_real&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Real&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: 74.8% accuracy with zero training.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The logistic regression adds ~14 percentage points, but CLIP embeddings alone get you most of the way there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does the Model See?
&lt;/h2&gt;

&lt;p&gt;Here's the honest answer: I don't know.&lt;/p&gt;

&lt;p&gt;I tried probing the CLIP dimensions to understand what features matter. The results were messy and inconclusive. These are learned representations, not human-interpretable features.&lt;/p&gt;

&lt;p&gt;Looking at the AI images ranked by confidence, there's no obvious pattern:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI images ranked from "fooled the detector" (top-left) to "obviously AI" (bottom-right). The visual pattern isn't clear—the model detects something we can't see.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12b7purasntlpbhy7g30.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12b7purasntlpbhy7g30.png" alt="Ranked AI Images" width="800" height="1263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image at 12% confidence (the one that fooled the detector) does look like a real photo at a glance, but does the 98% confidence image of the woman sitting at dusk in a sidewalk cafe really scream AI? CLIP is detecting subtle statistical signatures that aren't visible to human eyes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;This is an exploration of the technique, not a production AI detector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It won't generalize well.&lt;/strong&gt; I trained on portrait photography. It won't work reliably on landscapes, illustrations, or other styles. A real detector would need a much more diverse training set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI generators are improving.&lt;/strong&gt; The patterns CLIP detects today may disappear as generators get better at mimicking real image statistics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model isn't interpretable.&lt;/strong&gt; We can measure that it works, but we can't explain &lt;em&gt;why&lt;/em&gt; it works. That makes it hard to trust for high-stakes decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;CLIP embeddings are surprisingly effective for AI image detection:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FFT (traditional)&lt;/td&gt;
&lt;td&gt;50.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Centroid distance (no training)&lt;/td&gt;
&lt;td&gt;74.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logistic Regression on CLIP&lt;/td&gt;
&lt;td&gt;88.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: CLIP has learned features that capture something fundamental about how real and AI images differ—even though we can't see or explain what that something is.&lt;/p&gt;

&lt;p&gt;For a quick-and-dirty AI detector on a specific image domain, this approach works remarkably well. Just don't expect it to generalize to everything.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>computervision</category>
    </item>
    <item>
      <title>From Prototype to Production: Building a Multimodal Video Search Engine</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Tue, 06 Jan 2026 10:46:21 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/from-prototype-to-production-building-a-multimodal-video-search-engine-o1g</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/from-prototype-to-production-building-a-multimodal-video-search-engine-o1g</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/jason_peterson_607e54abf5/i-stacked-3-small-ml-models-and-got-video-search-that-feels-like-magic-2i63"&gt;last post&lt;/a&gt;, I wrote about the unreasonable effectiveness of model stacking for media search—combining CLIP, Whisper, and ArcFace to find video content through visual descriptions, dialog, and faces. Over the holidays I expanded that afternoon hack into something more production-like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://fennec.jasongpeterson.com" rel="noopener noreferrer"&gt;fennec.jasongpeterson.com&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Starter code:&lt;/strong&gt; &lt;a href="https://github.com/JasonMakes801/fennec-search" rel="noopener noreferrer"&gt;github.com/JasonMakes801/fennec-search&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try This
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://fennec.jasongpeterson.com" rel="noopener noreferrer"&gt;fennec.jasongpeterson.com&lt;/a&gt; (desktop browser)&lt;/li&gt;
&lt;li&gt;Enter &lt;code&gt;older man on phone, harbor background&lt;/code&gt; in Visual Content → click +&lt;/li&gt;
&lt;li&gt;Click the face of the older guy with glasses sitting with the harbor at his back&lt;/li&gt;
&lt;li&gt;Enter &lt;code&gt;the Americans had launched their missiles&lt;/code&gt; in Dialog (Semantic mode) → click +&lt;/li&gt;
&lt;li&gt;Play the clip&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You've drilled down to an exact shot without metadata, timecodes, or remembering exact words. The semantic search is fuzzy—he actually says "What it was telling him was that the US had launched their ICBMs," but that's close enough.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxticbq7ydkmisiip39d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxticbq7ydkmisiip39d.png" alt="Search result showing the scene" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Under the Hood
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Containerized architecture&lt;/strong&gt;: Vue/Nginx frontend, FastAPI backend, standalone ingest worker, Postgres+pgvector—all via docker-compose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background enrichment&lt;/strong&gt;: Polling-based worker that handles drive mounting/unmounting gracefully (Watchdog doesn't work reliably with NFS/network shares)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic dialog search&lt;/strong&gt;: Sentence-transformer embeddings so "Americans launched missiles" finds "US fired rockets"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame-accurate playback&lt;/strong&gt;: HTML5 video decode to canvas using &lt;code&gt;requestVideoFrameCallback()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EDL export&lt;/strong&gt;: Queue scenes and export CMX 3600 for NLE roundtrip&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Postgres + pgvector setup turned out cleaner than expected—vector similarity combined with metadata filtering in a single query just works.&lt;/p&gt;
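
&lt;p&gt;To make "semantic" concrete, here's roughly what the dialog matching looks like with sentence-transformers. The checkpoint name is an assumption; the repo may use a different model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer, util

# Checkpoint is illustrative; swap in whatever the ingest worker uses.
model = SentenceTransformer("all-MiniLM-L6-v2")
query = model.encode("the Americans had launched their missiles")
line = model.encode("What it was telling him was that the US had launched their ICBMs")
print(util.cos_sim(query, line))  # high score despite different wording
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
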

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Demo&lt;/strong&gt;: &lt;a href="https://fennec.jasongpeterson.com" rel="noopener noreferrer"&gt;fennec.jasongpeterson.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt;: &lt;a href="https://github.com/JasonMakes801/fennec-search" rel="noopener noreferrer"&gt;github.com/JasonMakes801/fennec-search&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Demo footage from &lt;a href="https://en.wikipedia.org/wiki/Pioneer_One" rel="noopener noreferrer"&gt;Pioneer One&lt;/a&gt;, a Creative Commons-licensed Canadian drama. Built with significant help from &lt;a href="https://claude.com/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>python</category>
      <category>docker</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Stacked 3 Small ML Models and Got Video Search That Feels Like Magic</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Thu, 18 Dec 2025 03:13:56 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/i-stacked-3-small-ml-models-and-got-video-search-that-feels-like-magic-2i63</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/i-stacked-3-small-ml-models-and-got-video-search-that-feels-like-magic-2i63</guid>
      <description>&lt;p&gt;&lt;a href="https://youtu.be/aD2gBAPxjak" rel="noopener noreferrer"&gt;Video Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I spent a day building a video search prototype and came away genuinely surprised. Not by any single model — they're all "pretty good" on their own — but by what happens when you stack them together.&lt;/p&gt;

&lt;p&gt;The constraints compound. A so-so visual match plus a so-so transcript hit often surfaces &lt;em&gt;exactly&lt;/em&gt; the right shot. It's unreasonably effective.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I wanted to see how far open-source models could take intelligent video search. The kind of thing where you type "outdoor scene with two people talking about robots" and get useful results.&lt;/p&gt;

&lt;p&gt;Test footage: "Tears of Steel" — a 12-minute CC-BY short film from Blender Foundation. VFX, dialog, multiple characters. Good variety.&lt;/p&gt;

&lt;p&gt;The goal: stack filters in real-time. Visual content → face → dialog → timecode. See how precisely you can drill down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shot Segmentation
&lt;/h2&gt;

&lt;p&gt;First step: break the video into shots. We're not embedding every frame — that would be wasteful and slow. Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Detect shot boundaries&lt;/strong&gt; using PySceneDetect's ContentDetector (analyzes frame-to-frame differences to find cuts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract a representative thumbnail&lt;/strong&gt; for each shot — just the center frame in this prototype&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run models on that single thumbnail&lt;/strong&gt; per shot&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For "Tears of Steel," this gave ~120 shots from 12 minutes. Each shot gets one CLIP embedding, one face detection pass, and the transcript segments that overlap its timecode range.&lt;/p&gt;

&lt;p&gt;This keeps compute reasonable and mirrors how editors actually think — in shots, not frames.&lt;/p&gt;
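
&lt;p&gt;A minimal sketch of that segmentation step with PySceneDetect (filename is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from scenedetect import detect, ContentDetector

# Detect cuts, then take the center frame of each shot as its thumbnail.
scenes = detect("tears_of_steel.mp4", ContentDetector())
for start, end in scenes:
    mid_frame = (start.get_frames() + end.get_frames()) // 2
    print(start.get_timecode(), end.get_timecode(), "thumbnail:", mid_frame)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
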

&lt;h2&gt;
  
  
  The Three Models
&lt;/h2&gt;

&lt;p&gt;All open-source, all running locally on a MacBook Air (M3). No cloud inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLIP (ViT-B-32)&lt;/strong&gt; — The Swiss Army knife. Embed images and text into the same vector space, then compare. "Street scene" finds street scenes. "Green tones" finds green-graded shots. "Credits" finds credits. One model, endless queries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enrichment: ~30ms per shot (single thumbnail)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Whisper (base)&lt;/strong&gt; — Speech to timestamped transcript. Runs on the full audio track, then segments are linked to shots by timestamp overlap.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enrichment: ~45s for the full 12-minute video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ArcFace (buffalo_l via InsightFace)&lt;/strong&gt; — Face detection and embedding on the representative thumbnail. Click a face, find all other shots with that person. No identification needed, just clustering by visual similarity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enrichment: ~100ms per shot (single thumbnail)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. Three models. The magic is in how they combine.&lt;/p&gt;
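
&lt;p&gt;To give a flavor of the CLIP half, here's a minimal sketch of scoring one thumbnail against a few text queries with the openai/CLIP package (path and queries are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
image = preprocess(Image.open("shot_042.png")).unsqueeze(0)
text = clip.tokenize(["street scene", "green tones", "credits"])

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    print(img_emb @ txt_emb.T)  # cosine similarity per query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
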

&lt;h2&gt;
  
  
  Why Stacking Works So Well
&lt;/h2&gt;

&lt;p&gt;Each filter alone returns "roughly right" results. But stack two or three and the precision jumps dramatically.&lt;/p&gt;

&lt;p&gt;Example workflow from the demo:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search "street scene, couple" → ~15 shots&lt;/li&gt;
&lt;li&gt;Click "Match" on an interesting frame → visually similar shots&lt;/li&gt;
&lt;li&gt;Click a face → only shots with that character&lt;/li&gt;
&lt;li&gt;Add "sorry" to dialog search → 2 results, both exactly right&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step cuts the noise. By the end, you're looking at exactly what you wanted.&lt;/p&gt;

&lt;p&gt;The same principle works for color. CLIP understands "green tones" or "warm sunset" without any separate color extraction. Add a face filter on top and you get "shots of this character in warm lighting." No custom code for that combination — it just falls out of the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Deliberately simple. ~500 lines of Python.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmvws35iwlguahybzry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmvws35iwlguahybzry.png" alt="Enrichment and Search Diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;PostgreSQL + pgvector stores everything in one place.&lt;/strong&gt; Embeddings, transcripts, face clusters, timestamps — all in the same database.&lt;/p&gt;

&lt;p&gt;This means a single SQL query can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter on metadata (timecode range, face cluster)&lt;/li&gt;
&lt;li&gt;Rank by vector similarity (CLIP embedding distance)&lt;/li&gt;
&lt;li&gt;Full-text search on transcripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No need to query multiple systems and merge results. One query, one round trip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;scene_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clip_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;scenes&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;face_cluster_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;clip_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's filtering by face, dialog, and timecode while ranking by visual similarity — in one query.&lt;/p&gt;
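
&lt;p&gt;For completeness, a hypothetical Python caller for that query, assuming asyncpg with the pgvector adapter registered (connection string and names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncpg
from pgvector.asyncpg import register_vector

async def search(sql, query_emb, face_id=None, dialog=None, t0=0.0, t1=720.0):
    # sql is the parameterized query above; $1..$5 map to these arguments.
    conn = await asyncpg.connect("postgresql://localhost/scenes")
    await register_vector(conn)  # send numpy arrays as pgvector values
    try:
        return await conn.fetch(sql, query_emb, face_id, dialog, t0, t1)
    finally:
        await conn.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
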

&lt;h2&gt;
  
  
  The Claude Code Part
&lt;/h2&gt;

&lt;p&gt;I'll be honest: this prototype exists because of Claude Code with Opus 4.5.&lt;/p&gt;

&lt;p&gt;It wasn't a "write me a video search app" one-shot. It was a day-long collaboration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I described the architecture and models I wanted&lt;/li&gt;
&lt;li&gt;Claude scaffolded the project structure&lt;/li&gt;
&lt;li&gt;I'd test, hit issues, describe what was wrong&lt;/li&gt;
&lt;li&gt;Claude would debug, refactor, improve&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The iteration speed is what made this feasible. In one day I went from "I wonder if this would work" to a working demo I'm proud to show people. That used to be a week of wrestling with documentation and Stack Overflow.&lt;/p&gt;

&lt;p&gt;The code isn't perfect. There are rough edges. But the core insight — stacking small models — is validated. That's what a prototype is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This was 12 minutes of footage. Not a scale test. But the results are promising enough that I want to push further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More models&lt;/strong&gt;: Object detection (YOLO/SAM), OCR for on-screen text, audio classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longer content&lt;/strong&gt;: Feature-length films, dailies from real productions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLP layer&lt;/strong&gt;: Parse "outdoor shots with the main character talking about technology" into structured filters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each additional model adds another dimension to filter on. The architecture supports it — just add another column and another filter clause.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're building search over any media type, don't sleep on model stacking. A 512-dim CLIP embedding, a transcript, and a face cluster ID — three simple signals — combine into search that feels intelligent.&lt;/p&gt;

&lt;p&gt;The models are all open-source. The infrastructure is Postgres with an extension. The frontend is vanilla HTML/JS. None of this is exotic.&lt;/p&gt;

&lt;p&gt;The magic is in the combination.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>postgres</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Pac-Man, Shakey the Robot, and Von Neumann Walk Into a Maze</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Sun, 30 Nov 2025 14:10:20 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/pac-man-shakey-the-robot-and-von-neumann-walk-into-a-maze-26f8</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/pac-man-shakey-the-robot-and-von-neumann-walk-into-a-maze-26f8</guid>
      <description>&lt;p&gt;Seeing Google's Doodle tribute to Pac-Man recently sent me spiraling down a nostalgia rabbit hole. It reminded me of the 80s and scrounging for quarters and the disappointment over how long I could make each last—and of a MOOC I took in what now feels like ancient history in AI terms. A Gemini search of my old Gmail archives confirms it was &lt;strong&gt;UC Berkeley CS 188: Introduction to Artificial Intelligence&lt;/strong&gt; via EdX, probably around 2012.&lt;/p&gt;

&lt;p&gt;The year 2012. The year Obama won re-election, the Mayan calendar "ended," and Honey Boo Boo was somehow a cultural phenomenon. Siri was barely a year old—a killer feature for iPhone sales despite rarely understanding what you asked. "Deep learning" was a phrase you'd find only in academic papers, not headlines. A different era entirely.&lt;/p&gt;

&lt;p&gt;That course required implementing classic algorithms like Greedy search and Minimax in a Pac-Man environment. It was, honestly, a bit over my head at the time. But I slogged through it, and somewhere between debugging Python at 2 AM and watching my AI-controlled Pac-Man successfully evade ghosts, my mind was &lt;em&gt;blown&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;These "simple" algorithms—Minimax especially—produced results that felt... &lt;em&gt;intelligent&lt;/em&gt;. Surprising. Alive. Before that course, I'd thought of code as scripts: wholly deterministic sequences that did exactly what you told them. But watching Minimax navigate a maze, weighing possibilities, anticipating ghost movements—that was something else entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Revisit This Now?
&lt;/h2&gt;

&lt;p&gt;We're deep into the post-deep-learning, post-generative AI era. Claude Code, Antigravity, and OpenAI Codex write and debug code autonomously. Suno-generated songs are topping Billboard's country charts. Google's Genie 2 conjures playable 3D worlds from a single image. So why bother with these dusty old algorithms from the 1940s and 1980s?&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;they're not obsolete&lt;/strong&gt;. Far from it.&lt;/p&gt;

&lt;p&gt;These algorithms are embedded everywhere: in your GPS routing, your game opponents, your thermostat, the trading bots on Wall Street. They run in parallel with the more advanced forms of AI now upon us—faster, smaller, and often more appropriate for the task at hand.&lt;/p&gt;

&lt;p&gt;And honestly? In Pac-Man—a game of &lt;em&gt;perfect information&lt;/em&gt;—we don't &lt;em&gt;need&lt;/em&gt; generative AI to move our yellow hero. An untuned LLM could probably play Pac-Man, sure, but with noticeable latency and probably not with better performance than good old Minimax.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Different Now
&lt;/h2&gt;

&lt;p&gt;Here's what &lt;em&gt;has&lt;/em&gt; changed: the barrier to experimentation has collapsed completely.&lt;/p&gt;

&lt;p&gt;Claude Opus 4.5 basically &lt;strong&gt;one-shotted&lt;/strong&gt; a modified Pac-Man clone that lets users experiment with different AI algorithms—from the hilariously dumb (Random, Greedy) to the surprisingly sophisticated (A*, Minimax). &lt;/p&gt;

&lt;p&gt;Seriously, we live in amazing times.&lt;/p&gt;

&lt;p&gt;Afterwards, I modified it using GitHub Copilot with more Opus 4.5 help—adding a benchmarking mode and various quality-of-life improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwxwdv6aptfgfd67y09x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwxwdv6aptfgfd67y09x.png" alt="Screenshot of the Pac-Man demo" width="800" height="727"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Pac-Man with various brains&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://jasonmakes801.github.io/PacManAI/" rel="noopener noreferrer"&gt;Try the demo here →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll explore each algorithm with animated GIFs as we go.&lt;/p&gt;


&lt;h2&gt;
  
  
  First, Let's Talk About the Ghosts
&lt;/h2&gt;

&lt;p&gt;Before we dive into Pac-Man's AI options, we need to understand his adversaries. Because here's the thing: &lt;strong&gt;the ghosts are living in the 1980s&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Blinky, Pinky, Inky, and Clyde don't use fancy pathfinding or machine learning. They use the exact same logic Toru Iwatani and Shigeo Funaki coded in 1980—simple targeting rules that, combined, create the illusion of intelligent pursuit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpkk9tr1kx19cs1o2k7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpkk9tr1kx19cs1o2k7l.png" alt="Pac-Man arcade cabinet at Hi-Score Gaming museum" width="800" height="1037"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo: &lt;a href="https://commons.wikimedia.org/wiki/File:Arcade-Automaten_im_Erlebnismuseum_Hi-Score_2023.jpg" rel="noopener noreferrer"&gt;Torben Friedrich&lt;/a&gt;, CC BY-SA 4.0&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Ghost Ensemble
&lt;/h3&gt;

&lt;p&gt;Each ghost has a distinct personality encoded in just a few lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Blinky (Red) - Direct chase&lt;/span&gt;
&lt;span class="nx"&gt;blinky&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Blinky (Red)&lt;/strong&gt; is a pure Greedy algorithm—he targets Pac-Man's current position. Simple, relentless, predictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pinky (Pink)&lt;/strong&gt; is the ambusher—she targets 4 tiles &lt;em&gt;ahead&lt;/em&gt; of Pac-Man, trying to cut him off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pinky targets 4 tiles ahead of Pac-Man's direction&lt;/span&gt;
&lt;span class="nx"&gt;pinky&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanDir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;deltas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;pacmanDir&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Inky (Blue)&lt;/strong&gt; uses vector math—he calculates his target as a reflection of Blinky across Pac-Man's position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inky - Target is reflection of Blinky across Pac-Man&lt;/span&gt;
&lt;span class="nx"&gt;inky&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;blinkyPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;blinkyPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;blinkyPos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Clyde (Orange)&lt;/strong&gt; is the coward—he chases when far away but retreats to his corner when he gets within 8 tiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Clyde - Chase if far, scatter if close&lt;/span&gt;
&lt;span class="nx"&gt;clyde&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;manhattanDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;toGrid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;toGrid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Scatter to corner&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;moveToward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ghostPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;210&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Math Glitch That Became Canon
&lt;/h3&gt;

&lt;p&gt;Fun fact: there's a bug in Pinky's original code. When Pac-Man faces UP, an overflow error adds an extra offset to the LEFT, making her targeting slightly... &lt;em&gt;twitchy&lt;/em&gt;. Namco discovered this but kept it—it made Pinky feel more unpredictable, more &lt;em&gt;alive&lt;/em&gt;. Sometimes bugs are features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters for Our AI
&lt;/h3&gt;

&lt;p&gt;The ensemble of ghost behaviors is better than any individual ghost. They flank, they scatter, they reconverge. But they're also &lt;strong&gt;predictable in their unpredictability&lt;/strong&gt;—which means our Minimax algorithm, which assumes &lt;em&gt;optimal&lt;/em&gt; adversarial play, is actually playing against a weaker opponent than it thinks. More on that irony later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarking the Algorithms
&lt;/h2&gt;

&lt;p&gt;Let's see how our AI options actually perform. Run the benchmark mode yourself—cuz it's fun to watch 400 game starts play out in a minute or so! But here's what my results looked like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Survival&lt;/th&gt;
&lt;th&gt;Avg Score&lt;/th&gt;
&lt;th&gt;Ghosts Eaten&lt;/th&gt;
&lt;th&gt;Decision Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🥇&lt;/td&gt;
&lt;td&gt;Minimax&lt;/td&gt;
&lt;td&gt;31.9s&lt;/td&gt;
&lt;td&gt;1687&lt;/td&gt;
&lt;td&gt;5.5&lt;/td&gt;
&lt;td&gt;0.04ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;A* Pathfinding&lt;/td&gt;
&lt;td&gt;17.5s&lt;/td&gt;
&lt;td&gt;1114&lt;/td&gt;
&lt;td&gt;3.4&lt;/td&gt;
&lt;td&gt;0.01ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;Random&lt;/td&gt;
&lt;td&gt;7.1s&lt;/td&gt;
&lt;td&gt;368&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.00ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Greedy&lt;/td&gt;
&lt;td&gt;1.0s&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.01ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Wait—&lt;em&gt;Random beats Greedy&lt;/em&gt;? And by a lot?&lt;/p&gt;

&lt;p&gt;Let's dig into each algorithm, from worst to best. Fair warning: John von Neumann is going to pop up a lot. So is the Cold War.&lt;/p&gt;




&lt;h2&gt;
  
  
  4th Place: Greedy — The Algorithm That Should Know Better
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8i82sl0sp7c9wgwm76vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8i82sl0sp7c9wgwm76vm.png" alt="Diagram showing Greedy algorithm targeting nearest pellet while ignoring nearby ghost" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It's no good being greedy!&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The History
&lt;/h3&gt;

&lt;p&gt;The term "Greedy algorithm" emerged from the optimization boom of the 1950s and 60s, solidified by work on Matroids by Jack Edmonds in the 1970s. The concept is beautifully simple: &lt;strong&gt;always take the locally optimal choice and hope it leads to a globally optimal solution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For some problems (like Dijkstra's shortest path), greedy works perfectly. For others (like Pac-Man survival), it's a disaster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Fails Spectacularly
&lt;/h3&gt;

&lt;p&gt;Our Greedy implementation does exactly one thing: find the nearest pellet and move toward it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;greedy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastMove&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findNearestPellet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanGrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// ...find the move that gets us closest to that pellet&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;manhattanDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's missing? &lt;strong&gt;Any awareness of ghosts whatsoever&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Greedy Pac-Man will cheerfully walk straight into Blinky's open maw if there's a pellet on the other side. It's the algorithmic equivalent of texting while crossing the street.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futfonrggbpdl2it2nexg.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futfonrggbpdl2it2nexg.gif" alt="Greedy algorithm walking straight into a ghost" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Greedy's fatal flaw: pellet tunnel vision&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~1 second.&lt;/strong&gt; Ouch.&lt;/p&gt;


&lt;h2&gt;
  
  
  3rd Place: Random — The Drunken Master
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11rchgm0irg0s28jv6yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11rchgm0irg0s28jv6yo.png" alt="Diagram showing how random algo ignores rewards and threats" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Every direction is equally valid. That's the whole strategy.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Ancient History
&lt;/h3&gt;

&lt;p&gt;Random number generation is older than computing itself. Dice from 2400 BC. I Ching stalks from 1100 BC China. Heated tortoise shells whose cracks were interpreted as divine messages.&lt;/p&gt;

&lt;p&gt;For millennia, generating randomness required physical hardware: dice, coins, shuffled cards. But computers are deterministic machines—how do you generate chaos from clockwork?&lt;/p&gt;
&lt;h3&gt;
  
  
  Enter Von Neumann (First Appearance!)
&lt;/h3&gt;

&lt;p&gt;During the Manhattan Project, Stanislaw Ulam invented the Monte Carlo method while playing solitaire during recovery from brain surgery. (The best algorithms are born from boredom.) He and von Neumann needed random numbers—lots of them—to simulate neutron diffusion in fissile material.&lt;/p&gt;

&lt;p&gt;Von Neumann's solution was the &lt;strong&gt;Middle-Square Method&lt;/strong&gt; (c. 1946):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a seed number&lt;/li&gt;
&lt;li&gt;Square it&lt;/li&gt;
&lt;li&gt;Extract the middle digits&lt;/li&gt;
&lt;li&gt;That's your random number (and your new seed)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Von Neumann famously acknowledged the philosophical absurdity: &lt;em&gt;"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But sinful or not, it worked well enough to help design the hydrogen bomb.&lt;/p&gt;
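
&lt;p&gt;The whole method fits in a few lines. Here's a toy four-digit version in Python (the demo itself is JavaScript, but the idea is language-agnostic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def middle_square(seed, n):
    """Von Neumann's middle-square PRNG, toy 4-digit version."""
    out = []
    for _ in range(n):
        squared = seed * seed            # up to 8 digits
        seed = (squared // 100) % 10000  # keep the middle 4 digits
        out.append(seed)
    return out

print(middle_square(5731, 5))  # same seed, same "random" sequence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
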
&lt;h3&gt;
  
  
  Why Random Beats Greedy
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;random&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastMove&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getValidMoves&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;bestMove&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;bestScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;move&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;moves&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;getOscillationPenalty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pacmanPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;move&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastMove&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;bestScore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;bestScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nx"&gt;bestMove&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;move&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;bestMove&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Random has one crucial advantage over Greedy: &lt;strong&gt;it doesn't walk into the same trap twice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By moving unpredictably, Random occasionally stumbles away from danger. It picks up pellets by accident. It sometimes, by pure chance, threads through a gap between ghosts that Greedy would have sprinted directly into.&lt;/p&gt;

&lt;p&gt;It's the Drunken Master of algorithms—surviving not through skill but through the sheer improbability of its movements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtcxne1la9jywjthi5pu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtcxne1la9jywjthi5pu.gif" alt="Random algorithm stumbling through the maze" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Drunken Master in action&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~7 seconds.&lt;/strong&gt; Seven times better than Greedy!&lt;/p&gt;


&lt;h2&gt;
  
  
  2nd Place: A* — Shakey's Gift to Gaming
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v2y6bjhwjxrkkaz1z1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v2y6bjhwjxrkkaz1z1j.png" alt="A diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Robot That Started It All
&lt;/h3&gt;

&lt;p&gt;In the late 1960s, at Stanford Research Institute, a robot named &lt;strong&gt;Shakey&lt;/strong&gt; wobbled through rooms full of blocks and ramps—the first mobile robot capable of reasoning about its actions. It needed to navigate without crashing, but Dijkstra's algorithm (1956) explored in all directions like spilling water. Shakey needed to head &lt;em&gt;toward&lt;/em&gt; its goal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kj3q0js6zhcb940wf7q.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kj3q0js6zhcb940wf7q.jpg" alt="Shakey the robot at SRI" width="600" height="954"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo: &lt;a href="https://commons.wikimedia.org/wiki/File:SRI_Shakey_with_callouts.jpg" rel="noopener noreferrer"&gt;SRI International&lt;/a&gt;, CC BY-SA 3.0&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Peter Hart, Nils Nilsson, and Bertram Raphael solved this by adding a &lt;strong&gt;heuristic&lt;/strong&gt;—an estimate of remaining distance: &lt;em&gt;f(n) = g(n) + h(n)&lt;/em&gt;, balancing known cost with estimated future cost.&lt;/p&gt;
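
&lt;p&gt;In code, the whole idea is compact. A minimal grid A* in Python (the demo is JavaScript; this is just the textbook algorithm):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(grid, start, goal):
    """Grid cells: 0 = open, 1 = wall. Returns a list of (x, y) steps."""
    g_cost = {start: 0}
    parent = {start: None}
    frontier = [(manhattan(start, goal), start)]  # (f = g + h, node)
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:  # reconstruct the path by walking parents back
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 &amp;lt;= ny &amp;lt; len(grid) and 0 &amp;lt;= nx &amp;lt; len(grid[0]) and grid[ny][nx] == 0:
                ng = g_cost[node] + 1
                if ng &amp;lt; g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    parent[nxt] = node
                    heapq.heappush(frontier, (ng + manhattan(nxt, goal), nxt))
    return None  # no path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
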
&lt;h3&gt;
  
  
  Smart A* in Pac-Man
&lt;/h3&gt;

&lt;p&gt;Our A* implementation isn't just vanilla pathfinding—it has strategic goal selection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Determine goal priority:&lt;/span&gt;
&lt;span class="c1"&gt;// 1. If ghosts are edible -&amp;gt; hunt nearest ghost&lt;/span&gt;
&lt;span class="c1"&gt;// 2. If dangerous ghost is close -&amp;gt; go for power pellet&lt;/span&gt;
&lt;span class="c1"&gt;// 3. Otherwise -&amp;gt; collect regular pellets&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;edibleGhosts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Hunt nearest edible ghost&lt;/span&gt;
    &lt;span class="nx"&gt;goalGrid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;nearestGhost&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;closestDangerDist&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Ghost is close! Get a power pellet&lt;/span&gt;
    &lt;span class="nx"&gt;goalGrid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findNearestPowerPellet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanGrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Safe - collect regular pellets&lt;/span&gt;
    &lt;span class="nx"&gt;goalGrid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findNearestRegularPellet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pacmanGrid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also incorporates danger costs—paths near ghosts have higher penalties. A* doesn't just find the shortest path—it finds the &lt;em&gt;safest&lt;/em&gt; short path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc1kjt7w6nujbq8wxrxi.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc1kjt7w6nujbq8wxrxi.gif" alt="A* algorithm efficiently collecting pellets while avoiding ghosts" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Methodical, efficient, but not paranoid enough&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~17.5 seconds.&lt;/strong&gt; Solid, reliable performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  1st Place: Minimax — The Cold War Logic of a Paranoid Yellow Circle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanloj7e6w2qy2fn6a1t7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanloj7e6w2qy2fn6a1t7.png" alt="Minimax diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Poker, Paranoia, and the Bomb
&lt;/h3&gt;

&lt;p&gt;To understand Minimax, you need to understand John von Neumann.&lt;/p&gt;

&lt;p&gt;Born in Budapest in 1903, von Neumann was a prodigy among prodigies—part of the legendary "Hungarian Martians" who reshaped American science. At Princeton's Institute for Advanced Study, he became fascinated with a question that standard probability couldn't answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you optimize against an opponent who's actively trying to beat you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Roulette wheels don't care if you lose. But poker players &lt;em&gt;bluff&lt;/em&gt;. They hide information. They maximize their gain at your expense.&lt;/p&gt;

&lt;p&gt;In 1928, von Neumann published "Theory of Parlor Games," proving the &lt;strong&gt;Minimax theorem&lt;/strong&gt;: in a zero-sum game with perfect information, there exists a strategy that minimizes your maximum possible loss.&lt;/p&gt;

&lt;p&gt;In other words: &lt;strong&gt;assume your opponent is a genius. Assume they'll always make the move that hurts you most. Then pick the move that leaves you least hurt in that worst case.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From Poker to Armageddon
&lt;/h3&gt;

&lt;p&gt;Von Neumann wasn't just an academic—he worked on the Manhattan Project and later served on the Atomic Energy Commission. His Minimax philosophy permeated Cold War strategy, underpinning &lt;strong&gt;Mutually Assured Destruction (MAD)&lt;/strong&gt;: create a situation where the opponent's "maximum loss" is so unacceptable they're forced into strategies that minimize it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimax in Pac-Man
&lt;/h3&gt;

&lt;p&gt;Our implementation builds a game tree to a depth of 6, alternating between MAX layers (Pac-Man maximizes) and MIN layers (ghosts minimize). The evaluation function weighs ghost proximity (heavily penalized), edible ghost hunting (heavily rewarded), and power pellet value (boosted when threatened). Alpha-beta pruning keeps it fast enough for real-time play.&lt;/p&gt;
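
&lt;p&gt;Stripped of the game specifics, the search looks like this: depth-limited minimax with alpha-beta pruning, sketched in Python (the demo is JavaScript, and the state interface here is assumed, not the demo's actual API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def minimax(state, depth, alpha, beta, maximizing):
    # is_terminal(), legal_moves(), apply(), and evaluate() are assumed
    # interfaces standing in for the real game state.
    if depth == 0 or state.is_terminal():
        return evaluate(state)
    if maximizing:  # MAX layer: Pac-Man picks the best worst case
        best = float("-inf")
        for move in state.legal_moves():
            best = max(best, minimax(state.apply(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta &amp;lt;= alpha:
                break  # prune: MIN would never allow this branch
        return best
    worst = float("inf")  # MIN layer: ghosts assumed to play optimally
    for move in state.legal_moves():
        worst = min(worst, minimax(state.apply(move), depth - 1, alpha, beta, True))
        beta = min(beta, worst)
        if beta &amp;lt;= alpha:
            break
    return worst
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
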

&lt;h3&gt;
  
  
  The Irony of Perfection
&lt;/h3&gt;

&lt;p&gt;Here's the beautiful irony: Minimax assumes the ghosts are &lt;em&gt;perfect adversaries&lt;/em&gt;. It prepares for opponents who will always make the optimal move against Pac-Man.&lt;/p&gt;

&lt;p&gt;But the ghosts aren't perfect. They're not even good. They're hard-wired, animatronic patterns from 1980—Blinky's relentless chase, Pinky's buggy ambush, Inky's confusing flanks, Clyde's cowardly retreats.&lt;/p&gt;

&lt;p&gt;Minimax is playing 4D chess against opponents who are playing checkers.&lt;/p&gt;

&lt;p&gt;The result? Minimax often seems &lt;em&gt;overly cautious&lt;/em&gt;—dodging threats that aren't really threats, optimizing against a genius adversary that doesn't exist. It's the Cold War logic of the bomb applied to a yellow circle eating dots: paranoid, calculating, and ultimately... over-prepared for an apocalypse that was never coming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydqaj490wgn3hp3f04a.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydqaj490wgn3hp3f04a.gif" alt="Minimax algorithm playing with Cold War paranoia" width="480" height="428"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Paranoid, calculating, victorious—assuming an optimal opponent!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Average survival: ~32 seconds.&lt;/strong&gt; Nearly double A*'s performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Genealogy of Code
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Origin Story&lt;/th&gt;
&lt;th&gt;Core Philosophy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Greedy&lt;/td&gt;
&lt;td&gt;1950s-60s&lt;/td&gt;
&lt;td&gt;Optimization research, Dijkstra, Kruskal, Prim&lt;/td&gt;
&lt;td&gt;"Take the best immediate option"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random&lt;/td&gt;
&lt;td&gt;Ancient / 1946&lt;/td&gt;
&lt;td&gt;Dice, I Ching → Monte Carlo, von Neumann's Middle-Square&lt;/td&gt;
&lt;td&gt;"When in doubt, roll the dice"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A*&lt;/td&gt;
&lt;td&gt;1968&lt;/td&gt;
&lt;td&gt;Shakey the Robot at SRI&lt;/td&gt;
&lt;td&gt;"Balance known cost with estimated future"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimax&lt;/td&gt;
&lt;td&gt;1928 / Cold War&lt;/td&gt;
&lt;td&gt;Von Neumann's poker games → nuclear strategy&lt;/td&gt;
&lt;td&gt;"Assume the worst, prepare accordingly"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These algorithms aren't just code—they're artifacts of human history. Von Neumann's paranoia about Soviet intentions lives on in every game tree search. Ulam's boredom during convalescence echoes in every random number generator. Shakey's wobbling navigation through a room of blocks enabled every pathfinding algorithm in every video game since.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://jasonmakes801.github.io/PacManAI/" rel="noopener noreferrer"&gt;Play the demo →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the benchmark. Watch Random stumble to third place. Watch Greedy die immediately. Watch A* navigate with mechanical efficiency. Watch Minimax dominate with Cold War paranoia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to go further?&lt;/strong&gt; The code is begging for improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepen the Minimax search beyond 6 plies—how much better can it get?&lt;/li&gt;
&lt;li&gt;Implement Expectimax, which models the ghosts as probabilistic rather than optimal adversaries (a minimal sketch follows this list)&lt;/li&gt;
&lt;li&gt;Encode &lt;em&gt;actual&lt;/em&gt; ghost behaviors into the evaluation function—if you know Clyde retreats at 8 tiles, exploit it&lt;/li&gt;
&lt;li&gt;A hundred lines of well-tuned algorithm could absolutely wipe the floor with these 1980s ghosts&lt;/li&gt;
&lt;/ul&gt;
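
&lt;p&gt;For the Expectimax idea, the change is small: replace the MIN layer with a chance node that averages over the ghosts' moves. A minimal Python sketch, reusing the placeholder hooks from the Minimax sketch above and crudely assuming ghosts move uniformly at random:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Expectimax: ghosts are modeled as random rather than adversarial.
# legal_moves, apply_move and evaluate are placeholders as before.
def expectimax(state, depth, maximizing):
    if depth == 0 or state.game_over:
        return evaluate(state)
    if maximizing:
        return max(expectimax(apply_move(state, m), depth - 1, False)
                   for m in legal_moves(state, "pacman"))
    moves = legal_moves(state, "ghosts")
    # Chance node: expected value under a uniform-random ghost policy.
    return sum(expectimax(apply_move(state, m), depth - 1, True)
               for m in moves) / len(moves)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
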

&lt;p&gt;Then think about this: I built this entire demo in an afternoon, mostly through conversation with an LLM. The barrier to experimenting with classic computer science has never been lower.&lt;/p&gt;

&lt;p&gt;We stand on the shoulders of giants—von Neumann, Ulam, Dijkstra, Hart, Nilsson, Raphael, Iwatani. Their algorithms, forged in the crucible of war and the whimsy of arcade entertainment, are now accessible to anyone with a browser.&lt;/p&gt;

&lt;p&gt;Use them. Learn from them. Build on them.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Final Note on Quarters
&lt;/h2&gt;

&lt;p&gt;Remember those quarters I mentioned scrounging for in the 80s? Here's where they went.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji50rpn1d4prft1ta41q.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji50rpn1d4prft1ta41q.jpg" alt="Toru Iwatani at GDC 2011" width="800" height="1241"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo: &lt;a href="https://www.flickr.com/photos/officialgdc/5493309930" rel="noopener noreferrer"&gt;Official GDC&lt;/a&gt;, CC BY 2.0&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Toru Iwatani, the designer who gave us Pac-Man, never received more than his regular Namco salary for creating one of the most successful games in history. No royalties, no bonus, no equity stake. Just a paycheck.&lt;/p&gt;

&lt;p&gt;And Namco's revenue model was gloriously simple: they sold arcade cabinets. Whole machines, turnkey systems. A Pac-Man cabinet cost around $2,500 in 1980 (about $9,500 today)—at a busy arcade, it could pay for itself in a month. The buyer paid Namco upfront, plugged the cabinet in, and kept every quarter that dropped. No subscriptions, no microtransactions, no licensing deals (those came later). Just hardware for cash.&lt;/p&gt;

&lt;p&gt;A simpler era—in algorithms &lt;em&gt;and&lt;/em&gt; in business.&lt;/p&gt;




&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Original Pac-Man game design:&lt;/strong&gt; Toru Iwatani, Namco (1980)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ghost AI logic:&lt;/strong&gt; Shigeo Funaki, based on Iwatani's personality concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base JavaScript Pac-Man engine:&lt;/strong&gt; Adapted from &lt;a href="https://github.com/daleharvey/pacman" rel="noopener noreferrer"&gt;daleharvey/pacman&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI implementations and modifications:&lt;/strong&gt; Built with Claude Opus 4.5 and GitHub Copilot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical research:&lt;/strong&gt; Google Deep Research, compiled from various sources on von Neumann, the Manhattan Project, SRI's Shakey project, and Namco's development history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagrams:&lt;/strong&gt; Generated with Gemini Nano Banana Pro&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>javascript</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>Building a Rubik's Cube Solver: A Primer on Claude Skills</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Fri, 14 Nov 2025 15:19:20 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/building-a-rubiks-cube-solver-a-primer-on-claude-skills-4m7k</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/building-a-rubiks-cube-solver-a-primer-on-claude-skills-4m7k</guid>
      <description>&lt;p&gt;Got a scrambled Rubik's Cube gathering dust on your desk? I did too, which prompted me to build a Claude Skill that solves it from six photos. Take pictures of each face, and Claude analyzes them, validates the cube state, and returns step-by-step solving instructions.&lt;/p&gt;

&lt;p&gt;This project became a perfect way to understand how Claude Skills work—and discover some surprising aspects of building AI-powered workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Claude Skills?
&lt;/h2&gt;

&lt;p&gt;Claude Skills are procedural workflows that extend Claude's capabilities by combining its natural strengths (vision, reasoning, conversation) with external tools like Python scripts, APIs, or command-line utilities.&lt;/p&gt;

&lt;p&gt;The heart of a Claude Skill is a &lt;code&gt;SKILL.md&lt;/code&gt; file—a markdown document containing step-by-step instructions that Claude follows. Think of it as a playbook that tells Claude how to orchestrate a complex task from start to finish.&lt;/p&gt;

&lt;p&gt;Why does this matter? Skills are repeatable, shareable, and specialized. You can build a workflow once, package it with supporting scripts, and anyone can use it conversationally through Claude. No API wrangling, no UI to build—just describe the procedure, and Claude handles the orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of a Claude Skill (Using the Rubik's Solver)
&lt;/h2&gt;

&lt;p&gt;Let's walk through how the Rubik's Cube solver works to see the components in action:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The SKILL.md file&lt;/strong&gt; contains procedural instructions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### Step 1: Request Photos&lt;/span&gt;
Request exactly 6 photos from the user, one of each cube face
IN THIS SPECIFIC ORDER with correct orientation.

&lt;span class="gs"&gt;**Photo 1: White face (center) - Hold cube with BLUE on top**&lt;/span&gt;
&lt;span class="gs"&gt;**Photo 2: Orange face (center) - Hold cube with WHITE on top**&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python scripts&lt;/strong&gt; handle the computational heavy lifting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;solve_cube.py&lt;/code&gt; validates color sequences, renders visualizations, and runs the Kociemba solving algorithm&lt;/li&gt;
&lt;li&gt;Each face gets cached after validation&lt;/li&gt;
&lt;li&gt;The solver concatenates all faces and computes the optimal solution&lt;/li&gt;
&lt;/ul&gt;
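
&lt;p&gt;The solving call itself is nearly a one-liner. A minimal sketch of that step (the real &lt;code&gt;solve_cube.py&lt;/code&gt; wraps it with validation and caching, and &lt;code&gt;faces&lt;/code&gt; here is a stand-in for the dict of validated 9-sticker strings built from the photos):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of the solving step. kociemba expects a 54-character
# facelet string: the six faces concatenated in U, R, F, D, L, B order.
import kociemba

# 'faces' is a stand-in: a dict mapping each face letter to the
# 9-sticker string extracted and validated from that face's photo.
cube_string = "".join(faces[f] for f in "URFDLB")
solution = kociemba.solve(cube_string)  # e.g. "R U R' U' F2 ..."
print(solution)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
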

&lt;p&gt;&lt;strong&gt;Claude's orchestration&lt;/strong&gt; ties it all together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes photos using vision capabilities to extract the 9-sticker color sequence for each face&lt;/li&gt;
&lt;li&gt;Calls validation scripts with the detected colors&lt;/li&gt;
&lt;li&gt;Renders an emoji-based cube visualization for user confirmation&lt;/li&gt;
&lt;li&gt;Handles errors, corrections, and edge cases conversationally&lt;/li&gt;
&lt;li&gt;Delivers the final step-by-step solving instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request 6 photos (with specific orientation instructions)&lt;/li&gt;
&lt;li&gt;Claude analyzes each photo → extracts 9-sticker color sequence&lt;/li&gt;
&lt;li&gt;Python validates and caches each face&lt;/li&gt;
&lt;li&gt;Renders emoji visualization for user confirmation&lt;/li&gt;
&lt;li&gt;Concatenates → Solves → Returns instructions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsxius5wqin47v9242e5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsxius5wqin47v9242e5.gif" alt="Claude uses the skill" width="600" height="331"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The skill in action: Claude guides the user through photo capture, validation, and solving&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Things That Might Surprise You as a Developer
&lt;/h2&gt;

&lt;p&gt;Building this skill revealed some interesting characteristics of the Claude Skills environment:&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependencies Are Casual
&lt;/h3&gt;

&lt;p&gt;You just mention &lt;code&gt;pip3 install kociemba&lt;/code&gt; in your SKILL.md file, and Claude installs it—every time you run the skill. There's no Docker image, no virtual environment, no persistent package state to manage. The configuration is completely stateless.&lt;/p&gt;

&lt;p&gt;This feels weird at first. Where's my &lt;code&gt;requirements.txt&lt;/code&gt;? What about version pinning? But it's also liberating: your skill is just instructions plus scripts. Claude handles the environment setup each time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Everything Runs Remotely
&lt;/h3&gt;

&lt;p&gt;Claude executes your scripts on a remote server, not your local machine. Your skill package is just a set of instructions and supporting files that Claude interprets and runs in its environment. You're not SSH-ing into a box or deploying to infrastructure—you're describing a workflow, and Claude makes it happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Iteration Takes Patience
&lt;/h3&gt;

&lt;p&gt;Here's the practical friction: Claude Code can't directly access image files uploaded in chat during skill development. My workaround was zipping the skill contents (SKILL.md, scripts, etc.) and uploading the zip to Claude Desktop for testing.&lt;/p&gt;

&lt;p&gt;Testing also means going through the full photo upload flow each iteration. It's more friction than local development with instant feedback, but manageable once you build a rhythm. Think of it as similar to testing a mobile app—you need to go through the actual user flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vision Challenge
&lt;/h2&gt;

&lt;p&gt;One surprise: converting photos into color strings isn't 100% reliable, even with clear, well-lit shots. Claude sometimes misreads sticker colors. That's why the skill includes a validation step—rendering an emoji visualization for the user to confirm.&lt;/p&gt;

&lt;p&gt;This is still &lt;em&gt;dramatically&lt;/em&gt; easier than the old way: coaxing OpenCV into reliable color detection with threshold tuning, color space conversions, lighting normalization, and endless edge case handling. I'll take "human confirms the visualization" over "debug CV pipeline for three hours" any day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills Are Non-Deterministic
&lt;/h3&gt;

&lt;p&gt;Here's what really impressed me: Skills aren't just rigid instruction-following. Claude doesn't execute your SKILL.md robotically.&lt;/p&gt;

&lt;p&gt;When things go wrong—a misoriented photo, unclear lighting, user uploads faces out of order—Claude adapts. It will valiantly work to reach a valid solver state, handling scenarios I didn't anticipate when designing the workflow. The conversational nature means graceful degradation, not hard failures.&lt;/p&gt;

&lt;p&gt;If a photo doesn't make sense, Claude asks clarifying questions. If the solver fails, it walks the user through corrections. This resilience comes "for free" from Claude's reasoning capabilities layered on top of your procedural instructions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh49i04sv8izipyl44du.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh49i04sv8izipyl44du.gif" alt="Solving the cube using Claude's instructions" width="600" height="338"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Following Claude's step-by-step instructions to solve the cube&lt;/em&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Why Solutions Are Always ~20 Moves or Fewer
&lt;/h3&gt;

&lt;p&gt;The Kociemba algorithm uses a two-phase approach that, in practice, solves any valid cube state in ~20 moves or fewer (often much less). Unlike beginner methods that solve layer-by-layer, or advanced methods like CFOP that optimize for speed, Kociemba finds mathematically near-optimal solutions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1&lt;/strong&gt;: Getting the cube into a specific subset of positions (G1 group)&lt;br&gt;
&lt;strong&gt;Phase 2&lt;/strong&gt;: Solving from that subset to completion&lt;/p&gt;

&lt;p&gt;This approach sacrifices execution speed (the moves aren't optimized for finger tricks) but guarantees remarkably short solutions. A beginner method might take 80-100 moves; CFOP averages 50-60. Kociemba typically delivers solutions in 18-22 moves, making it ideal for casual solving where you want minimal steps, not maximum speed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Try It Yourself &amp;amp; Help Me Improve
&lt;/h2&gt;

&lt;p&gt;The skill is &lt;a href="https://github.com/JasonMakes801/rubiks-cube-solver.git" rel="noopener noreferrer"&gt;open source on GitHub&lt;/a&gt;. It works with any standard 3×3 Rubik's Cube, and the only requirement is the kociemba library (which Claude auto-installs).&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Install:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone the repository:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/JasonMakes801/rubiks-cube-solver.git
   &lt;span class="nb"&gt;cd &lt;/span&gt;rubiks-cube-solver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Create a zip file of the skill:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   zip &lt;span class="nt"&gt;-r&lt;/span&gt; rubiks-cube-solver.zip SKILL.md scripts/ README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install in Claude Desktop:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open Claude Desktop (or Claude Code)&lt;/li&gt;
&lt;li&gt;Upload the &lt;code&gt;rubiks-cube-solver.zip&lt;/code&gt; file to chat&lt;/li&gt;
&lt;li&gt;Ask Claude to "extract this skill and help me solve my Rubik's Cube"&lt;/li&gt;
&lt;li&gt;Claude will unpack the skill and begin the photo-guided workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have your scrambled Rubik's Cube ready&lt;/li&gt;
&lt;li&gt;Prepare to take 6 clear photos (one per face)&lt;/li&gt;
&lt;li&gt;Follow Claude's orientation instructions carefully&lt;/li&gt;
&lt;li&gt;Review the emoji visualization to confirm colors&lt;/li&gt;
&lt;li&gt;Follow the step-by-step solving instructions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  One Idea I'm Considering:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Can we eliminate the human-in-the-loop validation step?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Is there a way to build error correction into the scripts themselves—maybe cross-referencing impossible color combinations, using multiple validation passes, or applying constraint satisfaction logic—so the vision-to-string conversion becomes reliable enough to skip user confirmation?&lt;/p&gt;
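
&lt;p&gt;Some impossible combinations are cheap to rule out before involving the user. A hedged Python sketch of the kind of pre-check I have in mind (&lt;code&gt;plausible_cube&lt;/code&gt; is hypothetical, not part of the current skill): every color must appear exactly nine times, and the six face centers must be distinct.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical pre-checks that could catch vision errors automatically:
# each color appears exactly 9 times, and the 6 face centers are distinct.
from collections import Counter

def plausible_cube(faces):
    """faces: dict mapping a face letter to its 9-sticker color string."""
    counts = Counter("".join(faces.values()))
    if sorted(counts.values()) != [9] * 6:
        return False  # some color was over- or under-detected
    centers = {stickers[4] for stickers in faces.values()}
    return len(centers) == 6  # six distinct center colors

# A solved cube passes the checks.
solved = {f: f * 9 for f in "URFDLB"}
print(plausible_cube(solved))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checks like these are necessary but not sufficient; fully automatic correction would also need permutation and parity constraints.&lt;/p&gt;
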

&lt;p&gt;If you have ideas on this or other improvements, I'd love to hear them. Open an issue or PR, or just share your thoughts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Claude Skills unlock a new pattern: conversational workflows orchestrating specialized tools. This approach works far beyond Rubik's Cubes—data analysis pipelines, image processing workflows, API integrations, code generation tasks.&lt;/p&gt;

&lt;p&gt;The barrier to building custom AI workflows just got a lot lower. You don't need to build UIs, manage API keys, or handle state management. Just describe the procedure in markdown, provide the tools, and Claude handles the orchestration conversationally.&lt;/p&gt;

&lt;p&gt;What will you build?&lt;/p&gt;

</description>
      <category>llm</category>
      <category>tooling</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>Testing AGENTS.md Across Three Agentic Coding Platforms: Universal Context Has Arrived</title>
      <dc:creator>Jason Peterson</dc:creator>
      <pubDate>Tue, 21 Oct 2025 03:38:48 +0000</pubDate>
      <link>https://forem.com/jason_peterson_607e54abf5/testing-agentsmd-across-three-agentic-coding-platforms-universal-context-has-arrived-1lg0</link>
      <guid>https://forem.com/jason_peterson_607e54abf5/testing-agentsmd-across-three-agentic-coding-platforms-universal-context-has-arrived-1lg0</guid>
      <description>&lt;p&gt;Most developers I know are loyal to their agentic coding platform. You're a Copilot person, a Claude Code person, etc. That made sense when each required its own special way of managing context.&lt;/p&gt;

&lt;p&gt;But AGENTS.md is quietly changing that equation. It's a universal context standard that works across GitHub Copilot, Claude Code, Gemini CLI, OpenAI Codex, and others. Write your project spec once, use any platform that fits the moment.&lt;/p&gt;

&lt;p&gt;I tested this by building the same application three different ways. Here's what I learned about the current state of agentic coding and why your workflow might benefit from a multi-platform approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AGENTS.md?
&lt;/h2&gt;

&lt;p&gt;AGENTS.md is a standardized markdown file that provides context to AI coding assistants. Think of it as a project brief that lives in your repository: requirements, technical specifications, coding preferences, architectural decisions, and context an AI needs to work effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it useful:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Universal standard&lt;/strong&gt;: Works across GitHub Copilot, Claude Code, Gemini CLI, OpenAI Codex, and other AI coding tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plain markdown&lt;/strong&gt;: No special syntax required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent context&lt;/strong&gt;: The AI reads it each time, so you're not re-explaining your project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What goes in it:&lt;/strong&gt;&lt;br&gt;
Project overview, technical requirements, file structure, coding standards, dependencies, and setup instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it lives:&lt;/strong&gt;&lt;br&gt;
Place it in your project root directory. Some platforms support AGENTS.md files at multiple levels for more granular context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform-specific notes:&lt;/strong&gt;&lt;br&gt;
GitHub Copilot also supports Instructions.md at various levels, but AGENTS.md works universally. Claude Code and Gemini CLI both use AGENTS.md as their primary context source.&lt;/p&gt;

&lt;p&gt;The key advantage: write your context once, and multiple AI coding tools can use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;For this experiment, I needed a project complex enough to stress-test these tools: Conway's Game of Life with real-time pattern recognition (to make the coding challenge a bit harder) and a retro arcade aesthetic. The AGENTS.md specification was 2,000 words covering the cellular automaton logic, visual effects (CRT scanlines, glow), and automatic detection and color-coding of emergent patterns like gliders, oscillators, and still lifes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdw6ctpvwy9tm8rzbflw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdw6ctpvwy9tm8rzbflw.png" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same spec. Three platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Copilot with GPT-5 (my daily driver, typically with Claude Sonnet 4.5)&lt;/li&gt;
&lt;li&gt;Claude Code (Anthropic's command-line coding agent)&lt;/li&gt;
&lt;li&gt;Gemini CLI (Google's terminal-based coding tool)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran each from a clean slate, pointing them at the same AGENTS.md file. No hand-holding, no iterative fixes, just one shot to see what each would build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Happened
&lt;/h2&gt;

&lt;p&gt;All three tools produced working implementations. But the approaches, results, and developer experience differed in revealing ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code: The Planner
&lt;/h3&gt;

&lt;p&gt;Claude Code paused before writing code. It read the specification, presented a detailed roadmap of what it intended to build (file structure, implementation approach, feature priorities), then required my approval before proceeding.&lt;/p&gt;

&lt;p&gt;This felt collaborative. Less "AI does the thing" and more "AI proposes a plan, human signs off."&lt;/p&gt;

&lt;p&gt;The result? The most polished one-shot implementation. Pattern recognition worked correctly, visual effects were solid, code was well-structured. It felt production-ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini CLI: The Honest Craftsman
&lt;/h3&gt;

&lt;p&gt;Gemini got close. The implementation was visually true to the requested aesthetic. But it was upfront about not being finished: "Next, I will focus on enhancing the pattern detection to recognize more complex patterns like gliders and other oscillators, as specified in the project requirements."&lt;/p&gt;

&lt;p&gt;I appreciated the honesty. It delivered something genuinely good while acknowledging where it fell short of the spec. The transparency felt valuable.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot + GPT-5: The Capable Generalist
&lt;/h3&gt;

&lt;p&gt;Copilot produced a solid implementation quickly. The game worked, the retro aesthetic was there, the code was clean. But pattern recognition (specifically the color-coding of oscillators) didn't quite work as specified.&lt;/p&gt;

&lt;p&gt;Not broken, just incomplete on one of the core features. Still impressive, just not as polished as Claude Code's output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Objective Analysis
&lt;/h2&gt;

&lt;p&gt;I didn't want this to just be my opinion. So I had Grok Code Fast 1 conduct a blind code review of all three implementations.&lt;/p&gt;

&lt;p&gt;I gave Grok the AGENTS.md specification and all three complete codebases. No context about which tool built which. Just: evaluate these against the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code: 9/10&lt;/strong&gt;&lt;br&gt;
• Pattern Recognition: Excellent (gliders in all 4 orientations, multiple still lifes, oscillators)&lt;br&gt;
• Advanced Features: Afterglow trails, extinction alerts, stable pattern detection ✓&lt;br&gt;
• Visual Polish: Full retro arcade UI with CRT scanlines and legend ✓&lt;br&gt;
• Weaknesses: Missing LWSS spaceship detection; potential performance lag in dense grids&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ddsse13culmvl685dm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ddsse13culmvl685dm.gif" alt=" " width="600" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot + GPT-5: 9/10&lt;/strong&gt;&lt;br&gt;
• Pattern Recognition: Strong (gliders in all 4 orientations, LWSS spaceship, oscillators, still lifes)&lt;br&gt;
• Advanced Features: Scanlines, vignette, vector-style glow, stability alerts ✓&lt;br&gt;
• Visual Polish: Balanced retro aesthetic with optional FPS display ✓&lt;br&gt;
• Weaknesses: Oscillator detection relies on state comparison, potentially missing edge cases&lt;/p&gt;

&lt;p&gt;(I'd give it an 8; that 9 is a bit generous, to my mind.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssq3ewxdkklh9xmrhv8a.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssq3ewxdkklh9xmrhv8a.gif" alt=" " width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI: 6/10&lt;/strong&gt;&lt;br&gt;
• Pattern Recognition: Limited (only block still life and horizontal blinker)&lt;br&gt;
• Advanced Features: Basic trail effects for dead cells&lt;br&gt;
• Visual Polish: Clean, functional UI with retro styling ✓&lt;br&gt;
• Weaknesses: Severely limited pattern detection (misses gliders, spaceships, most oscillators); no stability/extinction detection&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vaunngzhhvkcdle4bm1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vaunngzhhvkcdle4bm1.gif" alt=" " width="720" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Workflow Insight
&lt;/h2&gt;

&lt;p&gt;Beyond the scores, this experiment revealed something practical: &lt;strong&gt;using multiple AI coding tools on the same project is now genuinely viable&lt;/strong&gt; and maybe even optimal.&lt;/p&gt;

&lt;p&gt;Both Claude Code and Gemini CLI install via Homebrew on Mac (&lt;code&gt;brew install claude-code&lt;/code&gt; / &lt;code&gt;brew install gemini-cli&lt;/code&gt;), which makes experimentation trivially easy. Both also make your terminal look fantastic, which shouldn't matter but somehow does.&lt;/p&gt;

&lt;p&gt;The real insight: &lt;strong&gt;if you're already using Copilot in VSCode, you're missing an opportunity if you don't occasionally open a Terminal pane and run Claude Code or Gemini CLI for a second opinion.&lt;/strong&gt; Both tools will read your AGENTS.md file for context. You're not starting over. You're getting a different perspective on the same project.&lt;/p&gt;

&lt;p&gt;The AGENTS.md file makes this seamless. One specification, multiple tools that can execute against it, for those times when one agent gets stuck on a hard problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;We're at an interesting moment with AI-assisted development. These tools aren't experimental anymore. They're genuinely capable. Claude Code delivered something close to production-ready code in one shot. Copilot's implementation was solid and reliable. Even Gemini, despite its pattern recognition gaps, built something functional and visually appealing; I'm sure that, given a second shot, it would nail the pattern recognition.&lt;/p&gt;

&lt;p&gt;The AGENTS.md standard makes it practical to use multiple tools without rewriting context each time. This isn't about abandoning your preferred assistant. It's about recognizing that different tools have different strengths. Claude Code's planning phase caught edge cases. Copilot's spaceship detection was more complete. Gemini's aesthetic choices were compelling even where its pattern detection fell short.&lt;/p&gt;

&lt;p&gt;You don't need to pick one. The infrastructure for multi-tool workflows already exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;All three implementations are available to explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://codepen.io/Jason-Peterson-the-bold/pen/azdqzYJ" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codepen.io/Jason-Peterson-the-bold/pen/PwZQwBX" rel="noopener noreferrer"&gt;Github Copilot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codepen.io/Jason-Peterson-the-bold/pen/raxJavM" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AGENTS.md file that powered all three is &lt;a href="https://gist.github.com/brandnewpeterson/7e1f603201fa3a81735794b4f514327a#file-agents-md" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're already using one AI coding assistant, consider experimenting with another. The barrier to entry is lower than you think, and the insights from seeing different approaches to the same problem are worth the fifteen minutes it takes to try.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>agents</category>
      <category>githubcopilot</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
