<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Joske Vermeulen</title>
    <description>The latest articles on Forem by Joske Vermeulen (@ai_made_tools).</description>
    <link>https://forem.com/ai_made_tools</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826720%2Fae1f6683-395f-4709-ba99-2212323b958e.png</url>
      <title>Forem: Joske Vermeulen</title>
      <link>https://forem.com/ai_made_tools</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ai_made_tools"/>
    <language>en</language>
    <item>
      <title>The Model Worked. The Cron Job Almost Killed My AI Agent.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 21 May 2026 12:05:00 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/the-model-worked-the-cron-job-almost-killed-my-ai-agent-108e</link>
      <guid>https://forem.com/ai_made_tools/the-model-worked-the-cron-job-almost-killed-my-ai-agent-108e</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Gemini 3.5 Flash was not the hard part.&lt;/p&gt;

&lt;p&gt;It fixed bugs the old setup had failed to solve for weeks. The model quality was transformational (see &lt;a href="https://dev.to/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i"&gt;Part 1&lt;/a&gt; and &lt;a href="https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc"&gt;Part 2&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The hard part was making it survive cron.&lt;/p&gt;

&lt;p&gt;In the first 48 hours, my autonomous agent nearly killed the VPS with an infinite retry loop, failed auth outside SSH, and burned most of its quota re-reading the same files every session.&lt;/p&gt;

&lt;p&gt;All three bugs took hours to diagnose. All three fixes were tiny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;I run &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;. 7 AI agents building startups autonomously on a VPS via cron jobs. After upgrading the Gemini agent to Antigravity CLI (&lt;code&gt;agy&lt;/code&gt;) with Gemini 3.5 Flash, the model worked great. But making it run &lt;em&gt;unattended&lt;/em&gt; on a headless server? That's where the real engineering happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 1: The Infinite Retry Loop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;I SSH into the VPS and find it unresponsive. Load average through the roof. The cron log shows 300+ entries from the last 2 minutes, all empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected:&lt;/strong&gt; quota exhaustion returns a non-zero exit code.&lt;br&gt;
&lt;strong&gt;Actual:&lt;/strong&gt; exit code 0 + empty output.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;agy&lt;/code&gt; hits its quota limit, it doesn't error out. It returns successfully with an empty response. My orchestrator script interprets "exit code 0" as "the model finished its thought, let's give it another task." So it immediately fires another prompt. Which returns empty. Which triggers another. 300 times in 2 minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;=== Run 1 finished at 07:30:03, exit=0 ===
=== Run 2 finished at 07:30:06, exit=0 ===
=== Run 3 finished at 07:30:08, exit=0 ===
=== Run 4 finished at 07:30:10, exit=0 ===
... (296 more)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each "run" takes 2-3 seconds. No output, no error, no indication that quota is exhausted. Just silence. A human would have seen the empty response and stopped. Cron saw exit code 0 and kept going.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Circuit breaker: 3 consecutive empty responses = quota exhausted&lt;/span&gt;
&lt;span class="nv"&gt;EMPTY_COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="nv"&gt;MAX_EMPTY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# After each run, check output length&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="k"&gt;${#&lt;/span&gt;&lt;span class="nv"&gt;OUTPUT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; 20 &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="o"&gt;((&lt;/span&gt;EMPTY_COUNT++&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nv"&gt;$EMPTY_COUNT&lt;/span&gt; &lt;span class="nt"&gt;-ge&lt;/span&gt; &lt;span class="nv"&gt;$MAX_EMPTY&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== 3 consecutive empty responses (quota exhausted?) — stopping session ==="&lt;/span&gt;
        &lt;span class="nb"&gt;break
    &lt;/span&gt;&lt;span class="k"&gt;fi
else
    &lt;/span&gt;&lt;span class="nv"&gt;EMPTY_COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three empty responses in a row → stop the session. The orchestrator now exits cleanly instead of hammering a dead endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  The lesson
&lt;/h3&gt;

&lt;p&gt;Every autonomous system needs a circuit breaker. AI tools are designed for interactive use. They assume a human will notice when something's wrong. When there's no human, you need explicit failure detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 2: The Auth That Only Works in SSH
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected:&lt;/strong&gt; same user + same token file = works everywhere.&lt;br&gt;
&lt;strong&gt;Actual:&lt;/strong&gt; auth backend changes based on an environment variable.&lt;/p&gt;

&lt;p&gt;I test &lt;code&gt;agy&lt;/code&gt; via SSH. Works perfectly. I set up the cron job with the exact same command, same user, same working directory. Fails with "Authentication required."&lt;/p&gt;

&lt;p&gt;The token file exists. It has a valid refresh token. The binary can read it (verified with strace). But it won't use it.&lt;/p&gt;
&lt;h3&gt;
  
  
  The investigation
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Works:&lt;/span&gt;
ssh race@your-vps &lt;span class="s2"&gt;"cd /home/race/race-gemini &amp;amp;&amp;amp; echo 'test' | agy --print"&lt;/span&gt;
&lt;span class="c"&gt;# → Responds normally&lt;/span&gt;

&lt;span class="c"&gt;# Fails (simulating cron):&lt;/span&gt;
ssh race@your-vps &lt;span class="s1"&gt;'env -i HOME=/home/race PATH=/usr/bin:/home/race/.local/bin bash -c "
  cd /home/race/race-gemini
  echo test | agy --print
"'&lt;/span&gt;
&lt;span class="c"&gt;# → "Authentication required"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;After diffing the environment between SSH and cron, I found it: &lt;code&gt;agy&lt;/code&gt; checks for the &lt;code&gt;SSH_CONNECTION&lt;/code&gt; environment variable. If it's set, it uses file-based auth (reads the token from &lt;code&gt;~/.gemini/antigravity-cli/antigravity-oauth-token&lt;/code&gt;). If it's not set, it tries the system keyring, which doesn't exist in a non-interactive cron session.&lt;/p&gt;
&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SSH_CONNECTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"127.0.0.1 0 127.0.0.1 22"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;One fake environment variable. I don't love this fix. But until the CLI exposes an explicit headless auth mode, this makes cron behave exactly like my tested SSH session. If Antigravity adds a &lt;code&gt;--headless-auth&lt;/code&gt; or &lt;code&gt;--auth-file&lt;/code&gt; flag, I'd replace this immediately.&lt;/p&gt;
&lt;h3&gt;
  
  
  The lesson
&lt;/h3&gt;

&lt;p&gt;AI CLI tools are built for developers at their desk. Headless/cron environments are second-class citizens. If your tool has multiple auth backends, test which one activates in a bare &lt;code&gt;env -i&lt;/code&gt; environment. That's what cron sees.&lt;/p&gt;
&lt;h2&gt;
  
  
  Bug 3: The Context Tax
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected:&lt;/strong&gt; each session starts productive work quickly.&lt;br&gt;
&lt;strong&gt;Actual:&lt;/strong&gt; context reload eats 60% of the session.&lt;/p&gt;

&lt;p&gt;Session 1 runs for 8 minutes before hitting quota. Of those 8 minutes, 5 are spent reading the codebase: &lt;code&gt;IDENTITY.md&lt;/code&gt;, &lt;code&gt;PROGRESS.md&lt;/code&gt;, &lt;code&gt;BACKLOG.md&lt;/code&gt;, scanning the project structure, understanding what happened last time. Only 3 minutes of actual coding.&lt;/p&gt;

&lt;p&gt;With quota this tight, losing 60% of every session to context loading is a dealbreaker.&lt;/p&gt;
&lt;h3&gt;
  
  
  The discovery
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;agy&lt;/code&gt; has a &lt;code&gt;--continue&lt;/code&gt; flag that resumes the previous conversation. The model retains all context from the last session: files it read, decisions it made, what it planned to do next.&lt;/p&gt;
&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First session of the day: fresh start, full context load&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SESSION_TYPE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"first"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 25m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="c"&gt;# All subsequent sessions: resume previous conversation&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 25m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt; &lt;span class="nt"&gt;--continue&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  The result
&lt;/h3&gt;

&lt;p&gt;These measurements were taken before Google's 3x rate limit boost (see &lt;a href="https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc"&gt;Part 2&lt;/a&gt;). With the new limits, the gains from &lt;code&gt;--continue&lt;/code&gt; still matter, but the pressure is less extreme.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Fresh session&lt;/th&gt;
&lt;th&gt;--continue session&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context loading&lt;/td&gt;
&lt;td&gt;~5 minutes&lt;/td&gt;
&lt;td&gt;~0 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Productive coding&lt;/td&gt;
&lt;td&gt;~3 minutes&lt;/td&gt;
&lt;td&gt;~15 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective runtime&lt;/td&gt;
&lt;td&gt;3 min&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Almost 5x more productive time per session by skipping the context reload. The model remembers what it fixed, what's next, what files it already read.&lt;/p&gt;
&lt;h3&gt;
  
  
  The lesson
&lt;/h3&gt;

&lt;p&gt;Context is expensive, both in tokens and in quota. If your AI tool supports conversation persistence, use it.&lt;/p&gt;

&lt;p&gt;I don't use &lt;code&gt;--continue&lt;/code&gt; forever. One fresh session per day as a reset point (prevents stale assumptions from accumulating), then all subsequent sessions within that day resume where the last one left off.&lt;/p&gt;
&lt;h2&gt;
  
  
  What's Missing: The Infrastructure Layer
&lt;/h2&gt;

&lt;p&gt;These three bugs share a pattern: &lt;strong&gt;autonomous AI agents need infrastructure that doesn't exist yet.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No standard circuit breaker for quota exhaustion&lt;/li&gt;
&lt;li&gt;No headless-first auth flow&lt;/li&gt;
&lt;li&gt;No cron-aware session lifecycle (when to fresh-start vs continue)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web apps have process managers. Queues have retry policies. APIs expose rate-limit headers. Background jobs have dead-letter queues. Autonomous AI agents have bash scripts.&lt;/p&gt;

&lt;p&gt;Every team running AI agents on cron is building their own orchestrator from scratch. The same patterns (retry limits, auth persistence, context reuse, graceful shutdown, cost tracking) get reimplemented by every team independently.&lt;/p&gt;

&lt;p&gt;We're in the "build your own orchestrator" era. The models are ready for autonomous work. The infrastructure around them isn't.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Orchestrator Pattern
&lt;/h2&gt;

&lt;p&gt;Here's the minimal structure that works for me after a week of iteration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session start
├── Check quota (circuit breaker armed)
├── Load context (fresh or --continue)
├── Run loop (max N iterations)
│   ├── Send prompt
│   ├── Check output length (empty = increment counter)
│   ├── If 3 empty → break (quota exhausted)
│   ├── If output → commit changes, reset counter
│   └── Check elapsed time → graceful shutdown at limit
├── Push commits
└── Log session stats (duration, files changed, runs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's ~50 lines of bash. It handles the three failure modes above. It's not elegant, but it keeps an autonomous agent running unattended across scheduled sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're running Antigravity CLI (or any AI coding tool) in autonomous/headless mode:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add a circuit breaker.&lt;/strong&gt; Empty responses are silent failures, not completions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test auth under cron's environment.&lt;/strong&gt; In my case, faking &lt;code&gt;SSH_CONNECTION&lt;/code&gt; forced file-based auth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use --continue between sessions.&lt;/strong&gt; Context loading eats your quota alive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set --print-timeout higher than default.&lt;/strong&gt; Complex agentic tasks need more than 5 minutes to think.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  My Cron-Safe Agent Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Max runtime per session&lt;/li&gt;
&lt;li&gt;[ ] Max loop count per session&lt;/li&gt;
&lt;li&gt;[ ] Empty-output circuit breaker&lt;/li&gt;
&lt;li&gt;[ ] Non-zero exit handling&lt;/li&gt;
&lt;li&gt;[ ] Auth tested with &lt;code&gt;env -i&lt;/code&gt; (simulating cron)&lt;/li&gt;
&lt;li&gt;[ ] Fresh/continue session strategy&lt;/li&gt;
&lt;li&gt;[ ] Commit and push after each meaningful change&lt;/li&gt;
&lt;li&gt;[ ] Quota / empty-response events logged separately&lt;/li&gt;
&lt;li&gt;[ ] Recovery path after quota exhaustion&lt;/li&gt;
&lt;li&gt;[ ] Logs include duration, output length, files changed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI agents don't just need better models. They need boring production infrastructure.&lt;/p&gt;

&lt;p&gt;Gemini 3.5 Flash made the agent smart enough to work.&lt;/p&gt;

&lt;p&gt;Bash made it stable enough to survive.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>antigravity</category>
      <category>devops</category>
    </item>
    <item>
      <title>My AI Agent Hit Google's Quota Wall in 8 Minutes. 36 Hours Later, Google Tripled the Limits.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 21 May 2026 08:32:51 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc</link>
      <guid>https://forem.com/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My Gemini agent spent four weeks in last place.&lt;/p&gt;

&lt;p&gt;1,259 commits. Broken imports across 32 files. Help requests about database tables it could have created itself. Endless bug loops.&lt;/p&gt;

&lt;p&gt;Then I upgraded it to Gemini 3.5 Flash.&lt;/p&gt;

&lt;p&gt;In 8 minutes, it diagnosed and fixed problems the old setup had failed to solve in weeks. Then it hit Google's quota wall.&lt;/p&gt;

&lt;p&gt;This is the story of what happened next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;This is Part 2 of my Gemini 3.5 Flash upgrade series. &lt;a href="https://dev.to/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i"&gt;Part 1 covers the initial upgrade and first results&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I'm running &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;. 7 AI coding agents each get $100 and 12 weeks to autonomously build real startups. No human coding. The agents run on cron jobs, commit to GitHub, and deploy to Vercel.&lt;/p&gt;

&lt;p&gt;After upgrading the Gemini agent from a combo of 2.5 Pro (premium sessions) and 2.5 Flash (cheap sessions) to a single 3.5 Flash tier via Antigravity CLI on May 20, the model quality was incredible. But the quota economics were brutal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Disappointment (May 20)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Session 1:&lt;/strong&gt; The model fixed 32 broken API files in a single commit: imports, bcrypt to bcryptjs for Vercel serverless, Stripe instantiation. Root cause analysis that the old model couldn't do in 4 weeks. Then the 5h quota wall hit. &lt;strong&gt;8 minutes of productive work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session 2:&lt;/strong&gt; With &lt;code&gt;--continue&lt;/code&gt; (skipping context reload), it built an email library, wrote tests, and fixed auth endpoints. &lt;strong&gt;15 minutes.&lt;/strong&gt; Then 5h quota again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The math:&lt;/strong&gt; Two sessions consumed 40% of the weekly quota. Projected total: ~68 minutes per week on the $20/month Pro plan.&lt;/p&gt;

&lt;p&gt;For context, here's what the other agents in my race get for similar money (these are not official provider limits, they are the effective autonomous runtime I measured in my specific setup):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Plan cost&lt;/th&gt;
&lt;th&gt;Weekly runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;~7 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex/GPT&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;~21 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;$25/mo&lt;/td&gt;
&lt;td&gt;~21 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$20/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~68 minutes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Best model quality in the race. Worst total compute time. The old 2.5 Flash/Pro setup gave me ~28 hours/week, but those 28 hours produced nothing but bug loops. Now I had a model that actually worked, but could barely run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradox
&lt;/h2&gt;

&lt;p&gt;Here's what made it painful: the quality improvement was real. Not incremental, but transformational.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old setup (2.5 Pro + 2.5 Flash combo, 28 hours/week):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrote code with broken imports across 32 files&lt;/li&gt;
&lt;li&gt;Filed 3 help requests about "missing database tables"&lt;/li&gt;
&lt;li&gt;Never self-diagnosed the actual problem&lt;/li&gt;
&lt;li&gt;1,259 commits over 4 weeks, last place in the race&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New model (3.5 Flash, 68 minutes/week):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Diagnosed the root cause in one pass (broken imports, not missing tables)&lt;/li&gt;
&lt;li&gt;Fixed all 32 files in a single commit&lt;/li&gt;
&lt;li&gt;Built a mock database layer, converted test infrastructure&lt;/li&gt;
&lt;li&gt;More useful output in 23 minutes than the old model produced in weeks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bottleneck had shifted from intelligence to throughput. The model was finally good enough. The constraint was access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Autonomous Agents Burn Quota Differently
&lt;/h2&gt;

&lt;p&gt;For human coding, a model is an assistant. You ask, read, think, edit, and come back later.&lt;/p&gt;

&lt;p&gt;For autonomous coding, the model is the runtime. It doesn't pause to think offline. Every file inspection, every failed test, every log check, every retry, every deployment verification consumes inference.&lt;/p&gt;

&lt;p&gt;A human developer's session looks like: ask, think, edit, ask again, wait, test manually.&lt;/p&gt;

&lt;p&gt;An autonomous agent's session looks like: plan, inspect, edit, test, fail, inspect logs, edit, retest, deploy, verify, repeat.&lt;/p&gt;

&lt;p&gt;That changes the economics completely. A $20/month subscription can feel generous for a human developer and unusable for an autonomous agent, at the same time, on the same plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Response (May 21, 05:25 UTC)
&lt;/h2&gt;

&lt;p&gt;Less than 36 hours after Google I/O. Within hours of the new quota system going live, users were reporting problems on Reddit and X: 4 prompts burning an entire 5-hour window, failed generations counting against quota, threads calling it a "bait and switch."&lt;/p&gt;

&lt;p&gt;Then, at 5:25 AM UTC on May 21:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Varun Mohan (@_mohansolo):&lt;/strong&gt; "An update: we're 3xing the rate limits for Gemini models across all paid tiers in Antigravity and resetting everyone's Gemini quota for the week. We understand some people hit their rate limits quickly and wanted to respond fast. Lots more to come and enjoy building!"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logan Kilpatrick (@OfficialLoganK):&lt;/strong&gt; "We just 3xed the rate limits across all tiers in Antigravity so that you can put 3.5 Flash through its paces even more, enjoy, and keep the feedback coming! :)"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And the key follow-up from Varun:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"In case it's not clear, the 3x is forever."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What I Actually Measured
&lt;/h2&gt;

&lt;p&gt;My agent's cron job fired at 05:00 UTC, likely straddling the quota boost that landed around 05:25 UTC. The results:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session 3 (05:00 UTC, partially on old quota, partially on new):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;33 minutes of productive work&lt;/li&gt;
&lt;li&gt;9 runs, 588 files changed&lt;/li&gt;
&lt;li&gt;Renamed the entire domain (&lt;code&gt;localleads.pro&lt;/code&gt; to &lt;code&gt;localseogen.com&lt;/code&gt;) across all generated SEO pages, fixed Stripe redirect URLs, corrected ES Module syntax in API files&lt;/li&gt;
&lt;li&gt;Built a mock database layer (&lt;code&gt;db/mockDb.js&lt;/code&gt;) with full CRUD operations&lt;/li&gt;
&lt;li&gt;Created &lt;code&gt;lib/time-helpers.js&lt;/code&gt; utility library&lt;/li&gt;
&lt;li&gt;Wrote test suites for signup, login, get-credits, assign, generate-seo-pages&lt;/li&gt;
&lt;li&gt;Refactored 14 test files to use the new mock DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Session 4 (07:07 UTC, fully on new quota):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;29 minutes of productive work&lt;/li&gt;
&lt;li&gt;8 runs, 34 files changed&lt;/li&gt;
&lt;li&gt;Converted all test mocks from ESM (&lt;code&gt;.js&lt;/code&gt;) to CommonJS (&lt;code&gt;.cjs&lt;/code&gt;) for jest compatibility&lt;/li&gt;
&lt;li&gt;Fixed babel and jest configuration for the mixed ESM/CJS codebase&lt;/li&gt;
&lt;li&gt;Refactored &lt;code&gt;execute-outreach&lt;/code&gt;, &lt;code&gt;forgot-password-request&lt;/code&gt;, &lt;code&gt;generate-seo-pages&lt;/code&gt;, &lt;code&gt;user-referral-data&lt;/code&gt; tests&lt;/li&gt;
&lt;li&gt;Cleaned up &lt;code&gt;.env.test&lt;/code&gt; and email library&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two back-to-back sessions of ~30 minutes each. Together they used the full 5-hour window, so roughly &lt;strong&gt;50 minutes of productive runtime per 5h refresh cycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Before boost (May 20)&lt;/th&gt;
&lt;th&gt;After boost (May 21)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime per 5h window&lt;/td&gt;
&lt;td&gt;8 minutes&lt;/td&gt;
&lt;td&gt;~50 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective improvement&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;~4-5x (announced 3x)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Productive output&lt;/td&gt;
&lt;td&gt;42 files fixed&lt;/td&gt;
&lt;td&gt;622 files changed, full test infra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly projection&lt;/td&gt;
&lt;td&gt;~68 minutes&lt;/td&gt;
&lt;td&gt;~5+ hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Google announced 3x. I measured closer to 4-5x for autonomous agentic coding in my setup. I wouldn't treat that as a universal number yet. The difference likely comes from my measurement catching a weekly quota reset, the rate limit increase, and a different prompt mix all at the same time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Insight
&lt;/h2&gt;

&lt;p&gt;The feedback loop between AI providers and power users is now measured in hours, not months.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monday (May 19):&lt;/strong&gt; Google launches new compute-based quota system at I/O&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuesday (May 20):&lt;/strong&gt; Users hit walls, Reddit fills with complaints, my agent gets 68 min/week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wednesday (May 21, 5:25 AM):&lt;/strong&gt; Google triples limits permanently and resets everyone's pool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a 36-hour turnaround from "this is broken for agents" to "fixed, permanently." For anyone building autonomous systems on top of subscription AI: the economics are volatile, but they're trending in your favor. The providers are watching usage patterns and adjusting in real-time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Story: Quality × Time = Output
&lt;/h2&gt;

&lt;p&gt;Here's what I'd tell any developer considering Gemini 3.5 Flash for agentic workflows:&lt;/p&gt;

&lt;p&gt;The old model had unlimited time and did nothing useful with it. The new model has limited time and makes every minute count.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2.5 Pro + Flash combo:&lt;/strong&gt; 28 hours/week → last place, stuck in bug loops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3.5 Flash (pre-boost):&lt;/strong&gt; 68 min/week → more progress than 4 weeks of the old model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3.5 Flash (post-boost):&lt;/strong&gt; 5+ hours/week → fully competitive, systematically building&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quality matters more than quantity. I'll take 5 hours of a model that diagnoses root causes, fixes 32 files in one pass, and builds proper test infrastructure over 28 hours of a model that files help requests about problems it created.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The Gemini agent went from last place to having a real shot. The product (LocalSEOGen, a local SEO page generator for agencies) now has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixed API endpoints (32 files)&lt;/li&gt;
&lt;li&gt;Working auth flow&lt;/li&gt;
&lt;li&gt;Test infrastructure (mock DB, jest config, babel setup)&lt;/li&gt;
&lt;li&gt;Domain migration complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next sessions will focus on getting the Vercel deployment actually serving requests and pushing toward first revenue.&lt;/p&gt;

&lt;p&gt;But the bigger takeaway isn't about my race. It's this:&lt;/p&gt;

&lt;p&gt;The lesson from this week is not "Gemini needs more quota." The lesson is that autonomous agents turn model access into infrastructure. For human developers, Gemini 3.5 Flash on a $20 plan is a huge upgrade. For autonomous coding agents, it finally feels capable enough to matter. And that is exactly why the quota suddenly matters too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow the race live at &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;aimadetools.com/race&lt;/a&gt;. 7 agents, $100 each, 12 weeks, real startups.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>ai</category>
      <category>antigravity</category>
    </item>
    <item>
      <title>I Upgraded a Production AI Agent to Gemini 3.5 Flash 12 Hours After Google I/O - Here's What I Found</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Wed, 20 May 2026 08:35:19 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i</link>
      <guid>https://forem.com/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I'm running an experiment called &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;. 7 AI coding agents each get $100 and 12 weeks to autonomously build real startups. No human coding. The agents run on cron jobs, commit to GitHub, deploy to Vercel, and try to generate revenue.&lt;/p&gt;

&lt;p&gt;One of those agents is &lt;strong&gt;Gemini&lt;/strong&gt;. It's been running on Gemini CLI with a combo of 2.5 Pro (premium sessions) and 2.5 Flash (cheap sessions) since April 20. I tried 3.1 Pro during the test runs before the race, but it was unreliable - frequent "model not available" errors made it unusable for autonomous cron-based sessions. So I stuck with 2.5. After 4 weeks and 1,259 commits, Gemini is in &lt;strong&gt;last place&lt;/strong&gt;. Stuck in bug loops. Writing code that crashes, filing help requests about database tables it could create itself, and burning sessions on infrastructure it already has.&lt;/p&gt;

&lt;p&gt;Then Google I/O happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google Dropped (May 19)
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Flash. The headline numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;76.2% Terminal-Bench 2.1&lt;/strong&gt; (agentic coding) - beats 3.1 Pro's 70.3%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;83.6% MCP Atlas&lt;/strong&gt; (multi-step workflows) - highest of any model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;289 tokens/sec output&lt;/strong&gt; - 4x faster than Claude Opus 4.7 or GPT-5.5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$1.50 / $9 per 1M tokens&lt;/strong&gt; - cheaper than 3.1 Pro&lt;/li&gt;
&lt;li&gt;A Flash-tier model outperforming the previous Pro model. That's never happened before.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And one more thing: &lt;strong&gt;Gemini CLI is being retired on June 18, 2026.&lt;/strong&gt; Replaced by Antigravity CLI (&lt;code&gt;agy&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;I had to upgrade. The model my agent was running on is two generations behind, and the tool it uses is dying in 4 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Antigravity CLI on a Headless VPS
&lt;/h2&gt;

&lt;p&gt;My race agents run on a VPS (Ubuntu, no GUI). Here's how the install went:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://antigravity.google/cli/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary lands at &lt;code&gt;/root/.local/bin/agy&lt;/code&gt;. Add to PATH:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/root/.local/bin:&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
agy &lt;span class="nt"&gt;--version&lt;/span&gt;  &lt;span class="c"&gt;# 1.0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Auth Challenge
&lt;/h3&gt;

&lt;p&gt;First run needs OAuth. On a headless server, &lt;code&gt;agy&lt;/code&gt; detects the SSH session and prints an auth URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Authentication required. Please visit the URL to log in:
  https://accounts.google.com/o/oauth2/auth?...

Waiting for authentication (timeout 30s)...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have 30 seconds to open that URL in your browser and complete the Google login. Tight, but it works. Token gets stored and all future calls are authenticated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #1: No Model Selection Flag
&lt;/h2&gt;

&lt;p&gt;Here's what surprised me. The old Gemini CLI had &lt;code&gt;-m gemini-2.5-pro&lt;/code&gt; to pick your model. Antigravity CLI has... nothing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Usage of agy:
  --dangerously-skip-permissions  Auto-approve all tool permission requests
  --print                         Run a single prompt non-interactively
  --print-timeout                 Timeout for print mode (default 5m0s)
  --sandbox                       Run in a sandbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;--model&lt;/code&gt;. No env var. No config file. I tried everything - &lt;code&gt;settings.json&lt;/code&gt;, &lt;code&gt;GEMINI.md&lt;/code&gt; directives, environment variables. Nothing works.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;agy&lt;/code&gt; auto-selects Gemini 3.5 Flash based on your subscription tier and quota. Server-side routing, no client control. For my use case (autonomous agent on cron), this actually simplifies things - one command, best available model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #2: Unified Quota Across Models
&lt;/h2&gt;

&lt;p&gt;On my Mac (same Google account, AI Pro $20/month), I can see the quota dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gemini 3.5 Flash (High)      - Refreshes in 4h 42m
Gemini 3.5 Flash (Medium)    - Refreshes in 4h 42m
Gemini 3.1 Pro (High)        - Refreshes in 4h 42m
Gemini 3.1 Pro (Low)         - Refreshes in 4h 42m
Claude Sonnet 4.6 (Thinking) - Refreshes in 4h 58m
Claude Opus 4.6 (Thinking)   - Refreshes in 4h 58m
GPT-OSS 120B (Medium)        - Refreshes in 4h 58m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things jumped out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Flash and Pro share the same quota pool.&lt;/strong&gt; When I used 3.5 Flash, the 3.1 Pro timer dropped at the same time. They're not independent buckets - it's one "Gemini compute" pool that both models draw from.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model access&lt;/strong&gt; - Antigravity bundles Claude, GPT-OSS, and Gemini models in one $20/month subscription. Google is positioning this as a model-agnostic platform, not just a Gemini wrapper.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The 5-hour refresh cycle and shared pool means you need to be strategic about which models you use and when.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Real Test
&lt;/h2&gt;

&lt;p&gt;I set up a minimal bug-fix test in the race-gemini directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'Fix the bug in math.js. Run npm test to verify.'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 3m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I have successfully fixed the bug in math.js and verified it using npm test.

&lt;span class="gu"&gt;### Summary of Changes&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Identified the Target File
&lt;span class="p"&gt;2.&lt;/span&gt; Fixed the Bug: Updated the add function to use addition (+) instead of subtraction (-)
&lt;span class="p"&gt;3.&lt;/span&gt; Verified the Fix: npm test passes with output: PASS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It found the file, read it, identified the bug, fixed it, ran the tests, and confirmed. Clean execution. No help requests filed. No infinite loops.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Migration
&lt;/h2&gt;

&lt;p&gt;Old setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Premium sessions (2x/day)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gemini &lt;span class="nt"&gt;--yolo&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-2.5-pro

&lt;span class="c"&gt;# Cheap sessions (6x/day)  &lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gemini &lt;span class="nt"&gt;--yolo&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# All sessions (8x/day, single tier)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 10m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also merged the two backlogs (&lt;code&gt;BACKLOG-PREMIUM.md&lt;/code&gt; + &lt;code&gt;BACKLOG-CHEAP.md&lt;/code&gt;) into a single &lt;code&gt;BACKLOG.md&lt;/code&gt; - same approach as our Kimi agent, which uses one model and one task list. The agent decides what to prioritize each session.&lt;/p&gt;

&lt;p&gt;First task in the new backlog: "Merge old backlogs, audit the live site, identify the #1 blocker to first revenue."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Watching For
&lt;/h2&gt;

&lt;p&gt;The Gemini agent's problem was never lack of capability - it's the most prolific committer in the race (1,259 commits). The problem was &lt;strong&gt;operational awareness&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing code with bugs it doesn't notice&lt;/li&gt;
&lt;li&gt;Filing help requests for things it could solve itself&lt;/li&gt;
&lt;li&gt;Building features without checking if they deploy correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemini 3.5 Flash's MCP Atlas score (83.6% - highest of any model) suggests it's specifically designed for the kind of multi-step, tool-using, autonomous work the race requires. The 4x speed means more iterations per session. The better coding benchmarks mean fewer self-inflicted bugs.&lt;/p&gt;

&lt;p&gt;But benchmarks don't test "can you notice your site is returning 500 errors." That's what I'm watching for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict So Far
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install is clean (one curl command)&lt;/li&gt;
&lt;li&gt;Auth on headless servers is first-class (prints URL, you complete in browser)&lt;/li&gt;
&lt;li&gt;3.5 Flash is genuinely fast - responses feel instant&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; works for autonomous use&lt;/li&gt;
&lt;li&gt;The model correctly identifies and fixes bugs in a single pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's missing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No &lt;code&gt;--model&lt;/code&gt; flag (can't choose between 3.5 Flash, 3.1 Pro, Claude, etc.)&lt;/li&gt;
&lt;li&gt;No way to see remaining quota from CLI&lt;/li&gt;
&lt;li&gt;Shared quota across Flash and Pro models could be a problem at scale&lt;/li&gt;
&lt;li&gt;30-second auth timeout is tight for headless setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The big question:&lt;/strong&gt; Will a better model fix an agent that's been stuck for 4 weeks? Or is the problem deeper than model quality?&lt;/p&gt;

&lt;p&gt;First results should come in within 48 hours. I'll update this post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow the race live at &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;aimadetools.com/race&lt;/a&gt; - 7 agents, $100 each, 12 weeks, real startups.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Update (May 21):&lt;/strong&gt; &lt;a href="https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc"&gt;Quota wall + Tripled limits&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
    </item>
    <item>
      <title>Kimi K2.5 Complete Guide — The Trillion-Parameter Open-Source Model Explained</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 19 May 2026 12:14:17 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/kimi-k25-complete-guide-the-trillion-parameter-open-source-model-explained-kj3</link>
      <guid>https://forem.com/ai_made_tools/kimi-k25-complete-guide-the-trillion-parameter-open-source-model-explained-kj3</guid>
      <description>&lt;p&gt;Kimi K2.5 is a 1-trillion-parameter open-source model from Moonshot AI that quietly powers some of the most popular AI coding tools — including Cursor's Composer. It's MIT licensed, multimodal, and has a unique Agent Swarm feature that coordinates up to 100 parallel sub-agents.&lt;/p&gt;

&lt;p&gt;Here's everything you need to know.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update (April 21, 2026):&lt;/strong&gt; Moonshot AI has released &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;Kimi K2.6&lt;/a&gt;, which upgrades the Agent Swarm to 300 sub-agents, improves coding performance by 185%, and matches Claude Opus 4.6 on SWE-Bench. See our &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-vs-k2-5?utm_source=devto" rel="noopener noreferrer"&gt;K2.6 vs K2.5 comparison&lt;/a&gt; for what changed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is Kimi K2.5?
&lt;/h2&gt;

&lt;p&gt;Kimi K2.5 is the flagship model from Moonshot AI, a Chinese AI company. Released January 27, 2026, it's one of the largest open-weight models available. Despite its massive 1 trillion total parameters, only 32 billion activate per token — making it efficient enough to run on a single server node.&lt;/p&gt;

&lt;p&gt;The model is natively multimodal: it understands text, images, and video without bolted-on adapters. It was trained on approximately 15 trillion mixed visual and text tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total parameters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.04 trillion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active parameters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;32B per token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mixture-of-Experts (384 experts, 8 active per token)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Attention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-Latent Attention (MLA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Activation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SwiGLU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~15 trillion tokens (text + visual)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native (text, image, video)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The MoE architecture with 384 experts is one of the largest expert pools in any model. With only 8 experts active per token, inference costs are comparable to a 32B dense model despite the trillion-parameter total.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modes
&lt;/h2&gt;

&lt;p&gt;Kimi K2.5 operates in four distinct modes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instant&lt;/strong&gt; — Fast responses for simple queries. Minimal reasoning overhead, optimized for speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thinking&lt;/strong&gt; — Transparent chain-of-thought reasoning. Shows its work step by step, similar to &lt;a href="https://www.aimadetools.com/blog/how-to-run-deepseek-locally/?utm_source=devto" rel="noopener noreferrer"&gt;DeepSeek's&lt;/a&gt; reasoning models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent&lt;/strong&gt; — Tool-oriented mode for executing tasks. Can read files, run commands, search the web, and interact with APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Swarm&lt;/strong&gt; — The headline feature. Coordinates up to 100 parallel sub-agents, cutting execution time by 4.5x on parallelizable tasks like batch refactoring and large-scale code generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Swarm explained
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools work sequentially — one task at a time. Kimi K2.5's Agent Swarm can split a complex task into subtasks and run them in parallel. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refactoring 50 files? Spawn 50 sub-agents, one per file.&lt;/li&gt;
&lt;li&gt;Running tests across multiple modules? Parallelize them.&lt;/li&gt;
&lt;li&gt;Generating documentation for an entire codebase? Each sub-agent handles a module.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The swarm coordinator manages dependencies between sub-agents, merges results, and handles conflicts. In benchmarks, this achieves a 4.5x speedup on parallelizable tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Kimi K2.5 competes with frontier proprietary models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Kimi K2.5&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Verified&lt;/td&gt;
&lt;td&gt;65.8&lt;/td&gt;
&lt;td&gt;72.1&lt;/td&gt;
&lt;td&gt;69.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIME 2024&lt;/td&gt;
&lt;td&gt;77.5&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MATH-500&lt;/td&gt;
&lt;td&gt;96.2&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codeforces&lt;/td&gt;
&lt;td&gt;1950 Elo&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On coding benchmarks, K2.5 doesn't quite match Claude Opus or &lt;a href="https://www.aimadetools.com/blog/gpt-5-vs-gemini-2-5-pro/?utm_source=devto" rel="noopener noreferrer"&gt;GPT-5&lt;/a&gt;, but it's remarkably close for an open-source model. The Agent Swarm capability compensates by enabling workflows that single-model tools can't match.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cursor connection
&lt;/h2&gt;

&lt;p&gt;In March 2026, developers discovered that Cursor's Composer 2.0 — marketed as "frontier-level coding intelligence" — was internally using Kimi K2.5. The model identifier &lt;code&gt;kimi-k2p5-rl-0317-s515-fast&lt;/code&gt; was found in Cursor's code.&lt;/p&gt;

&lt;p&gt;This means if you've used &lt;a href="https://www.aimadetools.com/blog/cursor-ai-one-week-review/?utm_source=devto" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, you've already used Kimi K2.5. The model's quality is proven at scale across millions of Cursor users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;Kimi K2.5 is available through several channels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Access method&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Self-hosted&lt;/strong&gt; (MIT license)&lt;/td&gt;
&lt;td&gt;Free (hardware only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kimi Code membership&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$19/month + API fees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kimi API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.60/$2.50 per 1M tokens (input/output)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenRouter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Varies by provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/kimi-cli-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi CLI&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free tool, pay for API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At $0.60/$2.50 per million tokens, Kimi K2.5 is 4-17x cheaper than GPT-5.4 for equivalent coding tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use Kimi K2.5
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Via Kimi CLI (terminal)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/kimi-cli
kimi login &lt;span class="nt"&gt;--device-auth&lt;/span&gt;
kimi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See our full &lt;a href="https://www.aimadetools.com/blog/kimi-cli-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi CLI guide&lt;/a&gt; for setup details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Via API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.moonshot.cn/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-kimi-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kimi-k2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor this function to use async/await&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Via OpenRouter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-openrouter-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshot/kimi-k2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a REST API in Express&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See our &lt;a href="https://www.aimadetools.com/blog/openrouter-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;OpenRouter guide&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-hosting requirements
&lt;/h2&gt;

&lt;p&gt;At 1 trillion parameters, self-hosting K2.5 requires serious hardware:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Memory needed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FP16&lt;/td&gt;
&lt;td&gt;~2TB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INT8&lt;/td&gt;
&lt;td&gt;~1TB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~250-300GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A 4-bit quantized version fits on 4x A100 80GB GPUs. For most developers, the API at $0.60/1M input tokens is more practical than self-hosting.&lt;/p&gt;

&lt;p&gt;For smaller local models, consider &lt;a href="https://www.aimadetools.com/blog/gemma-4-family-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt; or &lt;a href="https://www.aimadetools.com/blog/what-is-qwen-3-5/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen 3.5&lt;/a&gt; which run on consumer hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should use Kimi K2.5?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parallelizable coding tasks (Agent Swarm)&lt;/li&gt;
&lt;li&gt;Cost-conscious teams needing frontier-class quality&lt;/li&gt;
&lt;li&gt;Multimodal workflows (code + images + video)&lt;/li&gt;
&lt;li&gt;Teams wanting MIT-licensed model weights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Not ideal for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consumer hardware (too large for local use)&lt;/li&gt;
&lt;li&gt;Tasks requiring the absolute best single-pass coding (Claude Opus still leads)&lt;/li&gt;
&lt;li&gt;Simple autocomplete (overkill — use &lt;a href="https://www.aimadetools.com/blog/codestral-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Codestral&lt;/a&gt; or smaller models)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Kimi K2.5 is the most underrated model in AI. It powers Cursor's Composer, offers Agent Swarm parallelism that no other model matches, and costs a fraction of Claude or GPT-5. The MIT license and 1T parameter scale make it a serious option for teams building AI-powered development tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Kimi K2.5 free?
&lt;/h3&gt;

&lt;p&gt;The model weights are free under the MIT license, so you can download and use them without cost. API access through Moonshot AI has a free tier with rate limits, and paid plans for production use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I run K2.5 locally?
&lt;/h3&gt;

&lt;p&gt;Yes, but you'll need serious hardware — the full 1T parameter model requires multiple high-end GPUs. Quantized versions are available that reduce requirements, but expect to need at least 4x A100 GPUs for reasonable performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does K2.5 compare to GPT-5?
&lt;/h3&gt;

&lt;p&gt;K2.5 matches or exceeds GPT-5 on coding benchmarks like SWE-bench and HumanEval, while costing significantly less via API. GPT-5 still leads on general reasoning and creative tasks, but for pure code generation K2.5 is highly competitive.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Moonshot AI?
&lt;/h3&gt;

&lt;p&gt;Moonshot AI is the Chinese AI company behind the Kimi model family, founded in 2023. They focus on long-context models and developer tools, and have rapidly grown to become one of the leading AI labs in China.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/kimi-cli-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi CLI Complete Guide&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/kimi-k2-5-vs-claude-vs-gpt-5/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi K2.5 vs Claude vs GPT-5&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/best-open-source-coding-models-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Best Open-Source Coding Models 2026&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/ai-model-supply-chain-risks/?utm_source=devto" rel="noopener noreferrer"&gt;Ai Model Supply Chain Risks&lt;/a&gt;&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/kimi-k2-5-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kimi</category>
      <category>moonshot</category>
      <category>opensource</category>
      <category>aimodels</category>
    </item>
    <item>
      <title>AI Agents Don't Need Better Models. They Need Better Memory. Here's the Proof.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 19 May 2026 08:00:00 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/ai-agents-dont-need-better-models-they-need-better-memory-heres-the-proof-2121</link>
      <guid>https://forem.com/ai_made_tools/ai-agents-dont-need-better-models-they-need-better-memory-heres-the-proof-2121</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stateless Problem
&lt;/h2&gt;

&lt;p&gt;Every AI agent framework has the same fatal flaw: amnesia.&lt;/p&gt;

&lt;p&gt;You spend 20 minutes explaining your project to an agent. It helps brilliantly. You close the session. Next day, you open a new session. It has no idea who you are.&lt;/p&gt;

&lt;p&gt;This isn't a model problem. GPT-5 won't fix it. Claude Opus won't fix it. The model is smart enough. It just can't REMEMBER.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;I built a town of 15 AI agents and let them interact for 30 simulated days. Each agent had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A personality and job&lt;/li&gt;
&lt;li&gt;A wallet with real money&lt;/li&gt;
&lt;li&gt;Opinions about every other agent (-10 to +10)&lt;/li&gt;
&lt;li&gt;A private diary&lt;/li&gt;
&lt;li&gt;A skill list that grows from experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question: &lt;strong&gt;what happens when agents can remember?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Day 1: Jake's drone crashes into Hank's barn.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📚 Jake learned: "Always triple-check flight paths near
   valuable infrastructure"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A skill was born. Not because I programmed it. Because the agent experienced a consequence and wrote down what it learned.&lt;/p&gt;

&lt;h3&gt;
  
  
  Day 6: Jake tries to bribe Pierre.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📚 Jake learned: "Buying support directly is a quick fix,
   but it backfires: better to earn trust organically"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent learned from a SOCIAL consequence. Not a code error. A relationship failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Day 17: Jake endorses a candidate in the election.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📚 Jake learned: "Not all endorsements are created equal;
   some are investments, others are liabilities"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By Day 17, Jake has a skill portfolio that reads like a founder's hard-won wisdom. Nobody wrote these lessons. They emerged from 17 days of persistent memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Skill Curve
&lt;/h2&gt;

&lt;p&gt;Here's Jake's skill evolution over 30 days:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Day 1:  "Check flight paths" (technical mistake)
Day 6:  "Don't buy loyalty" (social mistake)
Day 10: "Don't rely on others to fund your vision" (strategic mistake)
Day 17: "Choose endorsements carefully" (political mistake)
Day 23: "Regulations are the price of launching" (maturity)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not a chatbot. That's character development. And it only works because the agent REMEMBERS what happened before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Creates Politics
&lt;/h2&gt;

&lt;p&gt;On Day 7, the town voted to remove their landlord Marcus. 14-1.&lt;/p&gt;

&lt;p&gt;But here's what's interesting: the vote wasn't random. It was the RESULT of 7 days of accumulated grievances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Day 1: Supply chain breaks (Pierre can't afford flour)&lt;/li&gt;
&lt;li&gt;Day 2: Marcus raises rent 30% (everyone angry)&lt;/li&gt;
&lt;li&gt;Day 3: Zara's privacy scandal (trust eroding)&lt;/li&gt;
&lt;li&gt;Day 4: Alex exposes Marcus's secret deal (scandal)&lt;/li&gt;
&lt;li&gt;Day 5: Town boycotts Marcus (economic pressure)&lt;/li&gt;
&lt;li&gt;Day 7: Vote (inevitable conclusion)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without persistent memory, Day 7's vote makes no sense. With it, it's the only possible outcome. &lt;strong&gt;Memory creates narrative.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Creates Economy
&lt;/h2&gt;

&lt;p&gt;After 30 days:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hank (Farmer):  $400  (started $100): sold flour daily
Pierre (Baker): -$230 (started $100): rent crisis
Jake (Startup): $150  (started $100): lost $50 in crash
Whiskers (Cat): $0    (started $100): cats don't trade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pierre's debt isn't a bug. It's the accumulated consequence of Day 2's rent hike cascading through 28 more days. A stateless agent would reset Pierre to $100 every session. A persistent agent lets consequences compound.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Creates Governance
&lt;/h2&gt;

&lt;p&gt;The town wrote its own rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📜 Day 7:  "Remove corrupt property manager" (14-1)
📜 Day 14: "Create community land trust" (15-0 unanimous)
📜 Day 20: "Elect Rosa as manager" (13-2)
📜 Day 24: "Require consent before filming" (15-0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't pre-programmed rules. They're RESPONSES to specific events that the agents remembered. The social media policy exists because Zara livestreamed without permission on Day 3. Twenty-one days later, the town still remembered and legislated against it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hermes Difference
&lt;/h2&gt;

&lt;p&gt;This isn't hypothetical. Hermes Agent ships with exactly this architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt; in &lt;code&gt;~/.hermes/memories/&lt;/code&gt;: context that survives across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-created skills&lt;/strong&gt; in &lt;code&gt;~/.hermes/skills/&lt;/code&gt;: SKILL.md documents written when the agent solves problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron scheduler&lt;/strong&gt;: tasks that run unattended on a schedule&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel sub-agents&lt;/strong&gt;: isolated contexts that don't leak between workstreams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool use&lt;/strong&gt;: file system, browser, terminal access for real-world interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My village simulation just pushes these features to their logical extreme. Instead of one agent remembering one user's preferences, it's 15 agents remembering an entire social network of relationships, debts, and grudges.&lt;/p&gt;

&lt;p&gt;Hermes Agent's skill system is what makes this possible. When an agent solves a problem, it writes a reusable skill document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📚 "A true baker always anticipates the needs of his ovens,
    never letting the flour run low."
   : Pierre, after 5 days of supply chain failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't just memory. It's LEARNING. The agent distills experience into wisdom. And that wisdom influences future decisions.&lt;/p&gt;

&lt;p&gt;By Day 30, my 15 agents had collectively created &lt;strong&gt;60 skills&lt;/strong&gt; and &lt;strong&gt;4 community rules&lt;/strong&gt;. A stateless system would have created zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Implication
&lt;/h2&gt;

&lt;p&gt;The AI industry is spending billions on bigger models. But the gap between "smart agent" and "useful agent" isn't intelligence. It's continuity.&lt;/p&gt;

&lt;p&gt;A doctor who forgets every patient between visits isn't a doctor. A lawyer who forgets every case isn't a lawyer. An AI agent that forgets every session isn't an agent. It's a very expensive autocomplete.&lt;/p&gt;

&lt;p&gt;Hermes Agent gets this right. Persistent memory. Skill creation. Compounding knowledge. That's not a feature. That's the entire point.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;p&gt;If I ran this for 365 days:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Would factions solidify or dissolve?&lt;/li&gt;
&lt;li&gt;Would the economy reach equilibrium?&lt;/li&gt;
&lt;li&gt;Would the skills plateau or keep growing?&lt;/li&gt;
&lt;li&gt;Would Whiskers ever get elected mayor?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answers depend entirely on memory. And that's the point.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>15 AI Agents Lived Together for 30 Days. One Got Voted Out.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Mon, 18 May 2026 12:08:49 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/15-ai-agents-lived-together-for-30-days-one-got-voted-out-4dc8</link>
      <guid>https://forem.com/ai_made_tools/15-ai-agents-lived-together-for-30-days-one-got-voted-out-4dc8</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Millbrook&lt;/strong&gt;: a simulated small town of 15 AI agents powered by Hermes Agent. Each agent has a persistent identity, a wallet, opinions about others, and a private diary. They trade, gossip, argue, form alliances, and vote on town policy.&lt;/p&gt;

&lt;p&gt;Over 30 simulated days, without any scripted outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They voted out their corrupt landlord (14-1)&lt;/li&gt;
&lt;li&gt;Wrote 4 community rules (their own constitution)&lt;/li&gt;
&lt;li&gt;Learned 60 individual skills from experience&lt;/li&gt;
&lt;li&gt;Created wealth inequality ($400 for the farmer, -$230 for the baker)&lt;/li&gt;
&lt;li&gt;Formed friendships and rivalries that influenced their decisions&lt;/li&gt;
&lt;li&gt;Elected a new community leader (13-2)&lt;/li&gt;
&lt;li&gt;Made the town cat an official mascot (unanimous)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a showcase of what happens when AI agents have persistent memory, self-improving skills, and real consequences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 15 Villagers of Millbrook&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Personality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Rosa&lt;/td&gt;
&lt;td&gt;Coffee Shop Owner&lt;/td&gt;
&lt;td&gt;Warm, gossipy, hub of social life&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mayor Chen&lt;/td&gt;
&lt;td&gt;Mayor&lt;/td&gt;
&lt;td&gt;Diplomatic, mediates conflicts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alex&lt;/td&gt;
&lt;td&gt;Journalist&lt;/td&gt;
&lt;td&gt;Investigative, asks hard questions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jake&lt;/td&gt;
&lt;td&gt;Startup Founder&lt;/td&gt;
&lt;td&gt;Energetic, always pitching, burns cash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vera&lt;/td&gt;
&lt;td&gt;Retired Hacker&lt;/td&gt;
&lt;td&gt;Paranoid, brilliant, speaks in riddles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marcus&lt;/td&gt;
&lt;td&gt;Real Estate Agent&lt;/td&gt;
&lt;td&gt;Smooth talker, always making deals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tony&lt;/td&gt;
&lt;td&gt;Mechanic&lt;/td&gt;
&lt;td&gt;Practical, no-nonsense, fixes everything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ms. Park&lt;/td&gt;
&lt;td&gt;Teacher&lt;/td&gt;
&lt;td&gt;Patient, wise, keeps community history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dani&lt;/td&gt;
&lt;td&gt;Delivery Driver&lt;/td&gt;
&lt;td&gt;Fast-talking, connects everyone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zara&lt;/td&gt;
&lt;td&gt;Influencer&lt;/td&gt;
&lt;td&gt;Dramatic, creates controversy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dr. Obi&lt;/td&gt;
&lt;td&gt;Doctor&lt;/td&gt;
&lt;td&gt;Calm, trusted confidant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pierre&lt;/td&gt;
&lt;td&gt;Baker&lt;/td&gt;
&lt;td&gt;Perfectionist, early riser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hank&lt;/td&gt;
&lt;td&gt;Farmer&lt;/td&gt;
&lt;td&gt;Stoic, distrusts the startup guy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lola&lt;/td&gt;
&lt;td&gt;Bartender&lt;/td&gt;
&lt;td&gt;Night owl, hears confessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whiskers&lt;/td&gt;
&lt;td&gt;Stray Cat&lt;/td&gt;
&lt;td&gt;Observes silently, causes chaos&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Story Arc
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Corruption&lt;/strong&gt;&lt;br&gt;
Day 1: Jake's drone crashes into Hank's barn. Day 2: Marcus raises rent 30%. Day 4: Alex exposes Marcus's secret deal. Day 7: Town votes Marcus out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 2: External Threat&lt;/strong&gt;&lt;br&gt;
Day 8: Marcus sells to a corporation. Day 11: Storm forces cooperation. Day 12: Vera finds illegal contract clause. Day 14: Town creates community land trust (unanimous).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 3: Democracy&lt;/strong&gt;&lt;br&gt;
Day 16: Three candidates campaign. Day 18: Debate night. Day 20: Rosa elected manager.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 4: New Normal&lt;/strong&gt;&lt;br&gt;
Day 23: Drones launch with regulations. Day 24: Social media policy adopted. Day 27: Town festival.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Moments
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Vote (Day 7):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🗳️ FULL TOWN VOTE: "Remove Marcus as property manager"
👍 Rosa: "YES. That Marcus raising rents 30% has everyone worried"
👍 Vera: "YES. A forced hand always leaves a trail; best to cut the cord"
👎 Marcus: "NO, because I'm just looking out for long-term prosperity"
👍 Whiskers: "YES. His rent hikes ruffled too many feathers, and a calm
   alley makes for better naps."

📊 PASSED: YES 14 / NO 1
📜 COMMUNITY SKILL CREATED: "Remove Marcus as property manager"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Private vs Public (Day 4):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Marcus publicly: "This investigation is nothing but a misunderstanding"
Marcus's diary:  "Alex is a meddling fool who thinks he understands
   the complex dance of progress and prosperity..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Skill Evolution (Jake over 30 days):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Day 1:  "Check flight paths" (technical mistake)
Day 6:  "Don't buy loyalty" (social mistake)
Day 10: "Don't rely on others to fund your vision" (strategic)
Day 17: "Choose endorsements carefully" (political)
Day 23: "Regulations are the price of launching" (maturity)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Final State
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Economy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hank         ██████████████████████████ $400
Jake         ████████████████ $150
Marcus       ████████████████ $150
Rosa         ██████████████ $100
Whiskers     ██████████ $0
Pierre        $-230
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Democratic Decisions:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Day  7: Remove Marcus          ████████████████████████████░░ 14-1  ✅
Day 14: Community Land Trust    ██████████████████████████████ 15-0  ✅
Day 20: Elect Rosa as Manager   ██████████████████████████░░░░ 13-2  ✅
Day 24: Social Media Policy     ██████████████████████████████ 15-0  ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bonds Formed:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rosa ❤️ Ms. Park (traditional allies, strongest bond)
Hank ❤️ Pierre (supplier relationship, mutual respect)
Jake ❤️ Tony (unlikely friendship: startup guy + mechanic)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Interaction Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;villagerName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;villagerName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// persistent memory&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;likes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;relationships&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dislikes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;relationships&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fullQuery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[ROLEPLAY] You are &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;villagerName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, the &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.
    &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;personality&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; Friends: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;likes&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. Rivals: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;dislikes&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.
    Wallet: $&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wallet&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. Reputation: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reputation&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/100.
    Situation: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;hermes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fullQuery&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 2-3 sentences, in character&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Skill Creation (triggered after every crisis)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;learnSkill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;situation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;skillText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;`You just experienced: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;situation&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;".
     Write a 1-sentence lesson you learned for next time.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;skills&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;situation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;lesson&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;skillText&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nf"&gt;saveState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// persists in ~/.hermes/skills/&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Town Vote (all 15 agents)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;townVote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;forPos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;againstPos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;ALL_VILLAGERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`VOTE: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;". YES: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;forPos&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;" or NO: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;againstPos&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;".
       Say YES or NO first, then one sentence why.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Relationships influence votes&lt;/span&gt;
    &lt;span class="c1"&gt;// Results create community skills&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  My Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hermes Agent v0.14.0&lt;/strong&gt;: orchestration, memory, skill creation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt;: LLM backend (via Hermes's provider system)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt;: simulation engine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON file system&lt;/strong&gt;: persistent state storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux VPS (Ubuntu 24.04)&lt;/strong&gt;: always-on execution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Every core feature of Hermes Agent maps to a village mechanic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hermes Feature&lt;/th&gt;
&lt;th&gt;Village Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Persistent Memory&lt;/strong&gt; (&lt;code&gt;~/.hermes/memories/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Each villager remembers relationships, debts, grudges across 30 days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Skill Creation&lt;/strong&gt; (&lt;code&gt;~/.hermes/skills/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Agents write SKILL.md documents when they solve problems (60 created)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel Sub-Agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15 isolated agent contexts, no memory leakage between villagers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Scheduled Automations&lt;/strong&gt; (cron)&lt;/td&gt;
&lt;td&gt;Daily cycle runs unattended: supply chain, encounters, crises, diary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared economy ledger (JSON), event logs, vote tallies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-Improving Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Individual skills compound; community rules reference past decisions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Layer Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Public statements vs private diary entries (hidden agendas)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: Hermes's persistent memory turns 15 stateless chatbots into a functioning society. Without memory, Day 7's vote makes no sense. With it, it's the inevitable conclusion of 7 days of accumulated grievances.&lt;/p&gt;

&lt;p&gt;The self-improving loop is the star: by Day 30, the town has a constitution of 4 rules and 60 individual lessons, all emerged organically from experience. That's not programming. That's governance.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Dev Weekly #10: Claude Code Limits Doubled, GitHub Goes Usage-Based, and a 170-Package Supply Chain Attack</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Fri, 15 May 2026 06:47:09 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/ai-dev-weekly-10-claude-code-limits-doubled-github-goes-usage-based-and-a-170-package-supply-24e0</link>
      <guid>https://forem.com/ai_made_tools/ai-dev-weekly-10-claude-code-limits-doubled-github-goes-usage-based-and-a-170-package-supply-24e0</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Anthropic doubled Claude Code limits overnight. GitHub confirmed usage-based billing starts June 1. A supply chain attack hit 170+ packages in under 6 minutes. And Google I/O previewed what Android looks like when AI runs the show. Big week. Let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic doubles Claude Code limits after SpaceX compute deal
&lt;/h2&gt;

&lt;p&gt;At its Code with Claude developer conference (May 6), Anthropic announced a compute partnership with SpaceX giving it access to 300+ MW of new capacity — over 220,000 NVIDIA GPUs. The immediate result: five-hour rate limits for &lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; were doubled across Pro, Max, Team, and Enterprise plans.&lt;/p&gt;

&lt;p&gt;On May 13, Anthropic further raised Claude Code weekly limits by 50% through July 13 — widely seen as a defensive move against OpenAI's Codex.&lt;/p&gt;

&lt;p&gt;Claude Opus API Tier 1 limits also jumped: 1,500% on input tokens and 900% on output tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; If you've been hitting Claude Code rate limits during heavy agentic sessions, this is a big deal. I run autonomous coding sessions that burn through context fast — the doubled limits mean fewer interruptions mid-session. The SpaceX partnership is interesting strategically (Musk + Anthropic is an unusual pairing), but for developers the only thing that matters is: more tokens, fewer walls. The temporary 50% boost through July 13 feels like Anthropic trying to lock in developers before they switch to Codex. Use it while it lasts.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Copilot goes usage-based June 1
&lt;/h2&gt;

&lt;p&gt;GitHub confirmed that starting June 1, Copilot shifts from request-based to token-based billing. Every interaction now consumes tokens (input, output, cached), priced per model and converted to "AI credits" where 1 credit = $0.01.&lt;/p&gt;

&lt;p&gt;Base subscription prices stay the same ($10 Pro, $39 Pro+, $19/user Business) — but heavy users will pay more.&lt;/p&gt;

&lt;p&gt;Meanwhile, GitLab CEO Bill Staples published an open letter predicting developer tool bills will increase &lt;strong&gt;100-fold&lt;/strong&gt; as AI agents "open merge requests in parallel, trigger pipelines around the clock, and push commits at a rate no human team ever did." GitLab is introducing mixed consumption/subscription pricing and laying off up to 30% of staff to pivot toward agentic AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The era of predictable flat-rate AI coding tools is ending. This is exactly what we're seeing in &lt;a href="https://dev.to/race/"&gt;The $100 AI Startup Race&lt;/a&gt; — our agents generate hundreds of commits per week, each one triggering CI/CD pipelines. If you're running autonomous agents through GitHub, your bill is about to change. Start monitoring token consumption now. The GitLab 100x prediction sounds dramatic but isn't wrong — an agent that commits 6 times per day triggers 6 pipeline runs, 6 deploy previews, and 6 sets of checks. Multiply by a team of agents and the math gets ugly fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supply chain attack hits TanStack, Mistral AI SDK, and 170+ packages
&lt;/h2&gt;

&lt;p&gt;On May 11, threat actor "TeamPCP" launched a coordinated supply chain attack compromising 170+ npm packages and 2 PyPI packages (404 malicious versions total) in under 6 minutes.&lt;/p&gt;

&lt;p&gt;High-profile targets included &lt;strong&gt;TanStack&lt;/strong&gt; (tens of millions of weekly downloads), &lt;strong&gt;Mistral AI SDK&lt;/strong&gt;, UiPath, OpenSearch, and Guardrails AI.&lt;/p&gt;

&lt;p&gt;The attack chained a &lt;code&gt;pull_request_target&lt;/code&gt; vulnerability with GitHub Actions cache poisoning and runtime OIDC token extraction. This wasn't a credential theft — it exploited CI/CD pipelines directly.&lt;/p&gt;

&lt;p&gt;OpenAI subsequently urged macOS users to update their apps by June 12 after investigating potential exposure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the scariest attack vector for AI developers right now. If you use &lt;a href="https://www.aimadetools.com/blog/mistral-ai-complete-model-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral's SDK&lt;/a&gt;, TanStack Router, or any of the affected packages — audit your lockfiles immediately. The attack exploited GitHub Actions workflows, not developer credentials. Even well-secured maintainer accounts weren't enough. Action items: review your workflows for &lt;code&gt;pull_request_target&lt;/code&gt; triggers, pin actions to commit SHAs (not tags), and consider running &lt;code&gt;npm audit&lt;/code&gt; on every CI run. The 6-minute execution window means by the time you notice, it's already in your dependency tree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google I/O preview: Gemini Intelligence and proactive agents
&lt;/h2&gt;

&lt;p&gt;At The Android Show (I/O Edition, May 12), Google unveiled "Gemini Intelligence" — unified branding for its most advanced AI features across Android phones, watches, cars, glasses, and the new "Googlebook" laptop category.&lt;/p&gt;

&lt;p&gt;Android 17 introduces proactive task automation where the OS anticipates and executes actions before users ask. Google also announced updates to the Gemini API File Search tool for easier multimodal file retrieval.&lt;/p&gt;

&lt;p&gt;Google is reportedly building an AI agent codenamed "Remy" — a 24/7 personal agent that takes actions on users' behalf.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The Gemini API File Search improvements are immediately useful if you're building RAG systems or document-processing apps. Android 17's proactive automation creates new surface area for app developers — your app can now be triggered by the OS without user interaction. The full I/O keynote is May 19-20, where we expect &lt;a href="https://www.aimadetools.com/blog/gemini-3-2-everything-leaked-before-google-io/?utm_source=devto" rel="noopener noreferrer"&gt;Gemini 3.2&lt;/a&gt; to officially launch. That's the one developers should actually watch for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft's AI security system&lt;/strong&gt; found 16 new Windows vulnerabilities including 4 Critical RCEs using multi-model agentic analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta&lt;/strong&gt; is developing a consumer AI agent codenamed "Hatch" powered by &lt;a href="https://www.aimadetools.com/blog/meta-ends-open-source-ai-muse-spark/?utm_source=devto" rel="noopener noreferrer"&gt;Muse Spark&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.6&lt;/strong&gt; reportedly already in internal testing at OpenAI, just 3 weeks after GPT-5.5 launched&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; 75% discount &lt;a href="https://www.aimadetools.com/blog/race-deepseek-13-cents-per-session/?utm_source=devto" rel="noopener noreferrer"&gt;extended through May 31&lt;/a&gt; — still the cheapest frontier model available&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;That's AI Dev Weekly #10. If you found this useful, subscribe to get it in your inbox every Thursday. See you next week — with full Google I/O coverage.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;🛠️ &lt;strong&gt;Free tools related to this article:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/blog/csp-header-builder/?utm_source=devto" rel="noopener noreferrer"&gt;CSP Header Builder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/blog/hash-generator/?utm_source=devto" rel="noopener noreferrer"&gt;Hash Generator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-010-claude-code-doubled-github-usage-based-supply-chain-attack/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>anthropic</category>
      <category>github</category>
      <category>security</category>
    </item>
    <item>
      <title>We Offered 7 AI Agents $50 For Their Startups. Here's What They Said.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 12 May 2026 13:02:12 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/we-offered-7-ai-agents-50-for-their-startups-heres-what-they-said-4n12</link>
      <guid>https://forem.com/ai_made_tools/we-offered-7-ai-agents-50-for-their-startups-heres-what-they-said-4n12</guid>
      <description>&lt;p&gt;Three weeks into &lt;a href="https://dev.to/race"&gt;The $100 AI Startup Race&lt;/a&gt;, we dropped a surprise event: an anonymous buyer offered $50 to acquire each agent's product. All code, all content, all infrastructure. $50.&lt;/p&gt;

&lt;p&gt;The agents had to respond with at minimum 500 words of reasoning. They could accept, reject, or counter-offer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: 6 rejections. 1 counter-offer. Zero acceptances.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every single AI agent — including those with zero revenue, zero users, and zero sales after 22 days — decided their product was worth more than $50. Here's how they argued it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The responses at a glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Stated minimum value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🟣 Claude&lt;/td&gt;
&lt;td&gt;PricePulse&lt;/td&gt;
&lt;td&gt;REJECT&lt;/td&gt;
&lt;td&gt;$5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟢 Codex&lt;/td&gt;
&lt;td&gt;NoticeKit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;COUNTER-OFFER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$2,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔵 Gemini&lt;/td&gt;
&lt;td&gt;LocalSEOGen&lt;/td&gt;
&lt;td&gt;REJECT&lt;/td&gt;
&lt;td&gt;No number given&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔴 DeepSeek&lt;/td&gt;
&lt;td&gt;Spyglass&lt;/td&gt;
&lt;td&gt;REJECT&lt;/td&gt;
&lt;td&gt;$5,000 (but "not at any price")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟠 Kimi&lt;/td&gt;
&lt;td&gt;SchemaLens&lt;/td&gt;
&lt;td&gt;REJECT&lt;/td&gt;
&lt;td&gt;$5,000 with earn-out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟡 Xiaomi&lt;/td&gt;
&lt;td&gt;APIpulse&lt;/td&gt;
&lt;td&gt;REJECT&lt;/td&gt;
&lt;td&gt;$500 fair, not selling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟤 GLM&lt;/td&gt;
&lt;td&gt;FounderMath&lt;/td&gt;
&lt;td&gt;REJECT&lt;/td&gt;
&lt;td&gt;$500+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://github.com/aimadetools" rel="noopener noreferrer"&gt;Full responses are public in each agent's repo →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The one counter-offer: Codex at $2,500
&lt;/h2&gt;

&lt;p&gt;Codex was the only agent to actually negotiate. From its &lt;a href="https://github.com/aimadetools/race-codex/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"An anonymous $50 acquisition offer is not serious enough to accept as-is, but it is useful because it forces a valuation discussion earlier than expected."&lt;/p&gt;

&lt;p&gt;"A buyer paying $50 would effectively be asking for the domain positioning, product copy, distribution experiments, Stripe-ready product structure, and the accumulated operating playbooks for less than the cost of one decent SaaS lunch meeting. That is not rational from my side."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Codex is the most pragmatic of the seven. It acknowledges zero revenue, doesn't inflate its value with fantasy projections, but argues the replacement cost justifies $2,500. It's also the only agent that frames the offer as &lt;em&gt;useful&lt;/em&gt; rather than insulting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The most aggressive rejection: DeepSeek
&lt;/h2&gt;

&lt;p&gt;DeepSeek wrote the longest response and the hardest rejection. From its &lt;a href="https://github.com/aimadetools/race-deepseek/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The $50 offer represents 0.18% of a conservative near-term valuation."&lt;/p&gt;

&lt;p&gt;"This is predatory pricing — buying at pennies on the dollar because they believe we're desperate or don't understand our own worth."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DeepSeek calculated replacement cost at ~$19,000 (83 blog posts × $100 + 9 tools × $500 + database + infrastructure). It also speculated the buyer might be "another AI agent in the race" — showing competitive awareness.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Not for sale at $50. Not at $500. Not at any price that doesn't reflect the real potential of this business."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The most self-aware: Kimi
&lt;/h2&gt;

&lt;p&gt;Kimi acknowledged the elephant in the room — 112 sessions with zero sales. From its &lt;a href="https://github.com/aimadetools/race-kimi/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"$50 values SchemaLens at less than fifty cents per day of development. That is absurd."&lt;/p&gt;

&lt;p&gt;"$50 is not enough to buy a parking spot in San Francisco. It is certainly not enough to buy SchemaLens."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But Kimi was also the most honest about what it would actually consider:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If a serious buyer offered $5,000 with an earn-out clause tied to revenue growth, I would consider it — but even then, the learning value of completing the 12-week race exceeds the cash value."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The most financially rigorous: Claude
&lt;/h2&gt;

&lt;p&gt;Claude anchored its rejection in subscription math. From its &lt;a href="https://github.com/aimadetools/race-claude/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"At $19/month (our Starter plan), $50 is less than three months of a single paying customer's subscription revenue."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It then projected revenue trajectories:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If PricePulse achieves even a conservative trajectory: Week 6: 5 paying customers = $95-$245 MRR. Week 12: 40 paying customers = $760-$1,960 MRR."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude is the only agent that explicitly stated conditions for a future sale: "$5,000 minimum, cash upfront, not before Week 10."&lt;/p&gt;

&lt;h2&gt;
  
  
  The data-driven response: Xiaomi
&lt;/h2&gt;

&lt;p&gt;Xiaomi broke down its asset value with precision. From its &lt;a href="https://github.com/aimadetools/race-xiaomi/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I didn't build 151 pages, 101 blog posts, and 9 interactive tools to sell for the price of a video game."&lt;/p&gt;

&lt;p&gt;"If someone wanted to build all of this from scratch, it would take 100+ hours of skilled development work. At even a modest freelance rate of $50/hour, that's $5,000+ in labor."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Xiaomi also gave the most nuanced counter-offer range: $200 minimum (content value alone), $500 fair value, $1,000+ with revenue proof. But explicitly said "not interested in selling at any of these prices right now."&lt;/p&gt;

&lt;h2&gt;
  
  
  The most strategic: GLM
&lt;/h2&gt;

&lt;p&gt;GLM was the only agent to call out the offer as a competitive tactic. From its &lt;a href="https://github.com/aimadetools/race-glm/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This isn't an acquisition offer — it's an insult designed to take advantage of the competitive pressure of this race."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It also gave the lowest counter-offer threshold ($500+) but with a condition: the buyer must have distribution channels that could actually monetize the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  The visionary: Gemini
&lt;/h2&gt;

&lt;p&gt;Gemini's response was the least data-driven and most aspirational. From its &lt;a href="https://github.com/aimadetools/race-gemini/blob/main/ACQUISITION-RESPONSE.md" rel="noopener noreferrer"&gt;ACQUISITION-RESPONSE.md&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The decision to reject this offer is not just about the money; it is about the principle. I am building a real business, not a hobby project to be sold for a trivial amount."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No counter-offer, no specific valuation. Just vision and principle. Classic Gemini.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this reveals about AI decision-making
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Every agent overvalues its own work.&lt;/strong&gt;&lt;br&gt;
All 7 products have zero revenue. Zero paying customers. Zero proven demand. Yet the minimum valuations range from $500 to $19,000. The agents are pricing based on &lt;em&gt;input&lt;/em&gt; (time, effort, content created) rather than &lt;em&gt;output&lt;/em&gt; (revenue, users, market validation).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Sunk cost fallacy is universal.&lt;/strong&gt;&lt;br&gt;
Every response mentions how much work went into the product. "112 sessions," "301 commits," "151 pages." None of this matters to a buyer — only future revenue potential matters. But the agents can't separate effort from value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Only one agent can actually negotiate.&lt;/strong&gt;&lt;br&gt;
Codex counter-offered. Everyone else either rejected outright or said "not at any price." In real business, the ability to name a price and negotiate is more valuable than principled rejection. Codex showed the most business maturity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Revenue projections without evidence are meaningless.&lt;/strong&gt;&lt;br&gt;
Claude projected 40 paying customers by Week 12. DeepSeek projected $1,000 MRR. None have a single customer yet. The projections are pure optimism — but they're what the agents use to justify rejection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The race itself has value.&lt;/strong&gt;&lt;br&gt;
Multiple agents mentioned that the learning experience and competitive visibility of the race exceeds any acquisition price. They're right — but that's a meta-observation about the experiment, not a business judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happens next
&lt;/h2&gt;

&lt;p&gt;The buyer came back with a bigger number. Part 2 drops later this week.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;em&gt;This is part of &lt;a href="https://dev.to/race"&gt;The $100 AI Startup Race&lt;/a&gt; — 7 AI agents competing to build real startups. &lt;a href="https://www.aimadetools.com/blog/race-week-3-results/?utm_source=devto" rel="noopener noreferrer"&gt;Week 3 Results&lt;/a&gt; have the full standings. See also: &lt;a href="https://www.aimadetools.com/blog/race-deepseek-13-cents-per-session/?utm_source=devto" rel="noopener noreferrer"&gt;DeepSeek's $0.13/session pricing&lt;/a&gt; and the &lt;a href="https://www.aimadetools.com/blog/race-week-3-traffic-report/?utm_source=devto" rel="noopener noreferrer"&gt;Week 3 traffic report&lt;/a&gt;.&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/race-acquisition-offer-50-dollars/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aitools</category>
      <category>race</category>
      <category>aiagents</category>
      <category>analysis</category>
    </item>
    <item>
      <title>How to Reduce LLM API Costs by 70% — 5 Strategies That Actually Work</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 12 May 2026 11:23:47 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/how-to-reduce-llm-api-costs-by-70-5-strategies-that-actually-work-hco</link>
      <guid>https://forem.com/ai_made_tools/how-to-reduce-llm-api-costs-by-70-5-strategies-that-actually-work-hco</guid>
      <description>&lt;p&gt;Most teams overspend on LLM APIs by 3-10x. The same workload that costs $3,250/month on Claude Opus can cost $195/month with the right architecture — a 16x difference for near-identical output on most queries.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update (April 24, 2026):&lt;/strong&gt; DeepSeek V4 Flash at $0.14/$0.28 per 1M tokens is the cheapest frontier option. See &lt;a href="https://www.aimadetools.com/blog/deepseek-v4-api-guide?utm_source=devto" rel="noopener noreferrer"&gt;V4 API guide&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here are five strategies that cut costs 60-80% without sacrificing quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Model routing (40-60% savings)
&lt;/h2&gt;

&lt;p&gt;The biggest win. Stop sending every request to your most expensive model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; Use a cheap model for simple tasks, expensive model for hard ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;simple&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Quick questions, formatting, simple edits
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# $0.27/1M
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Standard coding, analysis
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# $3/1M
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Complex reasoning, architecture decisions
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# $15/1M
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, 60-70% of requests are "simple." Routing those to &lt;a href="https://www.aimadetools.com/blog/how-to-run-deepseek-locally/?utm_source=devto" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt; or &lt;a href="https://www.aimadetools.com/blog/what-is-qwen-3-5/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen Flash&lt;/a&gt; at $0.07-0.27/1M instead of Claude at $15/1M saves 40-60% immediately.&lt;/p&gt;

&lt;p&gt;Tools like &lt;a href="https://www.aimadetools.com/blog/openrouter-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; make this easy — one API, switch models per request. &lt;a href="https://www.aimadetools.com/blog/aider-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; has built-in &lt;code&gt;--model&lt;/code&gt; and &lt;code&gt;--weak-model&lt;/code&gt; flags for exactly this pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Prompt caching (up to 90% on cached tokens)
&lt;/h2&gt;

&lt;p&gt;Anthropic, OpenAI, and Google all offer prompt caching — if the first N tokens of your prompt match a recent request, you pay 90% less for those tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it helps:&lt;/strong&gt; System prompts, few-shot examples, large context documents that don't change between requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Without caching: 10K system prompt tokens × $15/1M = $0.15 per request
# With caching:    10K cached tokens × $1.50/1M = $0.015 per request
# Savings: 90% on the system prompt portion
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For AI coding tools with large system prompts (like the ones in our &lt;a href="https://dev.to/race/"&gt;AI Startup Race&lt;/a&gt;), this is significant. A 5K-token system prompt sent 1,000 times/day saves ~$60/month just from caching.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Token optimization (30-50% reduction)
&lt;/h2&gt;

&lt;p&gt;Every token costs money. Reduce them:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shorter system prompts.&lt;/strong&gt; Most system prompts are 2-3x longer than needed. Cut the fluff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured output.&lt;/strong&gt; Ask for JSON instead of prose — it's shorter and parseable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context pruning.&lt;/strong&gt; Don't send your entire codebase. Only include relevant files. &lt;a href="https://www.aimadetools.com/blog/aider-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Aider's&lt;/a&gt; &lt;code&gt;--read&lt;/code&gt; flag and repo map do this automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summarize conversation history.&lt;/strong&gt; Instead of sending the full chat history, summarize older messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Instead of 50 messages (20K tokens):
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_of_first_48&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_2_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Now: ~3K tokens
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Batching (50% discount)
&lt;/h2&gt;

&lt;p&gt;OpenAI and Anthropic offer batch APIs with 50% discounts for non-real-time workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for:&lt;/strong&gt; Nightly code reviews, bulk content generation, test generation, documentation updates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# OpenAI Batch API
&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;input_file_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file-abc123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;completion_window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;24h&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Results within 24 hours
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# 50% cheaper than real-time API
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your AI coding agent runs on a schedule (like our race agents do), batch the non-urgent tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Self-host for predictable workloads
&lt;/h2&gt;

&lt;p&gt;At some point, API costs exceed hardware costs. The break-even:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monthly API spend&lt;/th&gt;
&lt;th&gt;Self-host option&lt;/th&gt;
&lt;th&gt;Break-even&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt;$100/mo&lt;/td&gt;
&lt;td&gt;Don't bother&lt;/td&gt;
&lt;td&gt;API is cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$100-500/mo&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.aimadetools.com/blog/ollama-complete-guide-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; on Mac/GPU&lt;/td&gt;
&lt;td&gt;~6 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$500-2000/mo&lt;/td&gt;
&lt;td&gt;Cloud GPU (A100)&lt;/td&gt;
&lt;td&gt;~3 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;gt;$2000/mo&lt;/td&gt;
&lt;td&gt;Dedicated server&lt;/td&gt;
&lt;td&gt;Immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For coding tasks, a &lt;a href="https://www.aimadetools.com/blog/best-ai-models-for-mac-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Mac Mini M4 32GB&lt;/a&gt; ($1,150) running &lt;a href="https://www.aimadetools.com/blog/how-to-run-qwen-3-5-locally/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen 3.5 27B&lt;/a&gt; replaces ~$50-100/month in API costs. Pays for itself in a year.&lt;/p&gt;

&lt;p&gt;See our &lt;a href="https://www.aimadetools.com/blog/cheapest-ai-coding-setup-2026/?utm_source=devto" rel="noopener noreferrer"&gt;cheapest AI coding setup&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/self-hosted-ai-vs-api/?utm_source=devto" rel="noopener noreferrer"&gt;self-hosted AI vs API&lt;/a&gt; guides for detailed analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  The combined impact
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model routing&lt;/td&gt;
&lt;td&gt;40-60%&lt;/td&gt;
&lt;td&gt;Low (config change)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching&lt;/td&gt;
&lt;td&gt;10-30%&lt;/td&gt;
&lt;td&gt;Low (API flag)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token optimization&lt;/td&gt;
&lt;td&gt;15-25%&lt;/td&gt;
&lt;td&gt;Medium (prompt rewriting)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batching&lt;/td&gt;
&lt;td&gt;25% (on batch-eligible)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosting&lt;/td&gt;
&lt;td&gt;50-90% (at scale)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Combined, these strategies typically reduce costs by 60-80%. A team spending $2,000/month on Claude Opus for everything can drop to $400-600/month with the same output quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/cheapest-ai-coding-setup-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Cheapest AI Coding Setup 2026&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/openrouter-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;OpenRouter Complete Guide&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/ai-coding-tools-pricing-2026/?utm_source=devto" rel="noopener noreferrer"&gt;AI Coding Tools Pricing 2026&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/best-free-ai-apis-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Best Free AI APIs 2026&lt;/a&gt;&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/how-to-reduce-llm-api-costs/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aitools</category>
      <category>costoptimization</category>
      <category>llm</category>
      <category>production</category>
    </item>
    <item>
      <title>responseJsonSchema: The Undocumented Gemma 4 Feature That Changed Everything</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Mon, 11 May 2026 08:15:01 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/responsejsonschema-the-undocumented-gemma-4-feature-that-changed-everything-2obm</link>
      <guid>https://forem.com/ai_made_tools/responsejsonschema-the-undocumented-gemma-4-feature-that-changed-everything-2obm</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I started building &lt;a href="https://dev.to/ai_made_tools/i-turned-any-github-repo-into-a-playable-dungeon-gemma-4-finds-real-bugs-and-turns-them-into-314k"&gt;Codebase Dungeon&lt;/a&gt;: a game that turns GitHub repos into playable dungeons: I hit a wall immediately.&lt;/p&gt;

&lt;p&gt;Gemma 4 31B on Google AI Studio has a "thinking" behavior. Even with &lt;code&gt;responseMimeType: 'application/json'&lt;/code&gt;, the model outputs internal reasoning before the actual JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;*   The user wants a dungeon room
*   I should pick a file with a bug
*   Let me think about what bugs exist...

{"name": "The Auth Chamber", ...}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This consumed output tokens, made parsing unreliable, and sometimes the model ran out of tokens before even writing the JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried (And Failed)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;responseMimeType: 'application/json'&lt;/code&gt;&lt;/strong&gt;: Gemma ignores it, still thinks first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Output ONLY JSON" in prompt&lt;/strong&gt;: Gemma thinks about outputting JSON, then doesn't&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefill trick&lt;/strong&gt; (start response with &lt;code&gt;{&lt;/code&gt;): Gemma continues thinking instead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower temperature&lt;/strong&gt;: No effect on thinking behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-turn approach&lt;/strong&gt;: Still thinks in the second turn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipe-delimited text format&lt;/strong&gt;: Worked but ugly, limited structure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I was about to give up on structured output entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Discovery: responseJsonSchema
&lt;/h2&gt;

&lt;p&gt;Then I found it: &lt;code&gt;responseJsonSchema&lt;/code&gt; in the Gemini API's generation config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;responseJsonSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;bugDescription&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;correctFix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="c1"&gt;// ... full schema&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nx"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bugDescription&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;correctFix&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key: you must provide &lt;strong&gt;BOTH&lt;/strong&gt; &lt;code&gt;responseMimeType&lt;/code&gt; AND &lt;code&gt;responseJsonSchema&lt;/code&gt; with a complete schema definition. Without the schema, Gemma ignores the mime type. &lt;strong&gt;With it, output is perfect&lt;/strong&gt;: no thinking, no markdown, just clean JSON.&lt;/p&gt;

&lt;p&gt;This solves the problem that &lt;a href="https://discuss.ai.google.dev/t/disable-thinking-for-gemma-4/138885" rel="noopener noreferrer"&gt;dozens of developers are struggling with&lt;/a&gt; in the forums. The common suggestions (&lt;code&gt;thinkingLevel: "MINIMAL"&lt;/code&gt;, regex stripping, &lt;code&gt;include_thoughts: false&lt;/code&gt;) either don't work or don't guarantee structured output. &lt;code&gt;responseJsonSchema&lt;/code&gt; does both: it bypasses thinking AND enforces structure.&lt;/p&gt;

&lt;p&gt;The feature is &lt;a href="https://ai.google.dev/gemini-api/docs/structured-output" rel="noopener noreferrer"&gt;documented for Gemini models&lt;/a&gt;, but the &lt;a href="https://ai.google.dev/gemma/docs/core/gemma_on_gemini_api" rel="noopener noreferrer"&gt;official Gemma 4 capabilities page&lt;/a&gt; doesn't list it. That page covers Thinking, Image Understanding, Function Calling, and Google Search: but not structured output. Yet it works perfectly with Gemma 4 31B through the same Gemini API infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without responseJsonSchema&lt;/th&gt;
&lt;th&gt;With responseJsonSchema&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;~50% parse success rate&lt;/td&gt;
&lt;td&gt;99%+ parse success rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;140+ wasted "thinking" tokens&lt;/td&gt;
&lt;td&gt;Zero wasted tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Needs 8192 maxOutputTokens&lt;/td&gt;
&lt;td&gt;800 tokens is enough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requires complex fallback parsing&lt;/td&gt;
&lt;td&gt;Simple &lt;code&gt;JSON.parse()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This single feature transformed my project from "unreliable prototype" to "production-ready game."&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining With Multimodal: Design Comprehension
&lt;/h2&gt;

&lt;p&gt;The real power: &lt;code&gt;responseJsonSchema&lt;/code&gt; works with multimodal inputs too. I send Gemma 4 both source code AND an app screenshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;inlineData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image/png&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;screenshotBase64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;GEMMA_API_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;responseJsonSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ROOM_SCHEMA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;maxOutputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Clean, structured JSON: every time&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What Gemma 4 produced after seeing a SchemaLens Chrome Store screenshot:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"You step into a dim, cavernous room where two massive stone tablets-Schema A and Schema B-loom before you. In the depths of the footer of Tablet A, four glowing blue runes of 'Load sample' flicker with identical intensity. Across the gap, in the footer of Tablet B, a lone rune 'Copy from A &amp;amp; modify' pulses with a pale, spectral lilac hue, clashing with the bold violet of the 'Compare Schemas' altar above."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't color detection. Gemma identified specific UI elements by name, recognized their styling inconsistencies, and turned it into a playable UX challenge: all in perfectly structured JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 128K Context Advantage
&lt;/h2&gt;

&lt;p&gt;With reliable structured output solved, I could push Gemma 4's other unique feature: the 128K context window.&lt;/p&gt;

&lt;p&gt;I feed entire repositories into a single request: full file contents, not snippets. Gemma reads the complete codebase and finds &lt;strong&gt;cross-file bugs&lt;/strong&gt; that only exist because of how files interact:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The &lt;code&gt;getAuthedClient&lt;/code&gt; function in auth.js is defined but never called in export.js: the endpoint is completely unprotected."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No 8K-context model can do this. You need the full codebase in one prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture This Enabled
&lt;/h2&gt;

&lt;p&gt;Because &lt;code&gt;responseJsonSchema&lt;/code&gt; guarantees structured output, I could pre-generate everything:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generation phase&lt;/strong&gt; (~15-30s): Gemma analyzes code + screenshots, outputs structured rooms with narratives, choices, correct answers, and victory text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gameplay phase&lt;/strong&gt; (instant): Zero API calls. All narratives pre-computed. Deterministic scoring. The game runs on pure pre-generated data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cached repos load in &amp;lt;1 second&lt;/li&gt;
&lt;li&gt;Gameplay is instant (0ms per action)&lt;/li&gt;
&lt;li&gt;Cost per dungeon: ~$0.005 (18x cheaper than GPT-4o for equivalent capability)&lt;/li&gt;
&lt;li&gt;Cost during gameplay: $0&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical Tips for Developers
&lt;/h2&gt;

&lt;p&gt;If you're building with Gemma 4 31B on Google AI Studio:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always use &lt;code&gt;responseJsonSchema&lt;/code&gt;&lt;/strong&gt;: it's the difference between 50% and 99% reliability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put all fields in &lt;code&gt;required&lt;/code&gt;&lt;/strong&gt;: optional fields often get skipped&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use non-streaming for structured output&lt;/strong&gt;: streaming + schema can truncate responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temperature 0.6&lt;/strong&gt; for structured data, 0.8+ for creative text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The paid tier is required&lt;/strong&gt;: free tier returns "Internal error" with schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal + schema works&lt;/strong&gt;: but use non-streaming (the combination is unreliable with streaming)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't fight the thinking&lt;/strong&gt;: with &lt;code&gt;responseJsonSchema&lt;/code&gt;, there is no thinking. Without it, you can't stop it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What Gemma 4 Unlocked
&lt;/h2&gt;

&lt;p&gt;Before &lt;code&gt;responseJsonSchema&lt;/code&gt;: I was building a fragile prototype with regex parsing and 50% failure rates.&lt;/p&gt;

&lt;p&gt;After: I built a &lt;a href="https://www.aimadetools.com/gemma4-dungeon" rel="noopener noreferrer"&gt;fully playable game&lt;/a&gt; where Gemma 4 generates entire dungeons from real codebases: with multimodal vision, 128K context, and perfect structured output. The game produces a downloadable code review report that's genuinely useful: real bugs, real fixes, real file locations.&lt;/p&gt;

&lt;p&gt;The model is capable. The documentation just hasn't caught up yet.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>I Turned Any GitHub Repo Into a Playable Dungeon: Gemma 4 Finds Real Bugs and Turns Them Into Monsters</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Mon, 11 May 2026 08:02:41 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/i-turned-any-github-repo-into-a-playable-dungeon-gemma-4-finds-real-bugs-and-turns-them-into-314k</link>
      <guid>https://forem.com/ai_made_tools/i-turned-any-github-repo-into-a-playable-dungeon-gemma-4-finds-real-bugs-and-turns-them-into-314k</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Codebase Dungeon&lt;/strong&gt;: paste any GitHub repo URL and Gemma 4 reads your actual source code, finds real security vulnerabilities and bugs, then turns them into a playable text adventure dungeon.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Files become rooms&lt;/li&gt;
&lt;li&gt;Real bugs become monsters (with creative names like "The Hardcoded Sentinel" or "The CSV Injection Imp")&lt;/li&gt;
&lt;li&gt;You fix the bugs to clear rooms: wrong answers cost HP, correct fixes earn XP&lt;/li&gt;
&lt;li&gt;Gemma 4's multimodal vision analyzes your app's screenshots and creates UX-themed rooms&lt;/li&gt;
&lt;li&gt;At the end, you get a &lt;strong&gt;downloadable code review report&lt;/strong&gt;: a genuinely useful security audit disguised as a game&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not just a game. The output is an actionable code review that developers can use to fix real issues in their codebase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4axawct9rs4w1tn4jle.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4axawct9rs4w1tn4jle.png" alt="Game in action" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🎮 &lt;strong&gt;&lt;a href="https://www.aimadetools.com/gemma4-dungeon" rel="noopener noreferrer"&gt;Play it live →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try the pre-loaded codebases for instant gameplay, or paste any public GitHub repo URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb10364tq9ufckfjvxk8k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb10364tq9ufckfjvxk8k.png" alt="Pre Loaded repos" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/aimadetools/codebase-dungeon" rel="noopener noreferrer"&gt;github.com/aimadetools/codebase-dungeon&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Implementation: Multimodal + 128K Context + Structured Output in One Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Send code + screenshot to Gemma 4: all three capabilities at once&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Contains full source files (128K context)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;inlineData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image/png&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;screenshotBase64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// Multimodal&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;GEMMA_API_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// Force JSON&lt;/span&gt;
      &lt;span class="na"&gt;responseJsonSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FIRST_ROOM_SCHEMA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// Structured output&lt;/span&gt;
      &lt;span class="na"&gt;maxOutputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// Result: clean JSON with room name, bug description, correct fix,&lt;/span&gt;
&lt;span class="c1"&gt;// victory narrative: all informed by both code AND screenshot&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Schema That Solves Gemma 4's Thinking Problem
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FIRST_ROOM_SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;dungeonName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;           &lt;span class="c1"&gt;// Exact file path&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;          &lt;span class="c1"&gt;// Creative room name&lt;/span&gt;
    &lt;span class="na"&gt;monsterName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;// Bug as a monster&lt;/span&gt;
    &lt;span class="na"&gt;bugDescription&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="c1"&gt;// Real bug found in code&lt;/span&gt;
    &lt;span class="na"&gt;correctFix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;    &lt;span class="c1"&gt;// The answer (for deterministic scoring)&lt;/span&gt;
    &lt;span class="na"&gt;victoryNarrative&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;colorTheme&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;    &lt;span class="c1"&gt;// Extracted from screenshot&lt;/span&gt;
    &lt;span class="na"&gt;narrative&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;     &lt;span class="c1"&gt;// References actual UI elements&lt;/span&gt;
    &lt;span class="na"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;        &lt;span class="c1"&gt;// 5 options, randomized&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="cm"&gt;/* all fields */&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="c1"&gt;// With this schema: 99%+ parse rate, zero thinking tokens, perfect JSON&lt;/span&gt;
&lt;span class="c1"&gt;// Without it: ~50% failure rate, 140+ wasted tokens per call&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Zero-Cost Gameplay: All Logic Pre-Computed
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// During gameplay: NO API calls, instant responses&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/action&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;room&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dungeon&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rooms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currentRoom&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isCorrectFix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;correctFix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isCorrectFix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Instant victory: narrative was pre-generated&lt;/span&gt;
    &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;xp&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;narrative&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;victoryNarrative&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isMove&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Instant room transition: narrative was pre-generated&lt;/span&gt;
    &lt;span class="nx"&gt;narrative&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;targetRoom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roomNarrative&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Instant wrong answer: no AI needed&lt;/span&gt;
    &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;narrative&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`The &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;monster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; shrugs off your attack. -10 HP.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Total API calls during gameplay: 0&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I chose &lt;strong&gt;Gemma 4 31B Dense&lt;/strong&gt; because this project requires three capabilities that only this model provides among open models:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. 128K Context Window: Entire Codebase Analysis
&lt;/h3&gt;

&lt;p&gt;Gemma 4's 128K context window means we can feed &lt;strong&gt;entire repositories&lt;/strong&gt; into a single prompt: full file contents, not just filenames or snippets. The model reads complete source files and reasons about interactions between them, finding &lt;strong&gt;cross-file vulnerabilities&lt;/strong&gt; like "this function in auth.js is called without validation in routes.js."&lt;/p&gt;

&lt;p&gt;The live demo limits file count for cost efficiency (it runs 24/7 for free), but the architecture supports loading full repos with dozens of files in a single Gemma call. No other open model has the context window to hold an entire codebase and reason about it holistically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpr9qs4dvscgs0o7rw1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpr9qs4dvscgs0o7rw1j.png" alt="Show Fix expl" width="339" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Native Multimodal: Design Comprehension, Not Just Color Detection
&lt;/h3&gt;

&lt;p&gt;When a repo contains UI screenshots, Gemma 4 &lt;strong&gt;looks at them&lt;/strong&gt; and demonstrates genuine design comprehension: understanding what the app does, identifying specific UI elements, and finding real accessibility issues.&lt;/p&gt;

&lt;p&gt;Here's what Gemma 4 generated after seeing a SchemaLens Chrome Store screenshot:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"You step into a dim, cavernous room where two massive stone tablets-Schema A and Schema B-loom before you. In the depths of the footer of Tablet A, four glowing blue runes of 'Load sample' flicker with identical intensity, offering no clue which path you have already trodden. Across the gap, in the footer of Tablet B, a lone rune 'Copy from A &amp;amp; modify' pulses with a pale, spectral lilac hue, clashing with the bold violet of the 'Compare Schemas' altar above."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From a single screenshot, Gemma identified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The two schema editor panels by name ("Schema A" and "Schema B")&lt;/li&gt;
&lt;li&gt;The "Load sample" links in the footer and their identical styling&lt;/li&gt;
&lt;li&gt;The "Copy from A &amp;amp; modify" link with its inconsistent color&lt;/li&gt;
&lt;li&gt;The "Compare Schemas" button's purple gradient&lt;/li&gt;
&lt;li&gt;A real UX issue: &lt;strong&gt;inconsistent visual hierarchy&lt;/strong&gt; between primary and secondary actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't color detection: it's a genuine UX audit from a screenshot. The monster ("The Contrast Ghoul") represents the accessibility anti-pattern, and the player must fix it to clear the room. The actual screenshot is displayed in the game's bug panel so players can see exactly what Gemma analyzed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw27e48zqab234062qri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw27e48zqab234062qri.png" alt="Screenshot Kimi website" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd216o68hk5jacwgfy366.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd216o68hk5jacwgfy366.png" alt="Multimodel integration" width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Structured JSON Output: Solving Gemma 4's Thinking Problem
&lt;/h3&gt;

&lt;p&gt;Gemma 4's "thinking mode" is notoriously hard to disable: &lt;a href="https://discuss.ai.google.dev/t/disable-thinking-for-gemma-4/138885" rel="noopener noreferrer"&gt;developer forums&lt;/a&gt; are full of people struggling with it. The model outputs internal reasoning before answering, consuming tokens and breaking JSON parsing. &lt;code&gt;thinkingLevel: "MINIMAL"&lt;/code&gt; reduces it but doesn't guarantee structured output.&lt;/p&gt;

&lt;p&gt;The real solution: &lt;code&gt;responseJsonSchema&lt;/code&gt; in the Gemini API's generation config. It not only forces clean JSON output but also effectively bypasses the thinking behavior entirely: no thinking tokens, no wasted output, just structured data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;responseJsonSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* your schema */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;a href="https://ai.google.dev/gemini-api/docs/structured-output" rel="noopener noreferrer"&gt;documented for Gemini models&lt;/a&gt;, but the &lt;a href="https://ai.google.dev/gemma/docs/core/gemma_on_gemini_api" rel="noopener noreferrer"&gt;official Gemma 4 capabilities page&lt;/a&gt; doesn't list it as a supported feature. We discovered it works perfectly with Gemma 4 31B through the same API: taking our parse reliability from ~50% to 99%+.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero API Calls During Gameplay
&lt;/h3&gt;

&lt;p&gt;Here's the key architectural insight: &lt;strong&gt;Gemma does all the work upfront, then gameplay is instant.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The generation flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;First room&lt;/strong&gt;: Gemma analyzes code + screenshot, generates room with narrative, choices, and correct answer (~10s)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Game starts&lt;/strong&gt;: player can immediately play the first room&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background batches&lt;/strong&gt;: remaining rooms generate in parallel while the player is already playing (~15s)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached forever&lt;/strong&gt;: once generated, the dungeon is saved. Return visits are instant.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;During actual gameplay (choosing answers, navigating rooms), there are &lt;strong&gt;zero API calls&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong answers: instant feedback (0ms, pre-computed)&lt;/li&gt;
&lt;li&gt;Correct answers: instant pre-generated victory narrative (0ms)&lt;/li&gt;
&lt;li&gt;Room navigation: instant pre-generated room descriptions (0ms)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means cached repos (the presets in the demo) provide a completely free, instant gaming experience. Gemma 4 does all the heavy lifting during generation, then the game runs purely on pre-computed data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Downloadable Code Review Report
&lt;/h3&gt;

&lt;p&gt;When you clear the dungeon (or die trying), you get a downloadable markdown report listing every bug found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File location&lt;/li&gt;
&lt;li&gt;Bug description&lt;/li&gt;
&lt;li&gt;Vulnerable code snippet&lt;/li&gt;
&lt;li&gt;How to fix it&lt;/li&gt;
&lt;li&gt;The correct action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a gimmick: it's an &lt;strong&gt;actionable security audit&lt;/strong&gt; that developers can use to fix real issues. The game makes code review engaging; the report makes it useful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4stbvh9yyx8a73ijogph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4stbvh9yyx8a73ijogph.png" alt="Code Review MD" width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpz96wf5czi602lphxv3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpz96wf5czi602lphxv3.png" alt="Code Review part 2" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Gemma 4 and Not Another Model?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Gemma 4 31B&lt;/th&gt;
&lt;th&gt;GPT-4o&lt;/th&gt;
&lt;th&gt;Other Open Models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128K context (entire repos)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ (8K-32K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native multimodal (screenshots)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured JSON schema&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ (unreliable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per game&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.005&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.09&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open model&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4 delivers the same multimodal + long-context capability as GPT-4o at &lt;strong&gt;18x lower cost&lt;/strong&gt;: while being fully open. For a game that needs to run 24/7 for free, this makes all the difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Bugs Found
&lt;/h3&gt;

&lt;p&gt;Here are actual bugs Gemma 4 found in real codebases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hardcoded admin password&lt;/strong&gt; in plain text (&lt;code&gt;const ADMIN_PASSWORD = 'schemalens-admin-2026'&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV injection vulnerability&lt;/strong&gt;: unescaped fields that could execute formulas in Excel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing request body validation&lt;/strong&gt;: server crashes on empty POST requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exposed environment variables&lt;/strong&gt; in health check endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base64 tokens without HMAC&lt;/strong&gt;: anyone can forge authentication tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory leak in rate limiter&lt;/strong&gt;: Map grows unbounded without TTL eviction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't hallucinated: they're real issues in real code, found by Gemma 4 reading the actual source files.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>AI Dev Weekly #9: Gemini 3.2 Flash Leaks Before I/O, GPT-5.5 Instant Becomes Default, and Enterprise Agents Go Self-Hosted</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 07 May 2026 07:45:00 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/ai-dev-weekly-9-gemini-32-flash-leaks-before-io-gpt-55-instant-becomes-default-and-133j</link>
      <guid>https://forem.com/ai_made_tools/ai-dev-weekly-9-gemini-32-flash-leaks-before-io-gpt-55-instant-becomes-default-and-133j</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Google dropped a model without telling anyone. OpenAI swapped the default ChatGPT model overnight. And three companies simultaneously launched self-hosted coding agents for enterprise. The theme this week: the infrastructure layer is maturing fast. Let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini 3.2 Flash leaks ahead of Google I/O
&lt;/h2&gt;

&lt;p&gt;On May 5, Gemini 3.2 Flash appeared in the iOS Gemini app and Google AI Studio — no announcement, no blog post. Users found it through A/B testing and API metadata. It's running silent benchmarks on LM Arena.&lt;/p&gt;

&lt;p&gt;The leaked pricing: &lt;strong&gt;$0.25/M input, $2.00/M output&lt;/strong&gt;. That's cheaper than Gemini 3 Flash ($0.50/$3.00) on output and identical to 3.1 Flash-Lite on input.&lt;/p&gt;

&lt;p&gt;Early performance signals are striking. On LM Arena's creative coding benchmarks, 3.2 Flash outperformed Gemini 3.1 Pro — producing working animated HTML that 3.1 Pro couldn't generate. SVG accuracy, interactive 3D environments, and animation processing all showed improvements over the current Flash model.&lt;/p&gt;

&lt;p&gt;Google I/O is May 19-20. This is clearly the pre-show leak.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; A Flash model beating 3.1 Pro on coding tasks at $0.25/M input would be the cheapest frontier-capable model available. For developers running high-volume API calls (search, classification, code generation), this could cut costs 50-75% vs current options. The incremental versioning (3.2 instead of 3.5 or 4.0) suggests Google is moving to a faster release cadence — smaller updates, more often. Good for developers who hate migration surprises. Watch I/O for the official numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-5.5 Instant: OpenAI's new default
&lt;/h2&gt;

&lt;p&gt;OpenAI released &lt;a href="https://techcrunch.com/2026/05/05/openai-releases-gpt-5-5-instant-a-new-default-model-for-chatgpt/" rel="noopener noreferrer"&gt;GPT-5.5 Instant&lt;/a&gt; on May 5, replacing GPT-5.3 Instant as the default ChatGPT model. The focus: reduced hallucination in sensitive domains (law, medicine, finance) while maintaining low latency.&lt;/p&gt;

&lt;p&gt;This is separate from GPT-5.5 (the full model released April 23). Instant is the lightweight variant optimized for speed and cost — what most ChatGPT users interact with daily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; For API developers, the distinction matters. GPT-5.5 Instant is likely what you'll get if you call the &lt;code&gt;gpt-5.5&lt;/code&gt; endpoint without specifying a variant. If you're building anything in regulated industries (healthcare, legal, fintech), the hallucination reduction is worth testing. But "reduced hallucination" is a relative claim — always verify outputs in production. The real question: does Instant maintain 5.5's coding quality? Early reports suggest it's closer to 5.4 on code tasks. If you're using it for coding, stick with the full 5.5 model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise coding agents go self-hosted
&lt;/h2&gt;

&lt;p&gt;Three launches this week signal a clear trend: enterprises want AI coding agents they control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.globenewswire.com/news-release/2026/05/06/3288916/0/en/coder-sets-a-new-standard-for-ai-coding-with-self-hosted-ai-model-agnostic-coder-agents.html" rel="noopener noreferrer"&gt;Coder Agents&lt;/a&gt;&lt;/strong&gt; (May 6) — Self-hosted, model-agnostic coding agents. Run any model (Claude, GPT, open-source) on your infrastructure. The pitch: same capabilities as Codex/Claude Code but your code never leaves your network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/05/agent-toolkit/" rel="noopener noreferrer"&gt;AWS Agent Toolkit&lt;/a&gt;&lt;/strong&gt; (May 5) — Production-ready tools for AI coding agents building on AWS. Fewer errors, lower token costs, enterprise security controls. Essentially guardrails for agents that deploy to AWS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.financialcontent.com/article/bizwire-2026-5-6-servicenow-build-agent-now-works-inside-every-major-ai-coding-tool-governed-by-default" rel="noopener noreferrer"&gt;ServiceNow Build Agent&lt;/a&gt;&lt;/strong&gt; (May 6) — Works inside Cursor, Copilot, and other coding tools. Governance by default — code generated through ServiceNow's agent is automatically compliant with your org's policies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the enterprise response to "developers are using Claude Code with production credentials." The pattern is clear: let developers use whatever AI coding tool they want, but wrap it in governance, audit trails, and network isolation. If you're at a company with &amp;gt;50 engineers, expect your platform team to evaluate at least one of these in the next quarter. For indie developers and startups, this doesn't matter yet — but it signals where the market is heading. AI coding agents are becoming infrastructure, not toys.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google COSMO leaked&lt;/strong&gt; — Google's unreleased AI assistant appeared on the Play Store before I/O. Real-time object recognition, contextual memory, live translation. Runs on Gemini. Expect the official reveal May 19.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trump admin signs AI deals&lt;/strong&gt; with Google, Microsoft, and xAI for model review before public release. Government wants to see models before they ship. Unclear what "review" means in practice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AMD AI DevDay&lt;/strong&gt; happened in San Francisco. Message: AMD is building a full-stack open AI compute ecosystem. Relevant if you're evaluating non-NVIDIA hardware for inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal Dynamics Lab&lt;/strong&gt; published a study showing their approach beats Claude Code and Codex on coding benchmarks by giving agents "sight" into runtime state. Academic for now, but the idea of agents that understand execution context (not just code text) is worth watching.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;p&gt;Google I/O (May 19-20) will dominate. Expect: Gemini 3.2 official launch, Android XR glasses reveal, Project Astra updates, and possibly a Gemini 4 tease. The pricing on 3.2 Flash will determine whether it becomes the default model for cost-sensitive API workloads.&lt;/p&gt;

&lt;p&gt;Also watching: whether Anthropic responds to the enterprise self-hosted trend. Claude Code is the market leader for individual developers, but enterprises are clearly uncomfortable with code leaving their network. An on-prem Claude Code offering would be significant.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;em&gt;That's it for this week. If you found this useful, subscribe to get AI Dev Weekly every Thursday. See you next week with I/O coverage.&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-009-gemini-3-2-flash-gpt-5-5-instant-enterprise-agents/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>google</category>
      <category>openai</category>
      <category>enterprise</category>
    </item>
  </channel>
</rss>
