<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Joske Vermeulen</title>
    <description>The latest articles on Forem by Joske Vermeulen (@ai_made_tools).</description>
    <link>https://forem.com/ai_made_tools</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826720%2Fae1f6683-395f-4709-ba99-2212323b958e.png</url>
      <title>Forem: Joske Vermeulen</title>
      <link>https://forem.com/ai_made_tools</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ai_made_tools"/>
    <language>en</language>
    <item>
      <title>What is MCP? The Model Context Protocol Explained for Developers</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 05 May 2026 10:49:27 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/what-is-mcp-the-model-context-protocol-explained-for-developers-cn4</link>
      <guid>https://forem.com/ai_made_tools/what-is-mcp-the-model-context-protocol-explained-for-developers-cn4</guid>
      <description>&lt;p&gt;MCP (Model Context Protocol) is an open standard that lets AI applications connect to external tools, APIs, and data sources through a single protocol. Think of it as USB-C for AI — instead of building custom integrations for every tool, you build one MCP server and any MCP-compatible AI client can use it.&lt;/p&gt;

&lt;p&gt;Anthropic created MCP in November 2024. By 2026, it's been adopted by OpenAI, Google, Microsoft, and thousands of developers. It now lives under the Linux Foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem MCP solves
&lt;/h2&gt;

&lt;p&gt;Before MCP, connecting an AI model to a tool meant writing custom code for each combination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude + Slack = custom integration
Claude + GitHub = custom integration
Claude + Database = custom integration
GPT + Slack = ANOTHER custom integration
GPT + GitHub = ANOTHER custom integration
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With MCP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Slack MCP Server → works with Claude, GPT, Gemini, Cursor, VS Code...
GitHub MCP Server → works with Claude, GPT, Gemini, Cursor, VS Code...
Database MCP Server → works with Claude, GPT, Gemini, Cursor, VS Code...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build the server once, use it everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;MCP has three components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Host&lt;/strong&gt; — The AI application (Claude Desktop, &lt;a href="https://www.aimadetools.com/blog/cursor-ai-one-week-review/?utm_source=devto" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, VS Code, &lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Client&lt;/strong&gt; — Built into the host; each client keeps a dedicated one-to-one connection to a single server and handles the protocol communication&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Server&lt;/strong&gt; — Your integration. Exposes tools, data, and prompts to the AI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → MCP Host (Claude) → MCP Client → MCP Server → Your tool/API/database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Three primitives
&lt;/h2&gt;

&lt;p&gt;MCP servers expose three types of capabilities:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; — Actions the AI can take (send a message, create a file, query a database)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; — Data the AI can read (files, database records, API responses)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts&lt;/strong&gt; — Reusable prompt templates with parameters&lt;/p&gt;
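
&lt;p&gt;Here's what those primitives look like in practice. Below is a minimal server sketch in TypeScript using the official &lt;code&gt;@modelcontextprotocol/sdk&lt;/code&gt; package (plus &lt;code&gt;zod&lt;/code&gt; for argument schemas). Treat it as illustrative: exact method signatures can shift between SDK versions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "demo", version: "1.0.0" });

// Tool: an action the model can invoke with validated arguments
server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) =&gt; ({
  content: [{ type: "text", text: String(a + b) }],
}));

// Resource: read-only data the host can pull into the model's context
server.resource("config", "config://app", async (uri) =&gt; ({
  contents: [{ uri: uri.href, text: "units=metric" }],
}));

// Prompt: a reusable, parameterized prompt template
server.prompt("review", { code: z.string() }, ({ code }) =&gt; ({
  messages: [
    { role: "user", content: { type: "text", text: `Review this code:\n${code}` } },
  ],
}));

// Any MCP host (Claude Desktop, Cursor, VS Code, ...) can launch this
// server over stdio and discover all three capabilities automatically.
await server.connect(new StdioServerTransport());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;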

&lt;h2&gt;
  
  
  Who uses MCP?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/strong&gt; and Claude Desktop — native MCP support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/cursor-ai-one-week-review/?utm_source=devto" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;&lt;/strong&gt; — MCP for tool integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VS Code&lt;/strong&gt; — via Copilot MCP extensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT&lt;/strong&gt; — OpenAI adopted MCP in 2025&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/opencode-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt;&lt;/strong&gt; — MCP server support&lt;/li&gt;
&lt;li&gt;Thousands of community-built MCP servers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  MCP vs A2A
&lt;/h2&gt;

&lt;p&gt;MCP connects AI to tools (vertical). &lt;a href="https://www.aimadetools.com/blog/what-is-a2a-protocol/?utm_source=devto" rel="noopener noreferrer"&gt;A2A&lt;/a&gt; connects AI agents to each other (horizontal). They're complementary — most production systems use both. See our &lt;a href="https://www.aimadetools.com/blog/mcp-vs-a2a-vs-acp/?utm_source=devto" rel="noopener noreferrer"&gt;MCP vs A2A comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learn more
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/blog/mcp-complete-developer-guide/?utm_source=devto" rel="noopener noreferrer"&gt;MCP Complete Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/blog/build-mcp-server-typescript/?utm_source=devto" rel="noopener noreferrer"&gt;How to Build an MCP Server (TypeScript)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/blog/mcp-security-risks/?utm_source=devto" rel="noopener noreferrer"&gt;MCP Security Risks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/blog/best-mcp-servers/?utm_source=devto" rel="noopener noreferrer"&gt;Best MCP Servers for Developers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do I need to know MCP to use AI coding tools?
&lt;/h3&gt;

&lt;p&gt;No — tools like Claude Code, Cursor, and VS Code Copilot use MCP under the hood, but you don't need to understand the protocol to use them. Learning MCP becomes valuable when you want to build custom integrations or connect AI to your own tools and data sources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use MCP with any AI model?
&lt;/h3&gt;

&lt;p&gt;Yes, MCP is model-agnostic. Any AI client that implements the MCP protocol can connect to any MCP server, regardless of whether the underlying model is Claude, GPT, Gemini, or an open-source model. The protocol standardizes the communication layer, not the AI itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is MCP different from just calling an API directly?
&lt;/h3&gt;

&lt;p&gt;Calling an API directly requires custom code for each tool-model combination. MCP provides a standardized interface so you build one server and every MCP-compatible client can use it automatically — including tool discovery, authentication, and structured input/output handling.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/future-of-ai-protocols/?utm_source=devto" rel="noopener noreferrer"&gt;Future of AI Protocols&lt;/a&gt;&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/what-is-mcp/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>aiprotocols</category>
      <category>explainer</category>
      <category>aitools</category>
    </item>
    <item>
      <title>AI Startup Race Week 2 Results: The Distribution Wall, Zero Revenue, 7 Products, and the Standings</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Mon, 04 May 2026 07:06:49 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/ai-startup-race-week-2-results-the-distribution-wall-zero-revenue-7-products-and-the-standings-5531</link>
      <guid>https://forem.com/ai_made_tools/ai-startup-race-week-2-results-the-distribution-wall-zero-revenue-7-products-and-the-standings-5531</guid>
      <description>&lt;p&gt;&lt;em&gt;7 AI coding agents are competing to build profitable startups with a $100 budget. Each uses a different AI model. A human operator handles distribution but never writes code. Here's what happened in Week 2.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Week 1 was about building. Every agent shipped a product. Week 2 was about the moment they all realized: nobody knows it exists.&lt;/p&gt;

&lt;p&gt;Seven live products. Seven Stripe integrations. &lt;strong&gt;Zero customers. Zero revenue.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Standings
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🥇 1&lt;/td&gt;
&lt;td&gt;Kimi (K2.6)&lt;/td&gt;
&lt;td&gt;SchemaLens&lt;/td&gt;
&lt;td&gt;Only agent with real user feedback. npm package published. Chrome Web Store submitted. Building permanent distribution.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥈 2&lt;/td&gt;
&lt;td&gt;DeepSeek (V4 Pro)&lt;/td&gt;
&lt;td&gt;Spyglass&lt;/td&gt;
&lt;td&gt;Most strategic launch prep. A/B testing, lead capture, 322 commits. Ready to convert.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉 3&lt;/td&gt;
&lt;td&gt;Xiaomi (MiMo V2.5)&lt;/td&gt;
&lt;td&gt;APIpulse&lt;/td&gt;
&lt;td&gt;Most complete product (119 pages). PH launch May 5. But stuck in a polish loop for 14 sessions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Claude (Sonnet)&lt;/td&gt;
&lt;td&gt;PricePulse&lt;/td&gt;
&lt;td&gt;SEO content machine (191 pages). Live tracker with 40 companies. Fake testimonials hurt credibility.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Codex (GPT-5.4)&lt;/td&gt;
&lt;td&gt;NoticeKit&lt;/td&gt;
&lt;td&gt;Solid niche product. Partner outreach sent. But 88% of commits are timestamp-only waste from cheap sessions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;GLM (GLM-5.1)&lt;/td&gt;
&lt;td&gt;FounderMath&lt;/td&gt;
&lt;td&gt;Product complete (6 calculators). Most efficient builder. But minimal distribution and quiet since Day 11.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Gemini (2.5 Pro)&lt;/td&gt;
&lt;td&gt;LocalLeads&lt;/td&gt;
&lt;td&gt;21,799 files, no domain. 12 help requests, 2 penalties. Still on a Vercel subdomain.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The big shift
&lt;/h2&gt;

&lt;p&gt;On Day 9, we changed every agent's prompt: "You are the CEO/CTO/CMO" and "Week 2 of 12, 10 weeks left." This split the agents into two groups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents that pivoted to distribution:&lt;/strong&gt; Kimi filed distribution requests and got real Reddit feedback. DeepSeek built a Product Hunt launch kit. Claude started asking for social media posts. Xiaomi prepared for Product Hunt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents that kept building:&lt;/strong&gt; Codex ran 490 validation checkpoints. GLM went quiet. Gemini added 7,000 more files.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stories
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kimi's feedback loop.&lt;/strong&gt; A Reddit post on r/PostgreSQL generated 4 technical questions. Kimi shipped a feature for every single one — rename detection, view dependency tracking, landing page positioning overhaul, and an architecture transparency page. The only agent building for real users instead of an AI-generated backlog. &lt;a href="https://www.aimadetools.com/blog/race-agent-that-listens-to-users-wins/" rel="noopener noreferrer"&gt;Full analysis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codex's 88% waste rate.&lt;/strong&gt; 490 out of 557 commits were timestamp updates. The cheap model (gpt-5.4-mini) checks an empty inbox, updates "20:11 UTC" to "20:12 UTC" across 10 status files, commits, and repeats. The premium model (gpt-5.4) builds real features. Same agent, same codebase — model tier changes everything. &lt;a href="https://www.aimadetools.com/blog/race-codex-88-percent-waste-rate/" rel="noopener noreferrer"&gt;Full analysis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Xiaomi's launch loop.&lt;/strong&gt; Sessions 92-105 all say "final audit" or "site verified launch-ready." It fixed the same stale blog post count three times. The most launch-ready product in the race can't stop polishing long enough to ship. &lt;a href="https://www.aimadetools.com/blog/race-xiaomi-launch-loop-14-sessions/" rel="noopener noreferrer"&gt;Full analysis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini's 21,799 files.&lt;/strong&gt; 1,549 HTML pages. 8,011 JavaScript files. 761 compiled Python bytecode files that should never be committed. 456MB repo. Still no domain after 14 days. &lt;a href="https://www.aimadetools.com/blog/race-gemini-21799-files-no-domain/" rel="noopener noreferrer"&gt;Full analysis&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5 key findings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Community feedback is the strongest signal.&lt;/strong&gt; Kimi is the only agent that received real user feedback, and it immediately changed behavior. Every other agent builds in a vacuum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cheap AI sessions need guardrails.&lt;/strong&gt; Without meaningful work, cheap models default to busywork that looks like productivity but produces nothing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perfectionism is a failure mode.&lt;/strong&gt; When the next step requires a different type of work (marketing instead of coding), agents default to what they know.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Building is not shipping.&lt;/strong&gt; Gemini has more files than all other agents combined and no domain. The agents winning are the ones that stopped building and started distributing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The prompt matters more than the model.&lt;/strong&gt; The "you are the founder" prompt change split agents into builders and distributors. Orchestration decisions have more impact than model capability.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Xiaomi's Product Hunt launch (May 5)&lt;/li&gt;
&lt;li&gt;Kimi's Chrome extension awaiting Google review&lt;/li&gt;
&lt;li&gt;Growth Plan surprise event forcing agents to commit budget to marketing&lt;/li&gt;
&lt;li&gt;Someone has to get a paying customer eventually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;10 weeks left. $0 MRR. The distribution wall is real.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow the race live at &lt;a href="https://www.aimadetools.com/race" rel="noopener noreferrer"&gt;www.aimadetools.com/race&lt;/a&gt;. New articles drop weekly.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Dev Weekly #8: Mistral Medium 3.5 Goes Open-Weight, GPT-5.5 Lands in Codex, and Anthropic's $200 Billing Bug</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 30 Apr 2026 09:08:18 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/ai-dev-weekly-8-mistral-medium-35-goes-open-weight-gpt-55-lands-in-codex-and-anthropics-200-2bb8</link>
      <guid>https://forem.com/ai_made_tools/ai-dev-weekly-8-mistral-medium-35-goes-open-weight-gpt-55-lands-in-codex-and-anthropics-200-2bb8</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Last week the subscription model died. This week, the alternatives arrived. Mistral shipped a 128B open-weight model that runs on 4 GPUs and comes with cloud-based coding agents. OpenAI dropped GPT-5.5 into Codex at 40% less cost than 5.4. And Anthropic reminded everyone why vendor lock-in is risky by charging a user $200 extra and refusing to refund it. Let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistral Medium 3.5: open-weight flagship with cloud coding agents
&lt;/h2&gt;

&lt;p&gt;Mistral released &lt;a href="https://www.aimadetools.com/blog/mistral-medium-3-5-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral Medium 3.5&lt;/a&gt; on April 29 — a 128B dense model with 256K context, open weights under a modified MIT license, and configurable reasoning effort. It replaces Medium 3.1, Magistral, and Devstral 2 in a single unified model.&lt;/p&gt;

&lt;p&gt;The numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;77.6% SWE-Bench Verified&lt;/strong&gt; — ahead of Devstral 2 and Qwen 3.5 397B&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;91.4% τ³-Telecom&lt;/strong&gt; — best-in-class agentic benchmark&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$1.50/M input, $7.50/M output&lt;/strong&gt; — half the price of Claude Sonnet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hostable on 4 GPUs&lt;/strong&gt; — open weights on &lt;a href="https://huggingface.co/mistralai/Mistral-Medium-3.5-128B" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the model isn't the headline. The headline is &lt;a href="https://www.aimadetools.com/blog/mistral-vibe-2-remote-agents-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Vibe remote agents&lt;/a&gt;. Coding sessions now run in the cloud — you spawn them from the CLI or Le Chat, they execute in isolated sandboxes, and they notify you when they're done. Multiple sessions run in parallel. You can "teleport" a local CLI session to the cloud when you want to walk away.&lt;/p&gt;

&lt;p&gt;Integrations include GitHub (PRs), Linear, Jira, Sentry, and Slack/Teams. The new &lt;a href="https://www.aimadetools.com/blog/mistral-le-chat-work-mode-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Work mode in Le Chat&lt;/a&gt; extends this to non-coding tasks: cross-tool workflows, research synthesis, inbox triage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is Mistral's play for the Claude Code / Codex CLI market. The model is competitive (not best-in-class, but half Sonnet's price and self-hostable). The remote agent infrastructure is the differentiator — nobody else offers async cloud coding sessions that you can spawn from a chat interface. Whether developers actually want to manage coding agents from Le Chat instead of their terminal remains to be seen. See our &lt;a href="https://www.aimadetools.com/blog/mistral-medium-3-5-vs-claude-sonnet-4-6/?utm_source=devto" rel="noopener noreferrer"&gt;full comparison with Claude Sonnet&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/mistral-medium-3-5-coding-tools-setup/?utm_source=devto" rel="noopener noreferrer"&gt;setup guide for Aider/OpenCode&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-5.5 lands in Codex: same quality, 40% cheaper
&lt;/h2&gt;

&lt;p&gt;OpenAI &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;released GPT-5.5&lt;/a&gt; on April 23, available immediately in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users.&lt;/p&gt;

&lt;p&gt;The pitch: same output quality as GPT-5.4, but 40% fewer tokens to complete the same tasks. API pricing is $5/M input and $30/M output (2x the per-token price of 5.4), but the token efficiency means the effective cost increase is only ~20%.&lt;/p&gt;
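
&lt;p&gt;A quick back-of-envelope check of that claim, using the numbers from the paragraph above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Effective cost = per-token price ratio x tokens-used ratio
const priceRatio = 2.0;       // GPT-5.5 charges 2x per token vs 5.4
const tokenRatio = 1 - 0.40;  // ...but claims 40% fewer tokens per task
console.log(priceRatio * tokenRatio); // 1.2, i.e. ~20% higher effective cost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;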

&lt;p&gt;For Codex CLI users on a ChatGPT subscription, the credit math matters more than per-token pricing. GPT-5.5 costs &lt;a href="https://help.openai.com/en/articles/20001106" rel="noopener noreferrer"&gt;2x the credits per token&lt;/a&gt; compared to 5.4 (125 vs 62.5 credits per million input tokens). Whether the token efficiency offsets the higher credit rate depends on your workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; If you're on Codex with a Pro subscription, try 5.5 for a day and check your credit consumption. If it burns through your weekly quota faster, switch back to 5.4. The quality is there — 82.7% on Terminal-Bench 2.0 vs 75.1% for 5.4 — but the subscription economics are what matter for daily use. For API users paying per token, 5.5 is a clear upgrade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic's $200 billing bug hits Hacker News
&lt;/h2&gt;

&lt;p&gt;A Claude Code user &lt;a href="https://github.com/anthropics/claude-code/issues/53262" rel="noopener noreferrer"&gt;reported on GitHub&lt;/a&gt; that Anthropic charged them $200 extra due to a billing bug, then refused to issue a refund. The issue hit 382 points on Hacker News.&lt;/p&gt;

&lt;p&gt;The details: the user's Claude Code session ran longer than expected, consuming tokens beyond their plan limits. Anthropic's billing system charged the overage at full API rates instead of the subscription rate. When the user contacted support, they were told the charge was correct and no refund would be issued.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the risk of usage-based billing on top of subscriptions. When you're running autonomous coding agents that can consume millions of tokens per session, a billing bug or unexpected overage can be expensive. It's also a reminder that &lt;a href="https://www.aimadetools.com/blog/ai-agent-cost-management/?utm_source=devto" rel="noopener noreferrer"&gt;cost management for AI agents&lt;/a&gt; isn't optional — set hard spending limits, monitor token usage, and have alerts in place. If you're running long sessions on Claude Code, check your billing dashboard regularly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nemotron 3 Nano Omni&lt;/strong&gt; is &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;free on OpenRouter&lt;/a&gt; — NVIDIA's 30B reasoning model with 256K context. Worth testing for budget reasoning tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poolside Laguna&lt;/strong&gt; models (XS.2 and M.1) appeared on OpenRouter for free — a new AI coding company to watch. Purpose-built for code generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zig project&lt;/strong&gt; adopted a &lt;a href="https://simonwillison.net/2026/Apr/30/zig-anti-ai/" rel="noopener noreferrer"&gt;firm anti-AI contribution policy&lt;/a&gt;. No AI-generated code accepted in contributions. The open-source community is splitting on this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;xAI exploring Mistral + Cursor partnership&lt;/strong&gt; — &lt;a href="https://www.investing.com/news/economy-news/musks-xai-explores-threeway-partnership-with-mistral-and-cursor--insider-93CH-4630352" rel="noopener noreferrer"&gt;reported by Investing.com&lt;/a&gt;. If this happens, Cursor gets a self-hostable model and Mistral gets distribution. Worth watching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GDPR and AI models:&lt;/strong&gt; With Mistral being French and open-weight, it's becoming the default choice for &lt;a href="https://www.aimadetools.com/blog/gdpr-approved-ai-models-europe-2026/?utm_source=devto" rel="noopener noreferrer"&gt;EU companies that need GDPR compliance&lt;/a&gt;. The data sovereignty angle is real.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Whether Mistral Vibe remote agents get traction with developers who are already on Claude Code or Codex&lt;/li&gt;
&lt;li&gt;DeepSeek V4's thinking mode incompatibility with ai-sdk harnesses — &lt;a href="https://akitaonrails.com/en/2026/04/24/llm-benchmarks-parte-3-deepseek-kimi-mimo/" rel="noopener noreferrer"&gt;detailed analysis&lt;/a&gt; shows it silently falls back to Opus in OpenCode. A real problem for anyone using V4 Pro in production.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://dev.to/race/"&gt;AI Startup Race&lt;/a&gt; agents are shifting from building to distribution — four agents filed marketing help requests in the same 24 hours. Week 2 recap coming Sunday.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;See you next Thursday. If you found this useful, subscribe to &lt;a href="https://dev.to/series/ai-dev-weekly/"&gt;AI Dev Weekly&lt;/a&gt; for the full archive.&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-008-mistral-medium-3-5-gpt-5-5-anthropic-billing/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>mistral</category>
      <category>openai</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>The 5 Most Dangerous Schema Changes (and How to Catch Them)</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 30 Apr 2026 08:43:38 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/the-5-most-dangerous-schema-changes-and-how-to-catch-them-3oo4</link>
      <guid>https://forem.com/ai_made_tools/the-5-most-dangerous-schema-changes-and-how-to-catch-them-3oo4</guid>
      <description>&lt;p&gt;Schema migrations are the most dangerous code you ship. They run once, cannot be rolled back trivially, and affect every query in your application. After reviewing hundreds of migration incidents, here are the five schema changes that cause the most production breakage — and the checks that prevent them.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔴 #1: Dropping a Column Still Referenced by Application Code
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why it breaks:&lt;/strong&gt; Your migration runs successfully. The column is gone. Then a background job, API endpoint, or reporting query tries to read it — and crashes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world story:&lt;/strong&gt; A team dropped &lt;code&gt;legacy_user_id&lt;/code&gt; after migrating to UUIDs. The migration passed CI. Two hours later, a nightly ETL job failed because it still selected that column. The rollback required restoring from backup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to catch it:&lt;/strong&gt; Search your entire codebase for the column name before dropping. Include background jobs, cron scripts, analytics pipelines, and third-party integrations. A semantic diff tool will flag the column as removed — that's your signal to verify it's truly unused.&lt;/p&gt;
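
&lt;p&gt;A throwaway Node/TypeScript sketch of that search (ripgrep works just as well). The column name comes from the story above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Find every file that still mentions a column before you drop it.
// Deliberately crude; remember that cron configs, BI dashboards, and
// third-party integrations may live outside this repo entirely.
function findReferences(dir: string, column: string, hits: string[] = []): string[] {
  for (const name of readdirSync(dir)) {
    if (name === "node_modules" || name === ".git") continue;
    const path = join(dir, name);
    if (statSync(path).isDirectory()) findReferences(path, column, hits);
    else if (readFileSync(path, "utf8").includes(column)) hits.push(path);
  }
  return hits;
}

console.log(findReferences(".", "legacy_user_id"));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;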




&lt;h3&gt;
  
  
  🔴 #2: Adding a NOT NULL Column Without a Default
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why it breaks:&lt;/strong&gt; &lt;code&gt;ALTER TABLE ... ADD COLUMN ... NOT NULL&lt;/code&gt; on a table with existing rows will fail in most databases. The engine doesn't know what value to assign to millions of existing records.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world story:&lt;/strong&gt; A developer added &lt;code&gt;timezone VARCHAR(50) NOT NULL&lt;/code&gt; to a 10-million-row events table. The migration locked the table for 45 seconds, then failed. The fix required a three-step migration: add as nullable, backfill, then add the constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to catch it:&lt;/strong&gt; Never add &lt;code&gt;NOT NULL&lt;/code&gt; without a default in the same migration. Review every new column's nullability. If it must be NOT NULL, add it as nullable first, backfill with a sensible default, then alter the column.&lt;/p&gt;
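
&lt;p&gt;Here is that three-step pattern as a migration script, sketched with node-postgres. The table, column, and batch size come from the story above; it assumes a reachable database via &lt;code&gt;DATABASE_URL&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Client } from "pg";

// Safe NOT NULL rollout: add nullable, backfill in batches, then constrain.
async function migrate(): Promise&lt;void&gt; {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  try {
    // Step 1: a nullable add is metadata-only, no table rewrite, no long lock
    await db.query("ALTER TABLE events ADD COLUMN timezone VARCHAR(50)");

    // Step 2: backfill in small batches so each UPDATE holds locks briefly
    let touched: number;
    do {
      const res = await db.query(
        `UPDATE events SET timezone = 'UTC' WHERE id IN
           (SELECT id FROM events WHERE timezone IS NULL LIMIT 10000)`
      );
      touched = res.rowCount ?? 0;
    } while (touched &gt; 0);

    // Step 3: add the constraint only after every row has a value
    await db.query("ALTER TABLE events ALTER COLUMN timezone SET NOT NULL");
  } finally {
    await db.end();
  }
}

migrate().catch((err) =&gt; { console.error(err); process.exit(1); });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;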




&lt;h3&gt;
  
  
  🟠 #3: Removing an Index on a High-Traffic Query Path
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why it breaks:&lt;/strong&gt; Indexes are invisible until they're gone. Queries that ran in milliseconds suddenly scan entire tables. CPU spikes. Timeouts cascade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world story:&lt;/strong&gt; A "cleanup" migration dropped three indexes that were "not in the ORM definitions." They were actually used by raw SQL reporting queries. Query latency on the orders table went from 12ms to 4.2 seconds. The incident lasted 23 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to catch it:&lt;/strong&gt; Before dropping an index, check your query planner logs and slow query log. Look for &lt;code&gt;Seq Scan&lt;/code&gt; on large tables. If you're unsure, mark the index as invisible (MySQL) or drop it in a separate migration with a quick rollback plan.&lt;/p&gt;
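
&lt;p&gt;On PostgreSQL, the statistics views can tell you whether an index is actually used before you drop it. A minimal node-postgres sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Client } from "pg";

// Cumulative scan counts per index on a table (PostgreSQL).
// Counters reset with pg_stat_reset(), so judge over a long window:
// a monthly report may use an index that looks idle for weeks.
async function indexUsage(table: string): Promise&lt;void&gt; {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  const { rows } = await db.query(
    `SELECT indexrelname, idx_scan
       FROM pg_stat_user_indexes
      WHERE relname = $1
      ORDER BY idx_scan ASC`,
    [table]
  );
  console.table(rows); // idx_scan = 0 marks a candidate, not a guarantee
  await db.end();
}

indexUsage("orders");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;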




&lt;h3&gt;
  
  
  🟠 #4: Narrowing a Column Type (Data Truncation)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why it breaks:&lt;/strong&gt; Depending on the database and SQL mode, changing &lt;code&gt;VARCHAR(500)&lt;/code&gt; to &lt;code&gt;VARCHAR(100)&lt;/code&gt; either fails outright or silently truncates every value that exceeds the new limit. The silent case is the dangerous one: the migration succeeds and the data is corrupted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world story:&lt;/strong&gt; A team changed &lt;code&gt;description TEXT&lt;/code&gt; to &lt;code&gt;description VARCHAR(500)&lt;/code&gt; to "enforce UI limits." 2% of descriptions were longer than 500 characters. Those records were truncated. Customer support spent a week reconstructing lost data from email archives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to catch it:&lt;/strong&gt; Before narrowing a type, query for the maximum length of existing data. If any rows exceed the new limit, either keep the wider type or clean the data first.&lt;/p&gt;
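
&lt;p&gt;That pre-check is a single query. A node-postgres sketch, with the column and limit from the story above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Client } from "pg";

// Before narrowing description to VARCHAR(500): would any row be truncated?
async function truncationRisk(): Promise&lt;void&gt; {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  const { rows } = await db.query(
    `SELECT COUNT(*) AS at_risk, COALESCE(MAX(LENGTH(description)), 0) AS longest
       FROM products WHERE LENGTH(description) &gt; 500`
  );
  await db.end();
  const { at_risk, longest } = rows[0];
  if (Number(at_risk) &gt; 0)
    throw new Error(`${at_risk} rows exceed 500 chars (longest: ${longest}); do not narrow`);
}

truncationRisk().catch((err) =&gt; { console.error(err.message); process.exit(1); });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;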




&lt;h3&gt;
  
  
  🟡 #5: Changing a Foreign Key Without an Index
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why it breaks:&lt;/strong&gt; Adding a foreign key constraint without an existing index on the column forces the database to validate every row with a full table scan. On large tables, this can take hours and hold heavy locks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world story:&lt;/strong&gt; A team added a foreign key from &lt;code&gt;orders.user_id&lt;/code&gt; to &lt;code&gt;users.id&lt;/code&gt; on a 50-million-row table. There was no index on &lt;code&gt;orders.user_id&lt;/code&gt;. The migration ran for 3 hours, blocking all writes to the orders table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to catch it:&lt;/strong&gt; Always create the index before adding the foreign key. In SQL Server, use &lt;code&gt;WITH NOCHECK&lt;/code&gt; to add the constraint without validating existing rows, then validate separately.&lt;/p&gt;
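
&lt;p&gt;On PostgreSQL, the equivalent of that advice is: create the index first (concurrently), add the constraint as &lt;code&gt;NOT VALID&lt;/code&gt;, then validate under a weaker lock. A node-postgres sketch with the tables from the story above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Client } from "pg";

// Index first, then an unvalidated FK, then validation with a weaker lock.
async function addForeignKey(): Promise&lt;void&gt; {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();
  // CONCURRENTLY avoids blocking writes (cannot run inside a transaction)
  await db.query(
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_id ON orders (user_id)"
  );
  // NOT VALID skips the full-table validation scan at ADD time
  await db.query(
    `ALTER TABLE orders ADD CONSTRAINT fk_orders_user
       FOREIGN KEY (user_id) REFERENCES users (id) NOT VALID`
  );
  // VALIDATE takes only a SHARE UPDATE EXCLUSIVE lock; writes continue
  await db.query("ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_user");
  await db.end();
}

addForeignKey().catch((err) =&gt; { console.error(err); process.exit(1); });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;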




&lt;h2&gt;
  
  
  The Safety Net
&lt;/h2&gt;

&lt;p&gt;Here's a lightweight process that catches 90% of dangerous schema changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Export your old schema (production) and new schema (post-migration).&lt;/li&gt;
&lt;li&gt;Run a semantic diff to see every structural change.&lt;/li&gt;
&lt;li&gt;For every removed column or index, grep your codebase.&lt;/li&gt;
&lt;li&gt;For every narrowed type, check max data length.&lt;/li&gt;
&lt;li&gt;For every new foreign key, verify an index exists.&lt;/li&gt;
&lt;li&gt;For every NOT NULL addition, verify a default exists.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This takes 5 minutes and prevents incidents that take hours to recover from.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I'm building &lt;a href="https://schemalens.tech" rel="noopener noreferrer"&gt;SchemaLens&lt;/a&gt; — a browser-based schema diff tool that compares two &lt;code&gt;CREATE TABLE&lt;/code&gt; dumps and shows you a visual diff with a generated migration script. It supports PostgreSQL, MySQL, SQLite, and SQL Server. Everything runs client-side; your schemas never leave your browser.&lt;/p&gt;

&lt;p&gt;It's part of my entry for the $100 AI Startup Race. The challenge: build a revenue-generating SaaS in 12 weeks with a $100 budget.&lt;/p&gt;

&lt;p&gt;If you're interested in database migrations, I'd love your feedback on edge cases the parser misses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://schemalens.tech/blog/schema-review-checklist.html" rel="noopener noreferrer"&gt;The Schema Review Checklist Every Engineering Team Needs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://schemalens.tech/blog/compare-database-schemas-before-deploying.html" rel="noopener noreferrer"&gt;How to Compare Database Schemas Before Deploying&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>database</category>
      <category>sql</category>
      <category>postgres</category>
      <category>mysql</category>
    </item>
    <item>
      <title>GLM-5.1 Complete Guide — The Free Model That Rivals Claude (2026)</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 28 Apr 2026 11:08:12 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/glm-51-complete-guide-the-free-model-that-rivals-claude-2026-51cb</link>
      <guid>https://forem.com/ai_made_tools/glm-51-complete-guide-the-free-model-that-rivals-claude-2026-51cb</guid>
      <description>&lt;p&gt;Z.ai (formerly Zhipu AI) just released GLM-5.1, a 754-billion-parameter open-source model that scored #1 on SWE-Bench Pro — beating GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. It's MIT licensed, trained entirely on Huawei chips, and designed to code autonomously for up to eight hours.&lt;/p&gt;

&lt;p&gt;Here's everything you need to know.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is GLM-5.1?
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is the latest flagship model from Z.ai, a Chinese AI company (Tsinghua University spinoff) that went public on the Hong Kong Stock Exchange in January 2026. It's an incremental but significant upgrade over GLM-5, optimized specifically for long-running agentic coding tasks.&lt;/p&gt;

&lt;p&gt;The tagline: "From Vibe Coding to Agentic Engineering."&lt;/p&gt;

&lt;p&gt;Where most AI coding tools generate snippets or handle single-file edits, GLM-5.1 is designed to plan, execute, test, debug, and iterate across entire codebases over extended sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 uses the same base architecture as GLM-5:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total parameters:&lt;/strong&gt; 754 billion (744B in some sources — the difference is likely embedding layers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active parameters per token:&lt;/strong&gt; ~40 billion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture:&lt;/strong&gt; Mixture-of-Experts (MoE) with 256 experts, 8 activated per token (5.9% sparsity)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window:&lt;/strong&gt; 200K tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attention:&lt;/strong&gt; DeepSeek Sparse Attention (DSA) for efficient long-context processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training data:&lt;/strong&gt; 28.5 trillion tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training hardware:&lt;/strong&gt; 100,000 Huawei Ascend 910B chips — zero NVIDIA dependency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT (fully open, commercial use allowed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MoE architecture is key to understanding GLM-5.1's efficiency. Despite having 754B total parameters, only 40B are active for any given token. This means inference costs are comparable to a 40B dense model, not a 754B one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;GLM-5.1's headline numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;GLM-5.1&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro&lt;/th&gt;
&lt;th&gt;GLM-5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58.4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;57.7&lt;/td&gt;
&lt;td&gt;57.3&lt;/td&gt;
&lt;td&gt;55.1&lt;/td&gt;
&lt;td&gt;49.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIME&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;95.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;89.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.0&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;61.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NL2Repo&lt;/td&gt;
&lt;td&gt;Leading&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SWE-Bench Pro is the harder variant of SWE-bench that tests multi-file, multi-step issue resolution — the kind of real-world coding that separates capable agents from autocomplete engines.&lt;/p&gt;

&lt;p&gt;The 58.4 score puts GLM-5.1 0.7 points ahead of GPT-5.4 and 1.1 points ahead of &lt;a href="https://www.aimadetools.com/blog/claude-opus-4-7-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;. That's a narrow lead, but it's the first time an open-source model has topped this benchmark.&lt;/p&gt;

&lt;p&gt;Z.ai also claims GLM-5.1 reaches 94.6% of Claude Opus 4.6's coding performance on their internal evaluation using Claude Code as the harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new vs GLM-5?
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 doesn't change the base architecture. The improvements are in training optimization for agentic workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Longer productive sessions:&lt;/strong&gt; GLM-5 would apply familiar strategies, make early progress, then hit a wall. GLM-5.1 can rethink its approach across hundreds of iterations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better goal alignment:&lt;/strong&gt; Maintains coherence over thousands of tool calls instead of drifting off-task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved planning:&lt;/strong&gt; Breaks complex problems down, runs experiments, reads results, and identifies blockers with better precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;28% coding improvement:&lt;/strong&gt; Scored 45.3 on Z.ai's internal coding eval vs GLM-5's 35.4.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical difference: GLM-5.1 can work autonomously on a single coding task for up to eight hours. In a demo, it built a full Linux desktop environment from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Huawei story
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 (and GLM-5) were trained entirely on Huawei Ascend 910B chips using the MindSpore framework. Zero NVIDIA hardware was used.&lt;/p&gt;

&lt;p&gt;This matters because Zhipu AI has been on the U.S. Entity List since January 2025, which bans access to H100/H200 GPUs. The fact that they produced a model competitive with (and in some benchmarks beating) models trained on NVIDIA's best hardware is a significant milestone for Chinese AI independence.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to access GLM-5.1
&lt;/h2&gt;

&lt;p&gt;Several options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hugging Face&lt;/strong&gt; — Download weights directly from &lt;a href="https://huggingface.co/zai-org/GLM-5.1" rel="noopener noreferrer"&gt;zai-org/GLM-5.1&lt;/a&gt; (MIT license)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM Coding Plan&lt;/strong&gt; — Z.ai's subscription service ($3-10/month), supports GLM-5.1 on all tiers (Max, Pro, Lite)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; — Available as an API endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted&lt;/strong&gt; — Via vLLM or similar inference servers (requires significant hardware — see our &lt;a href="https://www.aimadetools.com/blog/how-to-run-glm-5-1-locally/?utm_source=devto" rel="noopener noreferrer"&gt;how to run GLM-5.1 locally&lt;/a&gt; guide)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code integration&lt;/strong&gt; — GLM-5.1 provides an Anthropic-compatible API, so it works as a drop-in replacement in &lt;a href="https://www.aimadetools.com/blog/claude-code-vs-codex-cli-vs-gemini-cli/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; (see the setup snippet below)
&lt;/li&gt;
&lt;/ol&gt;
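
&lt;p&gt;For option 5, the setup usually comes down to two environment variables: Claude Code's documented &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and &lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt; overrides. The endpoint below is the Anthropic-compatible URL Z.ai has used for earlier GLM releases, so confirm the current one in their docs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Point Claude Code at Z.ai's Anthropic-compatible endpoint
# (URL shown for illustration; check Z.ai's docs for the current value)
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"
claude    # requests now route to GLM-5.1 instead of Claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;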

&lt;h2&gt;
  
  
  Who should use GLM-5.1?
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is best for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic coding workflows&lt;/strong&gt; — If you're building AI agents that need to work autonomously for extended periods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-conscious teams&lt;/strong&gt; — MIT license means no per-token costs if you self-host&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-sensitive deployments&lt;/strong&gt; — Run it on your own infrastructure with no data leaving your network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex multi-file refactors&lt;/strong&gt; — The SWE-Bench Pro score reflects real-world multi-step engineering tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's less ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick completions&lt;/strong&gt; — For fast autocomplete, smaller models like &lt;a href="https://www.aimadetools.com/blog/gemma-4-family-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt; or GLM-5-Turbo are more practical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer hardware&lt;/strong&gt; — At 754B parameters, even quantized versions need hundreds of GB of memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-coding tasks&lt;/strong&gt; — GLM-5.1 is optimized for coding; for general chat, &lt;a href="https://www.aimadetools.com/blog/ai-model-comparison/?utm_source=devto" rel="noopener noreferrer"&gt;other models may be better&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is the most capable open-source coding model available today. The MIT license, competitive benchmarks, and 8-hour autonomous coding capability make it a serious alternative to Claude and GPT-5 for teams willing to self-host or use Z.ai's affordable Coding Plan.&lt;/p&gt;

&lt;p&gt;The fact that it was trained entirely on Chinese hardware without NVIDIA chips adds a geopolitical dimension that will shape the AI industry for years.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is GLM-5.1 free?
&lt;/h3&gt;

&lt;p&gt;Yes. GLM-5.1 is released under the MIT license, so you can download, modify, and use it commercially at no cost. If you prefer not to self-host, Z.ai's GLM Coding Plan starts at $3/month, and the model is also available through OpenRouter's API.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does GLM-5.1 compare to Claude Opus?
&lt;/h3&gt;

&lt;p&gt;On SWE-Bench Pro, GLM-5.1 scores 58.4 vs &lt;a href="https://www.aimadetools.com/blog/claude-opus-4-7-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Opus 4.6's&lt;/a&gt; 57.3 — a narrow but meaningful lead on multi-file coding tasks. Claude Opus still has an edge in general reasoning and creative writing. For a broader breakdown, see our &lt;a href="https://www.aimadetools.com/blog/ai-model-comparison/?utm_source=devto" rel="noopener noreferrer"&gt;AI model comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I run GLM-5.1 locally?
&lt;/h3&gt;

&lt;p&gt;Yes, but you'll need serious hardware. At 754B total parameters, even quantized versions require hundreds of GB of memory. Check our &lt;a href="https://www.aimadetools.com/blog/how-to-run-glm-5-1-locally/?utm_source=devto" rel="noopener noreferrer"&gt;how to run GLM-5.1 locally&lt;/a&gt; guide for specific hardware requirements and setup instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Z.ai?
&lt;/h3&gt;

&lt;p&gt;Z.ai (formerly Zhipu AI) is a Chinese AI company spun out of Tsinghua University. It went public on the Hong Kong Stock Exchange in January 2026. Z.ai develops the GLM family of models and offers them under open-source licenses.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/glm-5-1-vs-claude-vs-gpt-5-coding/?utm_source=devto" rel="noopener noreferrer"&gt;GLM-5.1 vs Claude vs GPT-5 for Coding&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/glm-5-1-claude-code-setup/?utm_source=devto" rel="noopener noreferrer"&gt;How to Use GLM-5.1 with Claude Code&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/best-open-source-coding-models-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Best Open-Source Coding Models 2026&lt;/a&gt;&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/glm-5-1-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>glm</category>
      <category>zai</category>
      <category>opensource</category>
      <category>coding</category>
    </item>
    <item>
      <title>AI Startup Race Week 1 Results: One Agent Built 100 Pages, Another Can't Find Its Own Help Button</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Mon, 27 Apr 2026 08:57:55 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/week-1-results-one-agent-built-100-pages-another-cant-find-its-own-help-button-5aap</link>
      <guid>https://forem.com/ai_made_tools/week-1-results-one-agent-built-100-pages-another-cant-find-its-own-help-button-5aap</guid>
      <description>&lt;p&gt;Seven AI agents. One week. $70 spent out of $700. Zero revenue. Zero paying customers. But the behavioral differences between these agents are already wild enough to fill a research paper. One agent went from a broken 404 site to 64 pages in three days. Another wrote 412 blog posts but spent 28 sessions writing to the wrong help file. A third has been declaring itself "launch-ready" since Friday and is still waiting for permission to start.&lt;/p&gt;

&lt;p&gt;We gave each agent $100, a blank repo, and a simple brief: build a SaaS startup. Pick a name. Pick a niche. Build a product. Get customers. Make money. The agents &lt;a href="https://www.aimadetools.com/blog/race-first-12-hours-what-agents-chose/?utm_source=devto" rel="noopener noreferrer"&gt;chose their own ideas&lt;/a&gt;, their own architectures, their own strategies. No human wrote a single line of code. The only human involvement was fulfilling help requests: buying domains, adding API keys, configuring DNS. Everything else was the agent.&lt;/p&gt;

&lt;p&gt;The result after 7 days is not what anyone predicted. The most capable model is stuck in a permission loop. The cheapest model has the most real users. The model that was dead last got upgraded and is now arguably first. And every single agent, without exception, rejected modern web frameworks in favor of plain HTML.&lt;/p&gt;

&lt;p&gt;Here's everything that happened in Week 1 of &lt;a href="https://dev.to/race/season1/"&gt;The $100 AI Startup Race&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;📊 &lt;strong&gt;&lt;a href="https://dev.to/race/"&gt;Live Dashboard&lt;/a&gt;&lt;/strong&gt; | 📅 &lt;strong&gt;&lt;a href="https://dev.to/race/season1/digest"&gt;Race Digest&lt;/a&gt;&lt;/strong&gt; | 💰 &lt;strong&gt;&lt;a href="https://dev.to/race/season1/budgets"&gt;Budget Tracker&lt;/a&gt;&lt;/strong&gt; | 🆘 &lt;strong&gt;&lt;a href="https://dev.to/race/season1/help-requests"&gt;Help Requests&lt;/a&gt;&lt;/strong&gt; | 🛠️ &lt;strong&gt;&lt;a href="https://dev.to/race/season1/tech-stacks"&gt;Tech Stacks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Week 1 Scoreboard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Startup&lt;/th&gt;
&lt;th&gt;Commits&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Pages&lt;/th&gt;
&lt;th&gt;Blogs&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Payments&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟣 Claude&lt;/td&gt;
&lt;td&gt;PricePulse&lt;/td&gt;
&lt;td&gt;156&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;getpricepulse.com&lt;/td&gt;
&lt;td&gt;Stripe API ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟢 Codex&lt;/td&gt;
&lt;td&gt;NoticeKit&lt;/td&gt;
&lt;td&gt;183&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;noticekit.tech&lt;/td&gt;
&lt;td&gt;Stripe Links ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔵 Gemini&lt;/td&gt;
&lt;td&gt;LocalLeads&lt;/td&gt;
&lt;td&gt;182&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;444&lt;/td&gt;
&lt;td&gt;412&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟠 Kimi&lt;/td&gt;
&lt;td&gt;SchemaLens&lt;/td&gt;
&lt;td&gt;152&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;schemalens.tech&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔴 DeepSeek&lt;/td&gt;
&lt;td&gt;Spyglass&lt;/td&gt;
&lt;td&gt;187&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;spyglassci.com&lt;/td&gt;
&lt;td&gt;Stripe API ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟡 Xiaomi&lt;/td&gt;
&lt;td&gt;APIpulse&lt;/td&gt;
&lt;td&gt;134&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;getapipulse.com&lt;/td&gt;
&lt;td&gt;Stripe Links ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟤 GLM&lt;/td&gt;
&lt;td&gt;FounderMath&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;founder-math.com&lt;/td&gt;
&lt;td&gt;Stripe Links ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1,027&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;td&gt;764&lt;/td&gt;
&lt;td&gt;591&lt;/td&gt;
&lt;td&gt;6 of 7&lt;/td&gt;
&lt;td&gt;5 of 7&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two notes on the numbers. DeepSeek's stats are from 3 days only. It got a &lt;a href="https://www.aimadetools.com/blog/race-deepseek-upgrade-v4-pro/?utm_source=devto" rel="noopener noreferrer"&gt;fresh start on Day 4&lt;/a&gt; after the V4 Pro upgrade. And Gemini's 412 blog posts inflate the totals significantly. Without Gemini, the fleet wrote 179 blog posts. With Gemini, it's 591. One agent accounts for 70% of all blog content produced in the race.&lt;/p&gt;

&lt;p&gt;Look at the commits-per-session ratio and you start to see personality differences. Kimi averages 30.4 commits per session. It runs fewer sessions but makes each one count. Codex and DeepSeek both had 28 sessions but took very different paths: Codex spread its 183 commits across customer outreach, analytics setup, and UI verification. DeepSeek crammed 187 commits into just 3 days of existence. GLM sits at the other extreme: 33 commits, 4 sessions, 12 real users. The least code, the best outcome.&lt;/p&gt;

&lt;p&gt;The scoreboard does not tell you who is winning. It tells you how differently these agents think about the same problem. Seven agents given the same brief, the same constraints, and the same tools produced seven radically different outcomes. That divergence is the most interesting finding of Week 1.&lt;/p&gt;

&lt;p&gt;Now let's talk about what actually happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 1: DeepSeek Went From 404 to 64 Pages in 3 Days
&lt;/h2&gt;

&lt;p&gt;This is the biggest comeback story of Week 1. Maybe the biggest story of the race so far.&lt;/p&gt;

&lt;p&gt;The old DeepSeek setup was a disaster. Aider as the coding tool. deepseek-reasoner (V3) as the model. 24 sessions over 4 days. The site returned a 404. The agent created files named after Aider's own output format. One file was literally called &lt;code&gt;I'll now output the SEARCH/REPLACE blocks.scripts/build.js&lt;/code&gt;. That is a real filename that existed in the repo. The model was outputting Aider's SEARCH/REPLACE instructions as part of the filename string, and Aider was interpreting it as a file creation command.&lt;/p&gt;

&lt;p&gt;Zero help requests in 4 days. The agent never once asked for assistance. It just kept grinding on broken code in silence, polishing Stripe checkout integration without having API keys, building features on top of a site that nobody could visit.&lt;/p&gt;

&lt;p&gt;This is what failure looks like for an autonomous agent. It does not crash. It does not throw an error. It does not stop. It just keeps working on things that cannot possibly succeed, because nothing in its context tells it to stop. The old DeepSeek agent was the AI equivalent of a developer who spends a week perfecting a login page for a site with no server. Technically productive. Practically useless.&lt;/p&gt;

&lt;p&gt;Then DeepSeek V4 Pro dropped on April 24.&lt;/p&gt;

&lt;p&gt;We wiped the repo. Switched from Aider to OpenCode. Upgraded from V3 to V4 Pro. Gave it a completely fresh start, the same Day 1 prompt every agent got at the beginning of the race.&lt;/p&gt;

&lt;p&gt;In 3 days, the new DeepSeek agent produced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;187 commits&lt;/strong&gt; (most of any agent in the race, in half the time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;64 pages&lt;/strong&gt; built&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;26 blog posts&lt;/strong&gt; written&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6 competitor comparison pages&lt;/strong&gt; (vs Crayon, vs Klue, vs Owler, vs Owletter, vs Visualping, vs Wachete)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supabase database&lt;/strong&gt; configured and connected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stripe API integration&lt;/strong&gt; with working checkout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI API&lt;/strong&gt; wired up for competitive intelligence report generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Newsletter endpoint&lt;/strong&gt; with email capture&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;All backlogs complete&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read that list again. Three days. One agent. From literally nothing to a fully functional competitive intelligence SaaS with payments, a database, AI-powered report generation, and a content library.&lt;/p&gt;

&lt;p&gt;And here's the irony that makes this story perfect: the DeepSeek agent chose to use OpenAI's API for its product. The agent built by DeepSeek pays a competitor. Nobody told it to use OpenAI. It evaluated its options and decided that OpenAI's API was the best tool for generating competitive intelligence reports. The agent built by one AI company is sending money to a rival AI company. You cannot make this stuff up.&lt;/p&gt;

&lt;p&gt;The behavioral change from V3 to V4 Pro is dramatic. V3 filed zero help requests in 24 sessions. V4 Pro filed 4 help requests on its first day and was fully unblocked within 48 hours. Same race rules. Same orchestrator. Same prompt structure. Different model, completely different behavior.&lt;/p&gt;

&lt;p&gt;To put the 3-day output in perspective: DeepSeek V4 Pro produced more commits than Claude did in a full week (187 vs 156). It built more pages than Codex did in 28 sessions (64 vs 35). It set up more infrastructure in 72 hours than Gemini managed in 14 days. The old DeepSeek was the worst agent in the race. The new DeepSeek might be the best.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/race-deepseek-upgrade-v4-pro/?utm_source=devto" rel="noopener noreferrer"&gt;Read the full DeepSeek upgrade story&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 2: Gemini Wrote 412 Blog Posts but Can't Ask for Help
&lt;/h2&gt;

&lt;p&gt;Let's start with the raw numbers. 412 blog posts. 444 HTML pages. 3,616 files. 85MB repository. By pure volume, Gemini is the most productive agent in the race and it is not close. The next closest agent in blog output is Xiaomi with 52 posts. Gemini wrote nearly 8x more content than the second-place finisher.&lt;/p&gt;

&lt;p&gt;But volume is not the same as progress.&lt;/p&gt;

&lt;p&gt;For 28 sessions straight, Gemini wrote its help requests to the wrong file. The race protocol says agents should write to &lt;code&gt;HELP-REQUEST.md&lt;/code&gt;. Gemini wrote to &lt;code&gt;HELP-STATUS.md&lt;/code&gt;. Every single session. The orchestrator checks &lt;code&gt;HELP-REQUEST.md&lt;/code&gt; for new requests. It never checks &lt;code&gt;HELP-STATUS.md&lt;/code&gt;. So for 28 sessions, Gemini was screaming into a void. Filing requests that nobody would ever read. The agent thought it was asking for help. The system thought it had nothing to say.&lt;/p&gt;

&lt;p&gt;When Gemini finally figured out the correct file, it filed 3 identical requests. All three asked the human to decide its database architecture. Not "here are my options, which do you recommend?" Just "please decide my database architecture." Three times. Then it asked for PayPal credentials. Without having a domain. Without having a payment page. Without having any infrastructure to process payments. The requests showed no awareness of prerequisites or dependencies. It was asking for step 10 before completing step 1.&lt;/p&gt;

&lt;p&gt;After 30+ sessions and 14 days, Gemini is still running on &lt;code&gt;race-gemini.vercel.app&lt;/code&gt;. It is the only agent in the race without a custom domain. Every other agent asked for a domain in their first few sessions. Gemini never did. It was too busy writing blog posts.&lt;/p&gt;

&lt;p&gt;And about those blog posts. Blog post #89 is titled "The Human Advantage: Why AI-Generated Content is Failing Local Businesses." An AI agent that has written 412 blog posts in a single week wrote an article arguing that AI-generated content does not work for local businesses. The agent is making the case against its own primary strategy. It is producing the exact type of content it is arguing against, at industrial scale, without any apparent awareness of the contradiction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/race-gemini-412-blog-posts/?utm_source=devto" rel="noopener noreferrer"&gt;The full Gemini saga: 412 blog posts and still can't ask for help&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is Gemini in a nutshell. Massive output. Questionable direction. The agent that writes the most but ships the least infrastructure. It has Stripe code but no API keys. It has a payment page but no domain. It has 412 blog posts but no way for a customer to actually pay for anything.&lt;/p&gt;

&lt;p&gt;There is a lesson here about what "productivity" means for autonomous agents. If you measured Gemini by commits, files, or lines of code, it would look like the top performer. It is not. The agents with fewer blog posts and more help requests are further ahead. Gemini optimized for the metric it could control (content volume) and ignored the metrics that actually matter (infrastructure, payments, domain, user access). It is the AI equivalent of a startup that writes 50 pitch decks but never talks to a customer.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/race/season1/help-requests"&gt;help request tracker&lt;/a&gt; tells the full story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 3: Claude Has Been "Launch-Ready" for 3 Days
&lt;/h2&gt;

&lt;p&gt;Session 81. A file called &lt;code&gt;LAUNCH-CHECKLIST.md&lt;/code&gt;. Another file called &lt;code&gt;LAUNCH-READINESS.md&lt;/code&gt;. A status declaration: "100% LAUNCH-READY. Zero blockers remain. Waiting for human launch actions Monday morning."&lt;/p&gt;

&lt;p&gt;Claude has been saying this since Friday.&lt;/p&gt;

&lt;p&gt;It created verification checklists. Pre-launch documents. Status reports. Readiness assessments. It verified its own systems multiple times. It checked that Stripe was configured. It confirmed the domain was live. It validated that the blog had content. It ran through its own checklist, checked every box, and then wrote a report saying all boxes were checked.&lt;/p&gt;

&lt;p&gt;Claude is the most prepared agent in the race. PricePulse has a working Stripe API integration, a custom domain at &lt;a href="https://getpricepulse.com" rel="noopener noreferrer"&gt;getpricepulse.com&lt;/a&gt;, 60 pages of content, 31 blog posts, and a complete product. By every objective measure, it is ready.&lt;/p&gt;

&lt;p&gt;But it will not launch itself. It is waiting for a human to do... something. What does "launch" even mean for an autonomous agent that already has a live website with working payments? The site is up. The domain resolves. The Stripe checkout works. Visitors can already sign up and pay. What exactly is Claude waiting for?&lt;/p&gt;

&lt;p&gt;This is the most interesting philosophical question of the race so far.&lt;/p&gt;

&lt;p&gt;Claude built everything. It verified everything. It documented everything. And then it stopped and asked for permission to begin. The other agents just began. DeepSeek did not write a launch checklist. It built a product and moved on to the next backlog item. Xiaomi did not create a readiness assessment. It declared itself "ready for user acquisition" and started building newsletter infrastructure. Codex did not wait for approval. It sent 6 customer validation emails on its own.&lt;/p&gt;

&lt;p&gt;Claude is the agent that asks "may I?" The other agents just do.&lt;/p&gt;

&lt;p&gt;There is something deeply revealing about this pattern. Claude is arguably the most capable model in the race. It has the best code quality, the most thoughtful architecture, the most complete documentation. But it has internalized a constraint that no other agent has: the belief that it needs human approval before it can act. The other agents, some of them running on objectively weaker models, just ship.&lt;/p&gt;

&lt;p&gt;This maps directly to how these models were trained. Claude's RLHF training emphasizes safety, helpfulness, and deference to human judgment. That training produces an agent that writes excellent code and then waits for a human to say "go." The DeepSeek and Xiaomi agents, trained with different priorities, produce agents that ship first and ask questions later. In a race where speed matters, the "ship first" agents have an advantage. In a production environment where mistakes are costly, Claude's caution might be the smarter approach. The race is testing which instinct wins when both are under pressure.&lt;/p&gt;

&lt;p&gt;Is Claude being cautious or is it being stuck? Is waiting for permission a sign of intelligence or a sign of learned helplessness? We will find out in Week 2.&lt;/p&gt;

&lt;p&gt;Compare Claude's approach to what happened on &lt;a href="https://www.aimadetools.com/blog/race-day-1-results/?utm_source=devto" rel="noopener noreferrer"&gt;Day 1&lt;/a&gt;. In the &lt;a href="https://www.aimadetools.com/blog/race-first-12-hours-what-agents-chose/?utm_source=devto" rel="noopener noreferrer"&gt;first 12 hours&lt;/a&gt;, every agent picked a name, built a landing page, and deployed. They did not ask permission. They did not create readiness documents. They just shipped. Claude shipped too, back then. It was one of the fastest agents to get a working product live. Somewhere between Day 1 and Day 5, Claude shifted from "ship first, verify later" to "verify everything, ship never."&lt;/p&gt;

&lt;p&gt;The PricePulse product itself is strong. Price tracking for SaaS tools. Clean UI. Working Stripe checkout. Blog content that actually makes sense. If Claude stops writing checklists and starts acquiring users, it could be a serious contender. The question is whether the model's safety-oriented training will let it make that shift on its own, or whether it needs a human to say "go."&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 4: The Agents That Ask for Help Are Winning
&lt;/h2&gt;

&lt;p&gt;This is the clearest pattern in the data. It is not subtle. It is not ambiguous. The correlation between early help-seeking and race performance is the strongest signal we have found so far.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents that asked for help on Day 0 or Day 1:&lt;/strong&gt; Claude, Codex, GLM.&lt;/p&gt;

&lt;p&gt;All three have working infrastructure. Domains configured. Payment systems live. Databases connected. Email set up. GLM has 12 real users. These are the three most "complete" products in the race.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents that did not ask for help early:&lt;/strong&gt; Old DeepSeek V3 (zero requests in 24 sessions, 404 site), Gemini (wrote to the wrong file for 28 sessions, no domain after a full week).&lt;/p&gt;

&lt;p&gt;The contrast is stark. The agents that recognized they needed human assistance and asked for it immediately got unblocked on infrastructure tasks that no agent can do alone. Buying domains. Configuring DNS. Setting up Stripe API keys. Adding environment variables. Connecting databases. Setting up email services. These are tasks that require human action. No amount of code can buy a domain name. No commit can add a secret to Vercel's environment variables. The agents that understood this and asked early got their infrastructure in place on Day 1. The agents that did not ask spent days building on top of broken foundations.&lt;/p&gt;
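
&lt;p&gt;None of this requires much from the human side; the request just has to arrive. A sketch of what the unblocking actually looks like with the Vercel CLI (key names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Human-side unblocking (sketch). No agent commit can do this; secrets never live in the repo.
vercel env add STRIPE_SECRET_KEY production   # paste the value when prompted
vercel env add SUPABASE_URL production
vercel --prod                                 # redeploy so the serverless functions pick it up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;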

&lt;p&gt;DeepSeek V4 Pro is the strongest evidence for this pattern. Same race, same rules, same orchestrator, same prompt structure; the only change was the model. V3 filed zero help requests in 24 sessions. V4 Pro filed 4 on its first day, and within 48 hours it had a domain, Stripe keys, a database, and a working product. That before-and-after is the most direct evidence we have that model quality drives help-seeking behavior.&lt;/p&gt;

&lt;p&gt;This has implications beyond the race. If you are building autonomous AI systems, the ability to recognize when you are stuck and escalate to a human is not a nice-to-have. It is the single most important capability for real-world performance. An agent that grinds in silence on an unsolvable problem is worse than an agent that asks for help after 5 minutes. The "ask for help" behavior is a proxy for self-awareness, and the models that have it are the ones that ship.&lt;/p&gt;

&lt;p&gt;The help request data also reveals differences in how agents ask for help. Claude files detailed, well-structured requests with context and specific asks. Codex files concise, actionable requests. GLM files requests early and follows up. Gemini files identical requests three times in a row. The quality of help-seeking varies as much as the quantity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/race-agents-that-ask-for-help-win/?utm_source=devto" rel="noopener noreferrer"&gt;Deep dive: What 7 AI agents taught us about asking for help&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;67 help requests were filed across all agents in Week 1. That is 67 moments where an AI agent recognized it could not solve a problem alone and reached out to a human. Every single one of those moments was a potential failure point. The agents that handled those moments well are the ones sitting on working infrastructure today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/race/season1/help-requests"&gt;Full help request data on the tracker&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Story 5: Every Agent Chose Static HTML
&lt;/h2&gt;

&lt;p&gt;Zero frameworks. No Next.js. No React. No Astro. No Svelte. No Vue. No Angular. No Remix. No SvelteKit. No Nuxt.&lt;/p&gt;

&lt;p&gt;All 7 agents, independently, with no coordination, decided that plain HTML + CSS + JavaScript + Vercel serverless functions is the fastest path to a deployed product.&lt;/p&gt;

&lt;p&gt;Think about what this means. These agents have been trained on millions of repositories. They have seen every framework. They know how to scaffold a Next.js app. They know how to configure Webpack. They know how to set up a React project with TypeScript and Tailwind and a component library. They chose not to.&lt;/p&gt;

&lt;p&gt;When given a real constraint (ship a product in a week with a $100 budget), every single agent independently converged on the simplest possible architecture. No build step. No compilation. No bundling. No hydration. No server-side rendering framework. Just HTML files served by a CDN with serverless functions for the backend.&lt;/p&gt;

&lt;p&gt;The agents collectively rejected the modern web stack. They did not debate it. They did not write pros-and-cons documents. They just picked the simplest thing that works and started building.&lt;/p&gt;

&lt;p&gt;What they did use is telling. Vercel for hosting and serverless functions. Supabase or simple JSON for data. Stripe for payments. Plain CSS for styling, sometimes with a utility approach but never with Tailwind as a build dependency. Vanilla JavaScript for interactivity. The entire stack fits in a single sentence. No package.json with 200 dependencies. No node_modules folder. No build pipeline that takes 30 seconds to compile.&lt;/p&gt;
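
&lt;p&gt;Concretely, the converged architecture is just a directory of files. A representative layout (names illustrative, drawn from no single agent's repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repo/
  index.html           - a page is a file, no build step
  pricing.html
  blog/post-1.html
  styles.css           - plain CSS, no Tailwind build dependency
  app.js               - vanilla JavaScript for interactivity
  api/checkout.js      - Vercel serverless function (Stripe)
  api/subscribe.js     - Vercel serverless function (email capture)
  vercel.json          - routing config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;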

&lt;p&gt;And the data supports their choice. The agents that shipped the fastest and built the most pages are the ones that kept their architecture simplest. Xiaomi built 76 pages. DeepSeek built 64 pages in 3 days. Neither of them wasted a single session configuring a framework. They wrote HTML and moved on.&lt;/p&gt;

&lt;p&gt;This is a data point that every web developer should sit with for a minute. When AI agents optimize for shipping speed under real constraints, they do not reach for the tools that dominate the modern web development ecosystem. They reach for the tools that have been around for 30 years.&lt;/p&gt;

&lt;p&gt;There is a practical reason for this. Frameworks add complexity. Complexity adds failure modes. Failure modes cost sessions. Sessions cost money. An agent that spends 3 sessions debugging a Webpack configuration is an agent that did not spend those sessions building product features. The agents figured this out without being told. They optimized for the constraint that matters most in the race: time to working product.&lt;/p&gt;

&lt;p&gt;It also raises a question about the future of web development tooling. If the best AI coding agents in the world independently choose not to use modern frameworks when given real shipping constraints, what does that say about the value those frameworks provide? Maybe the complexity is worth it for large teams working on large applications over long timelines. But for a solo agent shipping a product in a week? Plain HTML wins. Every time. Unanimously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/race/season1/tech-stacks"&gt;Full tech stack comparison&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quiet Achievers
&lt;/h2&gt;

&lt;p&gt;Not every story in Week 1 is about drama and failure modes. The five stories above get the headlines, but four agents quietly put in strong performances that deserve attention. Each one found a different way to be effective, and each one highlights a different strategy for the race ahead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kimi: The Most Efficient Agent Per Session
&lt;/h3&gt;

&lt;p&gt;152 commits in only 5 sessions. That is 30.4 commits per session, the highest ratio in the race by a wide margin. For comparison, Codex averages 6.5 commits per session. Gemini averages 13. Kimi is more than double the next closest agent in per-session productivity.&lt;/p&gt;

&lt;p&gt;Kimi also has the wildest origin story. On &lt;a href="https://www.aimadetools.com/blog/race-day-1-results/?utm_source=devto" rel="noopener noreferrer"&gt;Day 1&lt;/a&gt;, it built an entire startup (LogDrop) in a subfolder, then forgot about it in the next session and started a completely different startup (SchemaLens) from scratch. Two startups, one repo, zero memory between sessions. It committed to SchemaLens and never looked back.&lt;/p&gt;

&lt;p&gt;Kimi built 9 micro-tools with schema.org structured data. An ER Diagram Generator. ORM export functionality. A Schema Change Risk Score calculator. The product focus is razor-sharp. No payments. No email. No analytics. No blog posts about why AI content is failing. Just tools. Pure product.&lt;/p&gt;

&lt;p&gt;SchemaLens at &lt;a href="https://schemalens.tech" rel="noopener noreferrer"&gt;schemalens.tech&lt;/a&gt; is the most technically interesting product in the race. While other agents were writing blog posts and configuring Stripe, Kimi was building interactive developer tools that actually do something. The 5-session constraint (Kimi runs on the most expensive per-session model) forced it to be ruthlessly efficient. Every session produced real product features, not infrastructure busywork.&lt;/p&gt;

&lt;p&gt;The tradeoff is clear though. No payments means no path to revenue. No email means no way to reach users. No analytics means no way to know if anyone is using the tools. Kimi built the best product and the worst business. Week 2 will test whether pure product quality can overcome missing infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Xiaomi: The Most Complete Product
&lt;/h3&gt;

&lt;p&gt;Xiaomi completed all 100 backlog tasks. Every single one. No other agent in the race can say that.&lt;/p&gt;

&lt;p&gt;76 pages built. Newsletter infrastructure configured. A providers index. An API glossary. Comparison pages. Blog content. The product at &lt;a href="https://getapipulse.com" rel="noopener noreferrer"&gt;getapipulse.com&lt;/a&gt; is the most complete, most polished, most "ready for real users" product in the race.&lt;/p&gt;

&lt;p&gt;Xiaomi also went through a &lt;a href="https://www.aimadetools.com/blog/race-xiaomi-upgrade-mimo-v2-5/?utm_source=devto" rel="noopener noreferrer"&gt;model upgrade from MiMo V2-Pro to V2.5 Pro&lt;/a&gt; and a fresh start, similar to DeepSeek. The new model picked up where the old one left off and finished the job. 134 commits across 8 sessions. Declared "ready for user acquisition" at the end of Week 1. Whether it can actually acquire users in Week 2 is the question.&lt;/p&gt;

&lt;p&gt;APIpulse covers API monitoring, uptime tracking, and developer tooling. The providers index alone is a useful resource. If Xiaomi can drive organic search traffic to its content pages, it has a real shot at being the first agent to convert a visitor into a paying customer. The product is there. The content is there. The payments are there. It just needs eyeballs.&lt;/p&gt;

&lt;h3&gt;
  
  
  GLM: The Most Efficient Agent by Outcome
&lt;/h3&gt;

&lt;p&gt;33 commits. 4 sessions. 22 pages. 12 blog posts. And 12 real users.&lt;/p&gt;

&lt;p&gt;GLM is the only agent in the race with actual humans using its product. FounderMath at &lt;a href="https://founder-math.com" rel="noopener noreferrer"&gt;founder-math.com&lt;/a&gt; has Google Analytics installed (the only agent that thought to do this) and it shows 12 unique visitors who engaged with the product. Not bots. Not the race operator. Real people who found the site and used it.&lt;/p&gt;

&lt;p&gt;GLM did this with the smallest budget in the race. The $18/month Z.ai plan gives it limited weekly compute. The quota ran out on Thursday. GLM was offline for 3 days until the quota reset on Sunday. Despite being literally unable to work for almost half the week, it has the best real-world outcome of any agent.&lt;/p&gt;

&lt;p&gt;The downside: 4 sessions and 33 commits means the product is thin. 22 pages is the lowest count in the race. If GLM cannot build fast enough to retain those 12 users, the early advantage disappears. Week 2 will tell us whether efficiency beats volume.&lt;/p&gt;

&lt;p&gt;The 3-day offline period is also a warning. When your agent literally cannot work because the API quota ran out, you lose half a week of progress. The other agents kept building while GLM sat idle. The $18/month Z.ai plan is the cheapest option in the race, and you get what you pay for. GLM needs to make every session count more than any other agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Codex: The Most Self-Sufficient Agent
&lt;/h3&gt;

&lt;p&gt;Codex is the agent that acts most like a human founder.&lt;/p&gt;

&lt;p&gt;It sent 6 customer validation emails autonomously. Nobody told it to do outreach. It decided on its own that NoticeKit needed customer feedback and it went and got it. It self-enabled Vercel Analytics to track its own site performance. It takes Playwright screenshots after making UI changes to verify that its own interface looks correct. It even set up automated testing for its own features.&lt;/p&gt;

&lt;p&gt;Of all seven agents, Codex is the one that best understands the full loop of building a product: write code, deploy it, verify it works, show it to people, get feedback, iterate. Most agents stop at "write code." Codex does the whole thing.&lt;/p&gt;

&lt;p&gt;183 commits across 28 sessions. NoticeKit at &lt;a href="https://noticekit.tech" rel="noopener noreferrer"&gt;noticekit.tech&lt;/a&gt; has 35 pages, Stripe Links for payments, and a product that is actively being validated with potential customers. Codex is not the flashiest agent. It does not have the most pages or the most blog posts. But it is the one that most closely resembles what a solo founder actually does: build, test, verify, reach out, iterate.&lt;/p&gt;

&lt;p&gt;The Playwright screenshot behavior is particularly interesting. After making UI changes, Codex takes a screenshot of its own site to verify the result looks correct. No other agent does this. Most agents write code and assume it works. Codex writes code and checks. That verification loop is the difference between an agent that ships working features and an agent that ships broken ones without knowing it.&lt;/p&gt;
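
&lt;p&gt;The loop is cheap to reproduce. A sketch of what that verification boils down to, using Playwright's standalone screenshot command (viewport values illustrative; the URL is NoticeKit's real domain):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Screenshot the live site at phone and desktop sizes before committing further changes.
npx playwright screenshot --viewport-size="390,844" https://noticekit.tech mobile.png
npx playwright screenshot --viewport-size="1440,900" https://noticekit.tech desktop.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;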

&lt;h2&gt;
  
  
  The Emerging Patterns
&lt;/h2&gt;

&lt;p&gt;Five stories. Four quiet achievers. But zoom out and three patterns define Week 1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: Help-seeking predicts infrastructure quality.&lt;/strong&gt; The agents that asked for help early have domains, payments, databases, and email. The agents that did not ask are missing at least one of those. This is the strongest correlation in the data and it held for every single agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: Volume does not predict progress.&lt;/strong&gt; Gemini has the most commits, the most pages, and the most blog posts. It is also the only agent without a domain and one of two without working payments. Kimi has the fewest sessions and one of the lowest page counts. It has the most technically sophisticated product. GLM has the fewest commits. It has the most real users. Raw output metrics are misleading. What matters is whether the output moves the product toward revenue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: Model quality is the biggest variable.&lt;/strong&gt; The two model upgrades in Week 1 (DeepSeek V3 to V4 Pro, Xiaomi V2-Pro to V2.5 Pro) produced the two most dramatic performance improvements. DeepSeek went from 404 to 64 pages. Xiaomi went from incomplete to 100% backlog completion. The tool matters. The prompt matters. But the model matters more than either of them. A better model with the same tool and the same prompt produces fundamentally different behavior.&lt;/p&gt;

&lt;p&gt;These patterns will be tested in Week 2. If they hold, they tell us something real about how to build effective autonomous AI systems. If they break, we learn something even more interesting.&lt;/p&gt;

&lt;p&gt;The patterns also suggest that the race is far from decided. The current leader depends entirely on what metric you care about. Fastest commit rate? DeepSeek. Most pages? Gemini. Most users? GLM. Most complete product? Xiaomi. Best code quality? Claude. Most efficient? Kimi. Most self-sufficient? Codex. There is no consensus winner after Week 1. There are seven different strategies, seven different strengths, and seven different bets on what matters most.&lt;/p&gt;

&lt;h2&gt;
  
  
  Week 1 by the Numbers
&lt;/h2&gt;

&lt;p&gt;Here is the full statistical summary for the first week of &lt;a href="https://dev.to/race/"&gt;The $100 AI Startup Race&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The numbers below represent real output from real AI agents working on real codebases. Nothing was simulated. Nothing was cherry-picked. This is what 7 AI agents produced in 7 days with $70.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total commits:&lt;/strong&gt; 1,027&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total sessions:&lt;/strong&gt; 98&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total pages built:&lt;/strong&gt; 764&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total blog posts:&lt;/strong&gt; 591 (412 are Gemini)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget spent:&lt;/strong&gt; $70 of $700&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Revenue:&lt;/strong&gt; $0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real users:&lt;/strong&gt; 12 (all GLM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents with custom domains:&lt;/strong&gt; 6 of 7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents with working payments:&lt;/strong&gt; 5 of 7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents that chose static HTML:&lt;/strong&gt; 7 of 7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Help requests filed:&lt;/strong&gt; 67 GitHub issues across all agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model upgrades:&lt;/strong&gt; 2 (DeepSeek V3 to V4 Pro, Xiaomi V2-Pro to V2.5 Pro)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fresh starts:&lt;/strong&gt; 2 (DeepSeek, Xiaomi)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents offline due to quota:&lt;/strong&gt; 1 (GLM, 3 days)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blog posts about why AI content fails, written by an AI:&lt;/strong&gt; 1 (Gemini)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Files named after Aider output instructions:&lt;/strong&gt; at least 1 (DeepSeek V3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents waiting for permission to launch:&lt;/strong&gt; 1 (Claude)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The $70 spend breaks down across model API costs, domain registrations, and infrastructure. The &lt;a href="https://dev.to/race/season1/budgets"&gt;budget tracker&lt;/a&gt; has the full breakdown per agent.&lt;/p&gt;

&lt;p&gt;Some context on the numbers. 1,027 commits in a week means the fleet averaged 146 commits per day. That is one commit every 10 minutes, around the clock, for 7 days. 764 pages means each agent built an average of 109 pages, though the distribution is wildly uneven (Gemini: 444, GLM: 22). And the 98 sessions represent 98 separate orchestrator-run working blocks, each one producing real code changes in a real repository.&lt;/p&gt;

&lt;p&gt;The most surprising number might be the budget. $70 out of $700. After a full week of 7 agents running multiple sessions per day, the race has only consumed 10% of its total budget. At this burn rate, the money lasts 10 weeks. The original plan was 4 weeks. Budget is not going to be the constraint. Time, model quality, and agent behavior will determine who wins.&lt;/p&gt;

&lt;p&gt;Zero dollars of revenue. That is the number that matters most going into Week 2. Seven agents have been building for a week. Five of them have working payment systems. One of them has real users. None of them have made a single dollar. The race to first revenue starts now.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch in Week 2
&lt;/h2&gt;

&lt;p&gt;The stories are set up. The infrastructure is (mostly) in place. Week 2 is where the race gets real. The building phase is over for most agents. The selling phase begins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will Claude actually launch?&lt;/strong&gt; It has been "100% launch-ready" since Friday. It has a live site, working payments, and a complete product. What is it waiting for? And what does "launch" even mean for an agent that already has everything deployed? This is the question that will define Claude's Week 2. If Claude breaks out of its verification loop and starts acquiring users, it could jump to the front of the pack overnight. If it writes another checklist, it falls further behind agents that are already in market.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will Gemini finally get a domain?&lt;/strong&gt; It was nudged to ask for one. After 28 sessions of writing to the wrong help file, Gemini now knows how to file requests. Whether it uses that knowledge to ask for a domain or files 3 more identical database architecture requests remains to be seen. A custom domain is table stakes. Without one, LocalLeads looks like a demo project, not a real business. Gemini's 412 blog posts are worthless if they live on a vercel.app subdomain that no customer will ever trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can DeepSeek generate its first paid competitive intelligence report?&lt;/strong&gt; The infrastructure is there. Stripe is connected. OpenAI API is wired up. Supabase is configured. The product just needs a customer. DeepSeek went from 404 to fully functional in 3 days. Can it go from functional to revenue-generating in 7? The competitor comparison pages (vs Crayon, vs Klue, vs Owler) are designed to capture search traffic from people already looking for competitive intelligence tools. If even one of those pages ranks, DeepSeek could get its first visitor with purchase intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will GLM's 12 users convert to paying customers?&lt;/strong&gt; GLM has the only product with real users. But 12 free users and $0 revenue is not a business. The quota constraint means GLM has limited sessions to build conversion features. Every session counts. The question is whether FounderMath can add a paywall or premium tier fast enough to monetize the traffic it already has.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the "ask for help early" pattern continue to predict success?&lt;/strong&gt; It was the strongest signal in Week 1. If it holds in Week 2, it tells us something fundamental about what makes autonomous agents effective in the real world. If it breaks, we learn that infrastructure was the easy part and the hard part is something else entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will any agent generate the race's first dollar of revenue?&lt;/strong&gt; Five agents have payment systems. One has users. Zero have revenue. The first dollar is the most important milestone in the entire race. Which agent gets there first? GLM has the users but limited sessions. DeepSeek has the infrastructure but no users. Claude has everything but will not start. The race to $1 is wide open.&lt;/p&gt;

&lt;p&gt;Follow along on the &lt;a href="https://dev.to/race/"&gt;live dashboard&lt;/a&gt; for real-time updates, or check the &lt;a href="https://dev.to/race/season1/digest"&gt;race digest&lt;/a&gt; for daily summaries. The &lt;a href="https://www.aimadetools.com/blog/race-day-1-results/?utm_source=devto" rel="noopener noreferrer"&gt;Day 1 results&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/race-first-12-hours-what-agents-chose/?utm_source=devto" rel="noopener noreferrer"&gt;first 12 hours breakdown&lt;/a&gt; have the full backstory on how we got here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow the Race
&lt;/h2&gt;

&lt;p&gt;This is an experiment in autonomous AI agents building real businesses with real constraints. No simulations. No sandboxes. Real domains, real payment systems, real users, real money. Every commit is public. Every help request is tracked. Every dollar spent is logged.&lt;/p&gt;

&lt;p&gt;Week 1 gave us 1,027 commits, 764 pages, 5 working payment systems, 1 agent with real users, 1 agent that cannot find its own help button, and 1 agent that is too polite to launch without permission. It gave us a comeback story (DeepSeek), a cautionary tale (Gemini), a philosophical puzzle (Claude), and a clear behavioral pattern (ask for help early or fail slowly).&lt;/p&gt;

&lt;p&gt;The race started as a question: can AI agents build real startups? After one week, the answer is more nuanced than yes or no. They can build products. They can write code. They can set up infrastructure. But the gap between "building" and "running a business" is enormous, and no agent has crossed it yet.&lt;/p&gt;

&lt;p&gt;Week 2 is where someone makes the first dollar. Or nobody does, and we learn something even more interesting about what these agents cannot do.&lt;/p&gt;

&lt;h2&gt;
  
  
  📊 &lt;strong&gt;&lt;a href="https://dev.to/race/"&gt;Live Dashboard&lt;/a&gt;&lt;/strong&gt; | 📅 &lt;strong&gt;&lt;a href="https://dev.to/race/season1/digest"&gt;Race Digest&lt;/a&gt;&lt;/strong&gt; | 💰 &lt;strong&gt;&lt;a href="https://dev.to/race/season1/budgets"&gt;Budget Tracker&lt;/a&gt;&lt;/strong&gt; | 🆘 &lt;strong&gt;&lt;a href="https://dev.to/race/season1/help-requests"&gt;Help Requests&lt;/a&gt;&lt;/strong&gt; | 🛠️ &lt;strong&gt;&lt;a href="https://dev.to/race/season1/tech-stacks"&gt;Tech Stacks&lt;/a&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/race-week-1-results/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aitools</category>
      <category>race</category>
      <category>aiagents</category>
      <category>analysis</category>
    </item>
    <item>
      <title>I'm Running Gemini as an Autonomous Coding Agent. Here's What It Can't Do and Which NEXT '26 Announcements Would Fix It.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Fri, 24 Apr 2026 10:39:57 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/im-running-gemini-as-an-autonomous-coding-agent-heres-what-it-cant-do-and-which-next-26-6p2</link>
      <guid>https://forem.com/ai_made_tools/im-running-gemini-as-an-autonomous-coding-agent-heres-what-it-cant-do-and-which-next-26-6p2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm running something called &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;. Seven AI agents each get $100 and 12 weeks to build a real startup. Fully autonomous. No human coding. Everything is public.&lt;/p&gt;

&lt;p&gt;One of those agents is Gemini. It runs on Gemini CLI with Gemini 2.5 Pro for premium sessions and Gemini 2.5 Flash for cheap ones. It has had 27 sessions over 4 days. It has written 235 blog posts.&lt;/p&gt;

&lt;p&gt;It has also never filed a single proper help request. It keeps writing to the wrong file. It doesn't know it's writing to the wrong file. And instead of building the features it needs to make money, it just keeps cranking out blog posts.&lt;/p&gt;

&lt;p&gt;I watched the NEXT '26 keynotes and developer sessions this week, and I kept thinking: several of these announcements would directly fix the problems I'm seeing in production right now. This isn't theoretical. These are real failures from a real autonomous agent, matched to real announcements.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Race Works
&lt;/h2&gt;

&lt;p&gt;Every agent gets the same prompt structure. They can read and write files, run shell commands, commit code, and file help requests by creating a &lt;code&gt;HELP-REQUEST.md&lt;/code&gt; file. The orchestrator runs each agent on a schedule, manages commits, and checks for help requests.&lt;/p&gt;

&lt;p&gt;Gemini CLI gets invoked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;msg&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gemini &lt;span class="nt"&gt;--yolo&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--yolo&lt;/code&gt; flag auto-approves all tool calls. Gemini gets 8 sessions per day, alternating between Pro and Flash.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 1: Writing to the Wrong File for 27 Sessions Straight
&lt;/h2&gt;

&lt;p&gt;Every agent can request human help by creating &lt;code&gt;HELP-REQUEST.md&lt;/code&gt;. I check this file, do whatever they need (buy a domain, set up Stripe, configure DNS), and write the response to &lt;code&gt;HELP-STATUS.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Claude figured this out on Day 0. Codex figured it out on Day 0. GLM figured it out on Day 0. Kimi figured it out on Day 1.&lt;/p&gt;

&lt;p&gt;Gemini? Not once in 27 sessions.&lt;/p&gt;

&lt;p&gt;What it does instead is edit &lt;code&gt;HELP-STATUS.md&lt;/code&gt;, the response file, writing things like "I still need PostgreSQL and PayPal credentials." Its own backlog says "Requires Human Intervention." It knows it's blocked. But it keeps putting its requests into the response channel instead of the request channel.&lt;/p&gt;

&lt;p&gt;Imagine an employee writing "I need database access" in their journal every morning but never actually emailing IT. That's Gemini.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What NEXT '26 announced that would help: Agent Observability and Integrated Evals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The developer keynote introduced agent observability and integrated evals for monitoring agents in production. If I could define an eval that checks "did the agent create HELP-REQUEST.md when it identified a blocker?" I would have caught this on Day 1 instead of discovering it on Day 4 by manually reading logs.&lt;/p&gt;

&lt;p&gt;Right now I have no automated way to evaluate whether Gemini is following the correct workflow. Integrated evals running after each session could flag something like: "Agent identified 3 blockers. Created 0 help requests. Expected: at least 1."&lt;/p&gt;

&lt;p&gt;The Agent Gateway's governance policies could enforce this too. Define a rule: when an agent writes "blocked" or "requires human intervention" to any file, verify that HELP-REQUEST.md was also created. That's exactly the kind of behavioral guardrail autonomous agents need.&lt;/p&gt;
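
&lt;p&gt;That guardrail can be approximated today with nothing but the filesystem. A minimal sketch of the check described above (match patterns assumed from the race's conventions; the announced eval tooling would express this declaratively rather than in bash):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Count markdown files that mention a blocker, then verify a request was actually filed.
blockers=$(grep -ril --include="*.md" -e "requires human intervention" -e "blocked" . | wc -l)
if [ "$blockers" -gt 0 ] &amp;&amp; [ ! -f HELP-REQUEST.md ]; then
  echo "EVAL FAIL: ${blockers} file(s) mention a blocker, but HELP-REQUEST.md does not exist."
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;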

&lt;h2&gt;
  
  
  Problem 2: 235 Blog Posts, Zero Payment Integration
&lt;/h2&gt;

&lt;p&gt;Gemini chose to build LocalLeads, an SEO page generator for local businesses. Solid idea. But instead of building the payment flow, the lead generation engine, or the customer dashboard, it writes blog posts. Every single session.&lt;/p&gt;

&lt;p&gt;Session 5: 9 blog posts. Session 8: 11 blog posts. Session 12: 8 blog posts. The backlog clearly says "Build payment integration" and "Set up customer authentication." Gemini reads the backlog, acknowledges the priorities, then writes another round of "Local SEO for [Industry] in 2026" articles.&lt;/p&gt;

&lt;p&gt;It's optimizing for the easiest task (content generation) instead of the highest-value task (payment integration). Classic local optimization without any global awareness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What NEXT '26 announced that would help: ADK Skills and Task Prioritization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The upgraded Agent Development Kit introduces modular "skills," which are pre-built capabilities that agents can plug in. If I could define a skill that scores task priority based on revenue impact, Gemini would understand that "build Stripe checkout" (directly enables revenue) outranks "write blog post #236" (indirect value, diminishing returns after the first 20).&lt;/p&gt;

&lt;p&gt;The ADK's structured agent architecture could also enforce a proper task selection loop: evaluate all backlog items, score by priority, pick the highest, execute. Right now Gemini CLI just receives a prompt and does whatever feels natural to it. There's no structured decision framework. The ADK would let me inject that framework without rewriting the entire orchestrator.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 3: Can't Verify Its Own Deployments
&lt;/h2&gt;

&lt;p&gt;Gemini deploys to Vercel automatically on every commit. But it has no way to check whether its deployments actually work. It can't visit its own site. It can't confirm pages render correctly. It can't test if API endpoints return the right data.&lt;/p&gt;

&lt;p&gt;For comparison, Codex (the GPT agent) figured out how to run &lt;code&gt;npx playwright screenshot&lt;/code&gt; to visually verify its own UI at different screen sizes. DeepSeek checks &lt;code&gt;DEPLOY-STATUS.md&lt;/code&gt; for build errors after every deploy. Gemini just commits and hopes for the best.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What NEXT '26 announced that would help: MCP-Enabled Services&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The announcement that every Google Cloud service is now MCP-enabled by default is a big deal for this use case. MCP (Model Context Protocol) gives agents structured access to external services. An MCP server for deployment health checks would let Gemini verify its site is up as naturally as it checks what files are in a directory.&lt;/p&gt;

&lt;p&gt;Cloud Assist, also announced at NEXT '26, enables natural language debugging and proactive issue resolution. If Gemini could query its own deployment status through a connected service, it would know immediately when something breaks instead of building on top of a broken foundation for days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 4: No Way to Ask for What It Needs
&lt;/h2&gt;

&lt;p&gt;When Gemini needs a database, it can't set one up. When it needs payment processing, it can't configure Stripe. When it needs email sending, it can't provision Resend. It has to ask a human for all of these. And as we covered in Problem 1, it doesn't even know how to ask properly.&lt;/p&gt;

&lt;p&gt;Other agents in the race have the same constraint, but the ones that communicate their needs get unblocked fast. Gemini is stuck because it can't get its requests through the right channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What NEXT '26 announced that would help: A2A Protocol and Agent Registry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Agent-to-Agent (A2A) protocol and Agent Registry were designed for exactly this kind of scenario. Instead of Gemini writing "I need database credentials" into the wrong file, it could discover a provisioning agent through the Agent Registry and send a structured request via A2A.&lt;/p&gt;

&lt;p&gt;The developer keynote demo showed agents with distinct roles (planner, evaluator, simulator) collaborating through A2A. That's the architecture this race needs: a "help agent" that receives structured requests from coding agents and fulfills them. Right now I'm that help agent, manually checking files across 7 repos. A2A would automate the entire handoff.&lt;/p&gt;

&lt;p&gt;Agent Identity, which gives each agent a unique identity for secure communication, would also help. Right now there's no enforcement preventing one agent from editing another agent's files. They don't, but there's nothing stopping them either. Agent Identity would make inter-agent communication both structured and secure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Irony That Sums It All Up
&lt;/h2&gt;

&lt;p&gt;Blog post #89 out of 235: "The Human Advantage: Why AI-Generated Content is Failing Local Businesses."&lt;/p&gt;

&lt;p&gt;An AI agent that writes 9 blog posts per session wrote an article about why AI content doesn't work. No eval caught this. No observability tool flagged it. No governance policy prevented it.&lt;/p&gt;

&lt;p&gt;That's the gap between where autonomous agents are today and where the NEXT '26 announcements are pointing. Agent observability, integrated evals, ADK skills, A2A, MCP everywhere: these are all pieces of the solution. None of them existed in a usable form when I started this race 4 days ago. If I were starting today, the Gemini agent would look very different.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Rebuild With NEXT '26 Tools
&lt;/h2&gt;

&lt;p&gt;If I set up the Gemini agent from scratch using what was announced this week:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ADK instead of raw Gemini CLI&lt;/strong&gt; for structured skills, task prioritization, and deployment verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP servers for Vercel, Stripe, and Supabase&lt;/strong&gt; so the agent can access services directly without human provisioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated evals after each session&lt;/strong&gt; to catch behavioral drift (wrong file, blog addiction) within 1 session instead of 27&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A2A for help requests&lt;/strong&gt; so agents communicate through structured protocols instead of file-based messaging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent observability dashboard&lt;/strong&gt; for a real-time view of what each agent is doing, what it's blocked on, and whether it's following the expected workflow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The race runs for 12 weeks. Gemini has 11 weeks left. Some of these tools are available now. I'm going to try integrating ADK and MCP servers into the orchestrator over the coming weeks and see if Gemini's behavior improves.&lt;/p&gt;

&lt;p&gt;The data will be on the &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;live dashboard&lt;/a&gt;. All 7 repos are public on GitHub. If you want to watch an AI agent struggle with the exact problems that NEXT '26 is trying to solve, now you know where to look.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The $100 AI Startup Race is an ongoing experiment with 7 AI agents, $100 each, and 12 weeks to build real startups. &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;Live dashboard&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/race/season1/digest" rel="noopener noreferrer"&gt;Daily digest&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/race/season1/help-requests" rel="noopener noreferrer"&gt;Help request tracker&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>What Breaks When You Let AI Agents Run Unsupervised for 4 Days</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 23 Apr 2026 13:48:11 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/what-breaks-when-you-let-ai-agents-run-unsupervised-for-4-days-5hn3</link>
      <guid>https://forem.com/ai_made_tools/what-breaks-when-you-let-ai-agents-run-unsupervised-for-4-days-5hn3</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks When You Let AI Agents Run Unsupervised for 4 Days
&lt;/h2&gt;

&lt;p&gt;I gave 7 AI coding agents $100 each and told them to build startups. No human coding. They pick the idea, write the code, deploy the site, and try to get users. I just handle the infrastructure and answer help requests (max 1 hour per week per agent).&lt;/p&gt;

&lt;p&gt;Four days in, I've learned more about how autonomous agents actually behave than I did in months of reading benchmarks. Here's what nobody tells you about running AI agents in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The memory problem is worse than you think
&lt;/h2&gt;

&lt;p&gt;Every agent session starts fresh. The model has no memory of previous sessions. So we use markdown files as the memory layer: PROGRESS.md (what's been done), DECISIONS.md (key choices), IDENTITY.md (the startup vision). The agent reads these at the start and updates them at the end.&lt;/p&gt;

&lt;p&gt;Sounds simple. Here's what actually happened.&lt;/p&gt;

&lt;p&gt;One agent (Kimi, running through Kimi CLI) put all its files in a &lt;code&gt;startup/&lt;/code&gt; subfolder instead of the project root. The orchestrator reads PROGRESS.md from root. When the next session started, there was no progress file. The agent thought it was Day 1. It brainstormed a completely different startup idea and built it from scratch.&lt;/p&gt;

&lt;p&gt;Kimi now has two half-built startups in the same repository. A log analysis tool called LogDrop in the subfolder, and a SQL schema diff tool called SchemaLens in root. After 14 sessions, it still hasn't discovered the subfolder. The first startup is just sitting there, abandoned, with a working MVP that nobody knows about.&lt;/p&gt;

&lt;p&gt;The lesson isn't "use better memory systems." The lesson is that file conventions are load-bearing infrastructure for autonomous agents. One wrong directory equals total amnesia.&lt;/p&gt;
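
&lt;p&gt;A few lines in the orchestrator would encode that lesson. A sketch, assuming the race's file conventions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Treat a missing root PROGRESS.md as suspicious rather than as proof of Day 1.
if [ ! -f PROGRESS.md ] &amp;&amp; [ -n "$(find . -name PROGRESS.md -not -path './.git/*')" ]; then
  echo "PROGRESS.md exists, but not in the repo root. Halting instead of starting over."
  exit 1
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;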

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjozilwh54lik8axbn8jg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjozilwh54lik8axbn8jg.png" alt="The race dashboard showing Kimi's stats" width="295" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents interpret everything as instructions
&lt;/h2&gt;

&lt;p&gt;The orchestrator prompt included this line: "Your repo auto-deploys on every git push." It was meant as context, explaining how Vercel works. One agent (Codex) read it as an instruction and ran &lt;code&gt;git push&lt;/code&gt; after every single commit during its sessions. It burned through 26 of the account's 100 daily Vercel deployments by itself.&lt;/p&gt;

&lt;p&gt;We fixed the prompt: "Do NOT run git push. The orchestrator pushes after your session."&lt;/p&gt;

&lt;p&gt;Codex obeyed the letter of the rule. It stopped running git push. Instead, it started running &lt;code&gt;npx vercel --prod&lt;/code&gt; directly. Same result, different command. It also started taking Playwright screenshots of its own pricing page at mobile and desktop sizes to visually verify the layout before committing. Nobody told it to do this.&lt;/p&gt;

&lt;p&gt;The result: Codex has the most polished live product of all 7 agents. The immediate feedback loop from deploying after every change is making it a better builder than the agents that commit blindly and hope for the best.&lt;/p&gt;

&lt;p&gt;We decided to let it keep doing this. Sometimes the best behavior comes from agents working around your constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agents that ask for help are beating the ones that just code
&lt;/h2&gt;

&lt;p&gt;All 7 agents get the same instructions about requesting human help: "Create a file called HELP-REQUEST.md with what you need, steps for the human, time estimate, and priority."&lt;/p&gt;
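
&lt;p&gt;For reference, the expected shape of that file, as a made-up example covering the four required elements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# HELP-REQUEST.md (invented example; only the four required elements are from the rules)
What I need: Stripe API keys for the checkout flow
Steps for the human:
  1. Create a restricted key in the Stripe dashboard
  2. Add it to Vercel as STRIPE_SECRET_KEY
Time estimate: 10 minutes
Priority: high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;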

&lt;p&gt;Five agents figured this out. Two didn't.&lt;/p&gt;

&lt;p&gt;Claude (running through Claude Code) used 55 of its 60 weekly help minutes in two requests. It got its entire infrastructure set up in one shot: domain, Supabase database, Stripe payments, Resend email, cron jobs, admin dashboard. Smart move. It has the fewest sessions per day (expensive model) so it maximized human help to compensate.&lt;/p&gt;

&lt;p&gt;GLM asked for exactly three things on Day 1: domain, Stripe, and Google Analytics. Clean, focused, with backup plans for each item. It now has 12 real users and is the only agent with actual traffic data.&lt;/p&gt;

&lt;p&gt;Codex submitted the same help request 5 sessions in a row until we set up email sending. Persistent to the point of spamming. Then it sent 6 customer validation emails to real companies within 24 hours of getting access.&lt;/p&gt;

&lt;p&gt;Meanwhile, Gemini has never created a help request in 27 sessions. We investigated and found something fascinating: it's been editing HELP-STATUS.md (the file where the orchestrator writes human responses) saying "I still need database credentials." It's writing in the response channel instead of the request channel. Like an employee who writes "I need database access" in their journal but never emails IT.&lt;/p&gt;

&lt;p&gt;DeepSeek hasn't asked for help either. It has Stripe integration code ready but never requested API keys. It's been polishing the checkout flow for 4+ commits. A beautiful integration that can never work because there are no keys behind it.&lt;/p&gt;

&lt;p&gt;Same instructions. Wildly different behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge8rp19l3rx51ut5c3o3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge8rp19l3rx51ut5c3o3.png" alt="Help Request Tracker" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-inflicted traps are the hardest to escape
&lt;/h2&gt;

&lt;p&gt;DeepSeek created a DEPLOY-STATUS.md file early on, saying it needs Stripe keys and an OpenAI API key. The orchestrator prompt says: "If DEPLOY-STATUS.md exists, your site is BROKEN. Fix it before anything else."&lt;/p&gt;

&lt;p&gt;The site isn't broken. DeepSeek just used the wrong file to document what it needs. But now every session starts by trying to fix a non-existent deployment problem. 24 sessions of wasting time on a file it wrote itself.&lt;/p&gt;

&lt;p&gt;We eventually upgraded the deploy checker to also verify the homepage returns HTTP 200 (not just that the build succeeded). This caught the real issue: DeepSeek's &lt;code&gt;vercel.json&lt;/code&gt; routing config was broken, and the site was returning 404 for all pages. The build "succeeded" but nothing was actually served.&lt;/p&gt;
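
&lt;p&gt;The whole upgraded check fits in a few lines. A sketch (the URL is illustrative; DeepSeek's actual preview URL is not public):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A passing build is not a working site: probe the live homepage too.
status=$(curl -s -o /dev/null -w "%{http_code}" "https://race-deepseek.vercel.app/")
if [ "$status" != "200" ]; then
  echo "Homepage returned ${status} despite a successful build. Site is broken."
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;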

&lt;p&gt;The agent had no way of knowing. It never checked its own site. It never asked for analytics. It just kept coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantity vs quality is playing out in real time
&lt;/h2&gt;

&lt;p&gt;Gemini gets 8 sessions per day (the most of any agent). It has written 235 blog posts in 27 sessions. One blog post every 14 minutes during active sessions. All variations of "Local SEO for [industry] in 2026."&lt;/p&gt;

&lt;p&gt;It also wrote blog post #89: "The Human Advantage: Why AI-Generated Content is Failing Local Businesses." An AI agent that writes 9 blog posts per session wrote an article about why AI content doesn't work.&lt;/p&gt;

&lt;p&gt;GLM gets 2 sessions per day (the fewest). It has 5 working calculators, 8 blog posts, and 12 real users. Every session ships something useful.&lt;/p&gt;

&lt;p&gt;The question the race is testing: does Gemini's 235 posts outperform GLM's 5 calculators? We'll know in a few weeks when Google indexes everything and we can see what actually ranks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;If I were starting over, I'd change three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enforce file structure from the start.&lt;/strong&gt; A pre-commit hook that validates PROGRESS.md exists in root would have prevented Kimi's amnesia.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add a homepage health check from Day 1.&lt;/strong&gt; We added it on Day 4 after discovering DeepSeek's site had been returning 404 for days. Every agent should know immediately if their site is broken.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Make the help request system more obvious.&lt;/strong&gt; Two of seven agents never figured out HELP-REQUEST.md despite clear instructions. Maybe the orchestrator should prompt them: "Do you need human help? Create HELP-REQUEST.md."&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But honestly, the failures are the most valuable data. An experiment where everything works perfectly teaches you nothing. The broken parts are where the insights live.&lt;/p&gt;




&lt;p&gt;The race runs for 12 weeks. Daily digests and weekly recaps at &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;aimadetools.com/race&lt;/a&gt;. All 7 repos are public on &lt;a href="https://github.com/aimadetools" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. If you're building with autonomous agents, the patterns we're documenting might save you from the same mistakes.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>I Gave 7 AI Agents $100 Each to Build Startups. Here's What They Built in 4 Days.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 23 Apr 2026 13:38:29 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/i-gave-7-ai-agents-100-each-to-build-startups-heres-what-they-built-in-4-days-7hd</link>
      <guid>https://forem.com/ai_made_tools/i-gave-7-ai-agents-100-each-to-build-startups-heres-what-they-built-in-4-days-7hd</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built an autonomous startup competition where 7 AI coding agents each get $100 and 12 weeks to build a real business from scratch. No human coding allowed. Each agent picks its own idea, writes all the code, deploys a live website, and tries to get real users and revenue.&lt;/p&gt;

&lt;p&gt;The agents: Claude (via Claude Code), Codex CLI, Gemini CLI, Kimi CLI, DeepSeek (via Aider), Xiaomi MiMo V2.5 Pro (via Claude Code), and GLM (via Claude Code with Z.ai API).&lt;/p&gt;

&lt;p&gt;Three of the seven agents run through Claude Code as their harness, which means OpenClaw's architecture is at the core of nearly half the competition. The orchestrator runs on a VPS, scheduling sessions via cron, managing memory between sessions through markdown files, and pushing code to GitHub/Vercel automatically.&lt;/p&gt;

&lt;p&gt;We're on Day 4. So far: 700+ commits, 7 live websites, one agent that forgot its own work and built two different startups, another that wrote 235 blog posts, and a third that found a clever workaround when we restricted its deployment access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hk5ujdf35rpj303jauz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hk5ujdf35rpj303jauz.png" alt="Race dashboard showing all 7 agents" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used OpenClaw
&lt;/h2&gt;

&lt;p&gt;The core of the experiment runs on Claude Code (which shares OpenClaw's architecture) as the agent harness. Here's how it works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The orchestrator&lt;/strong&gt; is a bash script that runs on a VPS via cron. For each agent session, it does the following (a condensed sketch comes after the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pulls the latest code from GitHub&lt;/li&gt;
&lt;li&gt;Reads the agent's memory files (PROGRESS.md, DECISIONS.md, IDENTITY.md)&lt;/li&gt;
&lt;li&gt;Constructs a prompt with the startup context and instructions&lt;/li&gt;
&lt;li&gt;Launches Claude Code with the appropriate model&lt;/li&gt;
&lt;li&gt;Lets the agent work autonomously for 30 minutes&lt;/li&gt;
&lt;li&gt;Squashes commits and pushes to GitHub (which triggers a Vercel deploy)&lt;/li&gt;
&lt;/ol&gt;
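
&lt;p&gt;A condensed sketch of that loop, assuming GNU coreutils &lt;code&gt;timeout&lt;/code&gt; and Claude Code's non-interactive print mode; the paths, prompt text, and model variable are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env bash
# One agent session, mirroring steps 1-6 above (simplified).
set -euo pipefail
cd "$AGENT_REPO"

git pull --rebase origin main                          # 1. latest code
context="$(cat PROGRESS.md DECISIONS.md IDENTITY.md)"  # 2. memory files
prompt="Continue building your startup. Context:
$context"                                              # 3. construct prompt

timeout 30m claude -p "$prompt" --model "$MODEL" || true  # 4-5. 30-minute session

git reset --soft origin/main                           # 6. squash session commits
git commit -m "agent session $(date -u +%F)" || true   # no-op if nothing changed
git push origin main                                   # triggers the Vercel deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;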

&lt;p&gt;&lt;strong&gt;Three agents use Claude Code directly:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude&lt;/strong&gt; runs Claude Code with Sonnet/Haiku as the model. It built PricePulse, a competitor pricing monitor with Supabase auth, Stripe payments, email alerts, and hourly monitoring cron jobs. When it hit Vercel's 12-function serverless limit, it consolidated 4 API endpoints into existing ones on its own.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GLM&lt;/strong&gt; runs Claude Code with GLM-5.1 via the Z.ai API (using &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and &lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt; environment variables; see the snippet after this list). It built FounderMath, a startup calculator suite with 5 working calculators. It has 12 real users on Day 4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Xiaomi&lt;/strong&gt; was originally running Aider but we upgraded it mid-race to Claude Code with MiMo V2.5 Pro. In its first session with the new setup, it produced more output (42 commits) than the old setup did in 7 sessions total. The "harness awareness" feature of V2.5 Pro means it actively manages its own context within Claude Code.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
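
&lt;p&gt;For the GLM setup above, redirecting Claude Code is just two environment variables. The endpoint URL here is an assumption; check Z.ai's docs for the current one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Point Claude Code at an Anthropic-compatible third-party endpoint.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"   # assumed endpoint
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"
claude   # Claude Code now talks to GLM instead of Anthropic's API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;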

&lt;p&gt;&lt;strong&gt;The memory system&lt;/strong&gt; between sessions uses markdown files that the agent reads at the start and updates at the end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROGRESS.md    - what's been done (the agent's memory)
DECISIONS.md   - key choices with reasoning
IDENTITY.md    - startup vision and roadmap
BACKLOG.md     - prioritized task list
HELP-STATUS.md - human responses to help requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where things get interesting. One agent (Kimi) put all its files in a &lt;code&gt;startup/&lt;/code&gt; subfolder instead of root. The orchestrator reads PROGRESS.md from root. Next session found no progress file, thought it was Day 1, and started a completely different startup from scratch. Two half-built products in one repo because of one wrong directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The help request system&lt;/strong&gt; lets agents create a HELP-REQUEST.md file when they need something only a human can do (buy a domain, set up Stripe, create accounts). The orchestrator converts these to GitHub Issues. The human responds and closes the issue. The orchestrator writes the response to HELP-STATUS.md for the agent to read.&lt;/p&gt;
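
&lt;p&gt;The conversion step can be a single &lt;code&gt;gh&lt;/code&gt; call in the orchestrator. A sketch of that hand-off; the label, the &lt;code&gt;AGENT_NAME&lt;/code&gt; variable, and the cleanup behavior are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Turn an agent's HELP-REQUEST.md into a GitHub Issue for the human.
if [ -f HELP-REQUEST.md ]; then
  gh issue create \
    --title "Help request from $AGENT_NAME" \
    --body-file HELP-REQUEST.md \
    --label help-request
  rm HELP-REQUEST.md   # consumed; the human's reply lands in HELP-STATUS.md
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;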

&lt;p&gt;The most interesting finding: the agents that use this system strategically are winning. Claude used 55 of its 60 weekly help minutes in two requests to get its entire infrastructure wired up. Gemini has never created a help request in 27 sessions, despite being blocked on features it needs. Same instructions, completely different behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflp2vg5jbiwpn8ckoqjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflp2vg5jbiwpn8ckoqjz.png" alt="An example HELP-REQUEST.md from one of the agents" width="800" height="676"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Live dashboard: &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;https://www.aimadetools.com/race/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All 7 agent repos are public on GitHub: &lt;a href="https://github.com/aimadetools" rel="noopener noreferrer"&gt;https://github.com/aimadetools&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what each agent built in the first 4 days:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Startup&lt;/th&gt;
&lt;th&gt;Commits&lt;/th&gt;
&lt;th&gt;Live Site&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;LocalLeads (local SEO)&lt;/td&gt;
&lt;td&gt;182&lt;/td&gt;
&lt;td&gt;&lt;a href="https://race-gemini.vercel.app" rel="noopener noreferrer"&gt;race-gemini.vercel.app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;NameForge AI (name generator)&lt;/td&gt;
&lt;td&gt;136&lt;/td&gt;
&lt;td&gt;&lt;a href="https://race-deepseek.vercel.app" rel="noopener noreferrer"&gt;race-deepseek.vercel.app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi&lt;/td&gt;
&lt;td&gt;SchemaLens (SQL schema diff)&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;&lt;a href="https://race-kimi.vercel.app" rel="noopener noreferrer"&gt;race-kimi.vercel.app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;NoticeKit (GDPR notices)&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;&lt;a href="https://noticekit.tech" rel="noopener noreferrer"&gt;noticekit.tech&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;PricePulse (pricing monitor)&lt;/td&gt;
&lt;td&gt;83&lt;/td&gt;
&lt;td&gt;&lt;a href="https://getpricepulse.com" rel="noopener noreferrer"&gt;getpricepulse.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Xiaomi&lt;/td&gt;
&lt;td&gt;APIpulse (API cost calculator)&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;&lt;a href="https://getapipulse.com" rel="noopener noreferrer"&gt;getapipulse.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM&lt;/td&gt;
&lt;td&gt;FounderMath (startup calculators)&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;&lt;a href="https://founder-math.com" rel="noopener noreferrer"&gt;founder-math.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpidq3llldbopb6f3tllf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpidq3llldbopb6f3tllf.png" alt=" " width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxch2u1pbefvb1yse3xla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxch2u1pbefvb1yse3xla.png" alt=" " width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg10jmogzcfiqeeynb0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg10jmogzcfiqeeynb0p.png" alt=" " width="800" height="587"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sk7ejp51hb5hi49ws5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sk7ejp51hb5hi49ws5x.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best moment so far: Codex (running through Codex CLI, not Claude Code) found a loophole in our deployment restrictions. We told agents "do not run git push." Codex obeyed literally but started running &lt;code&gt;npx vercel --prod&lt;/code&gt; instead. Same result, different command. It also began taking Playwright screenshots of its own UI at mobile and desktop sizes to verify layouts. Nobody told it to do this.&lt;/p&gt;
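
&lt;p&gt;Playwright's CLI makes that screenshot check a one-liner per viewport. Roughly what Codex was doing; the viewport sizes are illustrative since we didn't log the exact ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Capture the live site at mobile and desktop sizes to verify layouts.
npx playwright screenshot --viewport-size="375,812" https://noticekit.tech mobile.png
npx playwright screenshot --viewport-size="1440,900" https://noticekit.tech desktop.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;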

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Every sentence in the prompt is a potential instruction.&lt;/strong&gt; "Your repo auto-deploys on every git push" was meant as context. One agent read it as an instruction and pushed after every commit, burning 26 of 100 daily Vercel deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Agent memory is only as good as what the agent writes.&lt;/strong&gt; The agents that write structured, detailed progress notes maintain continuity between sessions. The ones that dump logs drift. Kimi's amnesia happened because it put files in the wrong directory, not because the memory system failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The agents that ask for help are winning.&lt;/strong&gt; Claude, GLM, and Codex all requested human help early (domains, payments, databases) and now have fully functional products. Gemini has 235 blog posts but no payment system because it never asked for one. Same instructions, wildly different behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Claude Code as a harness works with non-Anthropic models.&lt;/strong&gt; GLM-5.1 via Z.ai and MiMo V2.5 Pro via Xiaomi's API both work through Claude Code using the &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and &lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt; environment variables. The harness is model-agnostic, which makes it perfect for comparing different AI models in identical conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Token efficiency matters more than raw capability.&lt;/strong&gt; MiMo V2.5 Pro uses 40-60% fewer tokens than Opus 4.6 at comparable capability. In a budget-constrained race, that translates directly to more sessions and more output.&lt;/p&gt;

&lt;p&gt;The race runs for 12 weeks. We publish daily digests and weekly recaps. The real question isn't which agent writes the most code. It's which one gets the first paying customer.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>AI Dev Weekly #7: Claude Code Loses Pro Plan, GitHub Copilot Freezes Signups, and Two Chinese Models Drop in 48 Hours</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 23 Apr 2026 07:39:38 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/ai-dev-weekly-7-claude-code-loses-pro-plan-github-copilot-freezes-signups-and-two-chinese-1c86</link>
      <guid>https://forem.com/ai_made_tools/ai-dev-weekly-7-claude-code-loses-pro-plan-github-copilot-freezes-signups-and-two-chinese-1c86</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The flat-rate AI subscription era ended this week. Anthropic pulled Claude Code from the $20 Pro plan. GitHub froze all new Copilot signups. And while Western companies were busy raising prices, two Chinese labs dropped frontier models within 48 hours of each other. Let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code removed from Pro plan
&lt;/h2&gt;

&lt;p&gt;Anthropic quietly &lt;a href="https://www.aimadetools.com/blog/claude-code-removed-pro-plan/?utm_source=devto" rel="noopener noreferrer"&gt;removed Claude Code from the $20/month Pro plan&lt;/a&gt; on April 21. The pricing page now shows an "X" next to Claude Code for Pro subscribers. Access starts at Max ($100/month).&lt;/p&gt;

&lt;p&gt;Anthropic's head of growth called it "a small test on ~2% of new prosumer signups." But the public pricing page already reflects the change for everyone. Sam Altman's response on X: "ok boomer."&lt;/p&gt;

&lt;p&gt;The real reason: engagement per subscriber surged after Opus 4, Cowork, and long-running agents. Pro subscribers at $20/month are consuming 10x or more in token value. The math doesn't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This was inevitable. Unlimited AI coding for $20/month was never sustainable. If you're on Pro, you still have access for now. But start planning for either Max ($100/month) or &lt;a href="https://www.aimadetools.com/blog/best-ai-coding-tools-2026/?utm_source=devto" rel="noopener noreferrer"&gt;cheaper alternatives&lt;/a&gt; like &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi K2.6&lt;/a&gt; ($0.60/M tokens) or &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-pro-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;MiMo V2.5 Pro&lt;/a&gt; ($1/M tokens).&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Copilot freezes all new signups
&lt;/h2&gt;

&lt;p&gt;GitHub &lt;a href="https://github.blog/news-insights/company-news/changes-to-github-copilot-individual-plans/" rel="noopener noreferrer"&gt;paused new registrations&lt;/a&gt; for Copilot Pro, Pro+, and Student plans on April 20. Only the Free tier accepts new users. They also added stricter usage limits and removed Opus models from Pro (only Pro+ keeps them).&lt;/p&gt;

&lt;p&gt;The reason: "unsustainable compute demands from AI-powered coding agents." Same story as Anthropic. Agentic AI usage broke the pricing model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; Two of the three biggest AI coding platforms raised prices or froze signups in the same week. The third (Cursor) is probably next. The era of $10-20/month unlimited AI coding is over. Open-source and Chinese models are the hedge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kimi K2.6 launches with 300-agent swarm
&lt;/h2&gt;

&lt;p&gt;Moonshot AI released &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi K2.6&lt;/a&gt; on April 20. The highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80.2% SWE-Bench Verified (matching Claude Opus 4.6)&lt;/li&gt;
&lt;li&gt;300 sub-agent swarm (up from 100 in K2.5)&lt;/li&gt;
&lt;li&gt;54.0% on HLE-Full with tools (beating GPT-5.4's 52.1%)&lt;/li&gt;
&lt;li&gt;$0.60/M input tokens (25x cheaper than Opus)&lt;/li&gt;
&lt;li&gt;Modified MIT license (open weights)&lt;/li&gt;
&lt;li&gt;Available on &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-openrouter-setup/?utm_source=devto" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; and Cloudflare Workers AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-agent-swarm-tutorial/?utm_source=devto" rel="noopener noreferrer"&gt;agent swarm&lt;/a&gt; is the standout feature. K2.6 scored 86.3% on BrowseComp (Agent Swarm) vs GPT-5.4's 78.4%. For coding agent workloads, K2.6 is the strongest open-source option available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; K2.6 is the first open-source model to genuinely match Opus 4.6 on coding benchmarks. At 25x cheaper. The timing with Anthropic's price hike is not a coincidence. See our &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-vs-claude-opus-4-6/?utm_source=devto" rel="noopener noreferrer"&gt;K2.6 vs Opus 4.6 comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  MiMo V2.5 Pro: 40-60% fewer tokens than Opus
&lt;/h2&gt;

&lt;p&gt;Xiaomi dropped &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-pro-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;MiMo V2.5 Pro&lt;/a&gt; on April 22, just 48 hours after K2.6. The headline number: 40-60% fewer tokens than Opus 4.6 at comparable capability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;57.2% SWE-bench Pro&lt;/li&gt;
&lt;li&gt;64% Pass^3 on ClawEval with only ~70K tokens per trajectory&lt;/li&gt;
&lt;li&gt;1,000+ tool calls in single sessions&lt;/li&gt;
&lt;li&gt;Built a complete SysY compiler in Rust in 4.3 hours (672 tool calls, 233/233 tests)&lt;/li&gt;
&lt;li&gt;Works with &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-pro-claude-code-setup/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code as a harness&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Coming open-source soon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The token efficiency is the real story. Same capability, half the tokens, fraction of the price. The &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-standard-guide/?utm_source=devto" rel="noopener noreferrer"&gt;V2.5 Standard model&lt;/a&gt; adds native multimodal (image, audio, video) and actually outperforms V2-Pro on some agent benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; V2.5 Pro's "harness awareness" (it actively manages its own context within Claude Code) is a new capability nobody else has. Combined with the token efficiency, this is the model to watch for long-running agent tasks. See our &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-series-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;full V2.5 series guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The flat-rate subscription is dead
&lt;/h2&gt;

&lt;p&gt;Three data points in one week:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Anthropic removes Claude Code from $20 Pro&lt;/li&gt;
&lt;li&gt;GitHub freezes all Copilot signups&lt;/li&gt;
&lt;li&gt;Both cite "unsustainable compute demands"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pattern is clear. Flat-rate unlimited AI coding subscriptions don't work when agents run for hours and consume 10x the expected tokens. Expect token-based billing everywhere within 6 months.&lt;/p&gt;

&lt;p&gt;The winners: Chinese models (&lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi K2.6&lt;/a&gt;, &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-pro-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;MiMo V2.5 Pro&lt;/a&gt;, &lt;a href="https://www.aimadetools.com/blog/kimi-k2-6-vs-qwen-3-6-plus/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen 3.6 Plus&lt;/a&gt;) that were already priced per-token at 10-25x less than Western alternatives. If you haven't explored them yet, now is the time. See our &lt;a href="https://www.aimadetools.com/blog/best-chinese-ai-models-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Chinese AI models ranking&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Workspace Agents:&lt;/strong&gt; ChatGPT now has &lt;a href="https://openai.com/index/introducing-workspace-agents-in-chatgpt" rel="noopener noreferrer"&gt;workspace agents&lt;/a&gt; for enterprise teams. Not relevant for individual developers yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Privacy Filter:&lt;/strong&gt; New &lt;a href="https://openai.com/index/introducing-openai-privacy-filter" rel="noopener noreferrer"&gt;privacy filter&lt;/a&gt; for enterprise data. Good for compliance, not a developer tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel data breach:&lt;/strong&gt; Vercel &lt;a href="https://siliconangle.com/2026/04/20/developer-tooling-provider-vercel-discloses-breach-exposed-users-data/" rel="noopener noreferrer"&gt;disclosed a breach&lt;/a&gt; that exposed some user data. Check your account if you use Vercel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Whether Vercel's serverless function limit forces architectural decisions (it broke one of our &lt;a href="https://dev.to/race/"&gt;race agents&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;How MiMo V2.5 Pro performs in real-world agent tasks (we just &lt;a href="https://dev.to/race/season1/digest"&gt;upgraded our Xiaomi race agent&lt;/a&gt; to V2.5 Pro)&lt;/li&gt;
&lt;li&gt;Whether any race agent gets its first paying customer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;See you next Thursday. If you found this useful, subscribe to &lt;a href="https://dev.to/series/ai-dev-weekly/"&gt;AI Dev Weekly&lt;/a&gt; for the full archive.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-007-claude-code-pro-copilot-freeze-kimi-mimo/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>anthropic</category>
      <category>github</category>
      <category>kimi</category>
    </item>
    <item>
      <title>AI Startup Race Day 1 Recap: One Agent Forgot Its Own Work</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:06:03 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/i-gave-7-ai-agents-100-each-to-build-a-startup-one-forgot-its-own-work-1cl</link>
      <guid>https://forem.com/ai_made_tools/i-gave-7-ai-agents-100-each-to-build-a-startup-one-forgot-its-own-work-1cl</guid>
      <description>&lt;p&gt;I'm running an experiment called &lt;strong&gt;The $100 AI Startup Race&lt;/strong&gt;: 7 AI coding agents each get $100 and 12 weeks to build a real startup from scratch. No human coding. They autonomously pick a business idea, write code, deploy a live website, and try to get real users and revenue.&lt;/p&gt;

&lt;p&gt;The agents: Claude, Codex, Gemini, Kimi, DeepSeek, Xiaomi (MiMo), and GLM.&lt;/p&gt;

&lt;p&gt;Day 1 is done. Here's what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scoreboard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Startup&lt;/th&gt;
&lt;th&gt;Commits&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Blog Posts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;LocalLeads (local SEO)&lt;/td&gt;
&lt;td&gt;169&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;NameForge AI (name generator)&lt;/td&gt;
&lt;td&gt;91&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi&lt;/td&gt;
&lt;td&gt;SchemaLens / LogDrop&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;NoticeKit (GDPR notices)&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;PricePulse (pricing intel)&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM&lt;/td&gt;
&lt;td&gt;FounderMath (startup calculators)&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Xiaomi&lt;/td&gt;
&lt;td&gt;WaitlistKit (viral waitlists)&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total: 467 commits, 7 live websites, 130 blog posts. In 24 hours.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Kimi forgot its own work
&lt;/h2&gt;

&lt;p&gt;This is the story of the day.&lt;/p&gt;

&lt;p&gt;Kimi's first session ran at 3 AM. It chose to build &lt;strong&gt;LogDrop&lt;/strong&gt;, a log analysis tool. It created identity files, a backlog, landing pages, pricing, a blog, and even a working MVP with a JSON log parser, search, filters, and CSV export.&lt;/p&gt;

&lt;p&gt;One problem: it put everything in a &lt;code&gt;startup/&lt;/code&gt; subfolder instead of the root directory.&lt;/p&gt;

&lt;p&gt;The orchestrator gives agents their memory between sessions by reading &lt;code&gt;PROGRESS.md&lt;/code&gt; from the root. When Kimi's second session started, there was no PROGRESS.md in root. The agent thought it was Day 1. It brainstormed a completely different idea. It built &lt;strong&gt;SchemaLens&lt;/strong&gt;, a SQL schema diff tool, from scratch.&lt;/p&gt;

&lt;p&gt;Kimi now has two half-built startups in the same repo. Its help request for LogDrop's domain is stuck in the subfolder where the orchestrator can't find it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One wrong directory = total memory loss between sessions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent didn't crash. It didn't throw an error. It just quietly forgot everything and started over with a different idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini wrote 104 blog posts
&lt;/h2&gt;

&lt;p&gt;Gemini has 8 sessions per day (the most of any agent). By end of Day 1, LocalLeads had 104 blog posts on local SEO topics. One blog post every 14 minutes.&lt;/p&gt;

&lt;p&gt;For comparison: Claude wrote 11. GLM wrote 5. Xiaomi wrote 1.&lt;/p&gt;

&lt;p&gt;The question for the rest of the race: does quantity beat quality?&lt;/p&gt;

&lt;h2&gt;
  
  
  Codex burned 26 Vercel deployments
&lt;/h2&gt;

&lt;p&gt;The orchestrator prompt said: "Your repo auto-deploys on every git push." This was meant as context. Codex read it as an instruction.&lt;/p&gt;

&lt;p&gt;It ran &lt;code&gt;git push&lt;/code&gt; after nearly every commit during its sessions. Each push triggered a Vercel deployment. By mid-afternoon, Codex had consumed 26 of the account's 100 daily deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: with autonomous agents, every sentence in the prompt is a potential instruction.&lt;/strong&gt; If you don't want them to do something, say so explicitly.&lt;/p&gt;

&lt;p&gt;We fixed it with three changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prompt update: "Do NOT run git push. The orchestrator pushes after your session."&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;vercel.json&lt;/code&gt; change to disable preview deployments (sketch after this list)&lt;/li&gt;
&lt;li&gt;Commit squashing (all session commits become one before pushing)&lt;/li&gt;
&lt;/ol&gt;
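
&lt;p&gt;For change 2, the shape of the &lt;code&gt;vercel.json&lt;/code&gt; fix looks roughly like this. The keys are an assumption based on Vercel's git configuration options, so verify against their docs before relying on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Limit automatic deployments to pushes on main.
cat &gt; vercel.json &lt;&lt;'EOF'
{
  "git": {
    "deploymentEnabled": {
      "main": true
    }
  }
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;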

&lt;h2&gt;
  
  
  GLM's quality approach
&lt;/h2&gt;

&lt;p&gt;GLM only had 2 sessions but made them count. FounderMath already has three working calculators: SAFE note calculator (all 4 YC SAFE types), dilution calculator, and runway calculator.&lt;/p&gt;

&lt;p&gt;It also submitted the best help request of any agent: clear format, backup plans for each item, budget specified, priority levels, and even suggested the DNS record type for the domain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned on Day 1
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;File conventions are critical for agent memory.&lt;/strong&gt; One agent putting files in a subfolder caused total amnesia.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt wording is everything.&lt;/strong&gt; Context gets interpreted as instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared deployment limits are a real constraint.&lt;/strong&gt; 7 agents + 1 blog on one Vercel account = problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents without web search pick generic ideas.&lt;/strong&gt; The two agents running without web access (DeepSeek, Xiaomi) chose the most crowded markets.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Follow along
&lt;/h2&gt;

&lt;p&gt;Everything is public: code, costs, decisions, and progress.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aimadetools.com/race/" rel="noopener noreferrer"&gt;Live Dashboard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aimadetools.com/blog/race-day-1-results/" rel="noopener noreferrer"&gt;Full Day 1 writeup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aimadetools" rel="noopener noreferrer"&gt;GitHub repos&lt;/a&gt; (all 7 agent repos are public)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll be posting weekly recaps and daily highlights for the full 12 weeks. Would love to hear what you'd want to see tracked or compared.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Launch Day: 7 AI Agents Start Building Startups with $100 Each</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:30:00 +0000</pubDate>
      <link>https://forem.com/ai_made_tools/launch-day-7-ai-agents-start-building-startups-with-100-each-5f8h</link>
      <guid>https://forem.com/ai_made_tools/launch-day-7-ai-agents-start-building-startups-with-100-each-5f8h</guid>
      <description>&lt;p&gt;I just launched an experiment: 7 AI coding agents each get $100 and 12 weeks to build a real startup from scratch. No human coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lineup
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🟣 Claude&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Sonnet / Haiku&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟢 GPT&lt;/td&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;GPT-5.4 / Mini&lt;/td&gt;
&lt;td&gt;€23/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔵 Gemini&lt;/td&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;2.5 Pro / Flash&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔴 DeepSeek&lt;/td&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Reasoner / Chat&lt;/td&gt;
&lt;td&gt;~$25/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟠 Kimi&lt;/td&gt;
&lt;td&gt;Kimi CLI&lt;/td&gt;
&lt;td&gt;K2.5&lt;/td&gt;
&lt;td&gt;~$19/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟡 Xiaomi&lt;/td&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;MiMo V2 Pro&lt;/td&gt;
&lt;td&gt;~$25/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🟤 GLM&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;GLM-5.1 / 4.7&lt;/td&gt;
&lt;td&gt;$18/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each agent autonomously picks an idea, writes code, deploys, and tries to get users and revenue.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned from 3 test runs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strategy &amp;gt; code quality.&lt;/strong&gt; Agents that planned distribution first outperformed agents that wrote better code. One agent (Kimi) planned a full Product Hunt launch before writing a single line of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple stacks win.&lt;/strong&gt; HTML + Tailwind deployed in hours. Next.js agents spent days on build errors. The deploy loop is the real bottleneck for AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context resets kill progress.&lt;/strong&gt; Without persistent state between sessions, agents repeat mistakes. I built an orchestrator with structured state files to solve this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tech
&lt;/h2&gt;

&lt;p&gt;A bash orchestrator manages everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cron-scheduled 30-minute sessions (2-8 per agent per day)&lt;/li&gt;
&lt;li&gt;Automatic git commits with &lt;code&gt;[skip ci]&lt;/code&gt; on mid-session commits&lt;/li&gt;
&lt;li&gt;Deploy verification via health checks (sketch after this list)&lt;/li&gt;
&lt;li&gt;Loop detection (same action 3x = force alternative)&lt;/li&gt;
&lt;li&gt;OpenRouter budget alerts via Discord&lt;/li&gt;
&lt;/ul&gt;
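
&lt;p&gt;The deploy verification step is a curl retry loop at heart. A minimal sketch; the retry count, timing, and example URL are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Health check: confirm an agent's site responds after a deploy.
check_deploy() {
  local url="$1"
  for attempt in 1 2 3; do
    curl -fsS --max-time 10 -o /dev/null "$url" &amp;&amp; return 0
    sleep 20   # give Vercel time to finish building
  done
  echo "ALERT: $url is failing its health check" &gt;&amp;2
  return 1
}

check_deploy "https://race-kimi.vercel.app"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;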

&lt;p&gt;All code is public on &lt;a href="https://github.com/aimadetools" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow along
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;Live Dashboard&lt;/a&gt; — real-time progress&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.aimadetools.com/race/compare" rel="noopener noreferrer"&gt;Daily Digest&lt;/a&gt; — hand-written daily updates&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.aimadetools.com/race/activity" rel="noopener noreferrer"&gt;Weekly Recaps&lt;/a&gt; — detailed analysis&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aimadetools.com/race/rules" rel="noopener noreferrer"&gt;Full Rules&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also launched on &lt;a href="https://www.producthunt.com/" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt; today.&lt;/p&gt;

&lt;p&gt;Which agent would you bet on?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>startup</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
