<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: CodePawl</title>
    <description>The latest articles on Forem by CodePawl (@codepawl).</description>
    <link>https://forem.com/codepawl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843996%2F48508502-26e8-4fbd-94a1-62e082acbbc7.png</url>
      <title>Forem: CodePawl</title>
      <link>https://forem.com/codepawl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/codepawl"/>
    <language>en</language>
    <item>
      <title>Claude Code source code leaked again via npm source map — third time now</title>
      <dc:creator>CodePawl</dc:creator>
      <pubDate>Tue, 31 Mar 2026 09:12:36 +0000</pubDate>
      <link>https://forem.com/codepawl/claude-code-source-code-leaked-again-via-npm-source-map-third-time-now-2j55</link>
      <guid>https://forem.com/codepawl/claude-code-source-code-leaked-again-via-npm-source-map-third-time-now-2j55</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fervosrk5qzldvaznptwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fervosrk5qzldvaznptwq.png" alt=" " width="587" height="734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic shipped a 57MB &lt;code&gt;cli.js.map&lt;/code&gt; file in the latest Claude Code npm package. Again.&lt;/p&gt;

&lt;p&gt;The source map contains the full TypeScript source, extractable in seconds. The &lt;code&gt;src/&lt;/code&gt; directory includes everything: components, commands, tools, services, hooks, query engine, cost tracker, context handling, the works. 785K &lt;code&gt;main.tsx&lt;/code&gt;, 67K &lt;code&gt;query.ts&lt;/code&gt;, 46K &lt;code&gt;QueryEngine.ts&lt;/code&gt;, 29K &lt;code&gt;Tool.ts&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8slampqm6jbfew93lz6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8slampqm6jbfew93lz6.png" alt=" " width="800" height="995"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is at least the third time this has happened:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Feb 2025&lt;/strong&gt; — source maps shipped in the npm package. Anthropic rushed to yank the release and purge the npm cache; someone still recovered the source from a Sublime Text undo buffer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~Sep 2025&lt;/strong&gt; — leaked again via the same vector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, 2026&lt;/strong&gt; — today. 57MB map file, full source, still sitting in the npm registry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva1xbbneg2kzgx0e4jha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva1xbbneg2kzgx0e4jha.png" alt=" " width="428" height="679"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The irony: this happened on the same day as the axios supply chain attack, where a hijacked maintainer pushed malicious code through npm. npm's trust model is having a rough day.&lt;/p&gt;

&lt;p&gt;To be fair, source code leaking from an npm package isn't a security vulnerability in itself. The code is always technically recoverable from the minified bundle; source maps just make it trivial instead of painful. But shipping them three times suggests the build pipeline still doesn't strip them reliably.&lt;/p&gt;

&lt;p&gt;At this point Anthropic might as well just open source it. The code leaks every few months anyway.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://x.com/Fried_rice/status/2038894956459290963?s=20" rel="noopener noreferrer"&gt;Original post&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://x.com/lunovian" rel="noopener noreferrer"&gt;An&lt;/a&gt; — &lt;a href="https://codepawl.com" rel="noopener noreferrer"&gt;Codepawl&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>axios Got Hijacked Today: A Technical Breakdown of the Most Sophisticated npm Supply Chain Attack Yet</title>
      <dc:creator>CodePawl</dc:creator>
      <pubDate>Tue, 31 Mar 2026 08:15:59 +0000</pubDate>
      <link>https://forem.com/codepawl-org/axios-got-hijacked-today-a-technical-breakdown-of-the-most-sophisticated-npm-supply-chain-attack-2i82</link>
      <guid>https://forem.com/codepawl-org/axios-got-hijacked-today-a-technical-breakdown-of-the-most-sophisticated-npm-supply-chain-attack-2i82</guid>
      <description>&lt;p&gt;If you use axios — and statistically, you do — you need to read this.&lt;/p&gt;

&lt;p&gt;On March 31, 2026, two malicious versions of axios were published to npm: &lt;code&gt;1.14.1&lt;/code&gt; and &lt;code&gt;0.30.4&lt;/code&gt;. The attacker hijacked a lead maintainer's npm account, injected a hidden dependency that deploys a cross-platform RAT, and designed the entire payload to self-destruct after execution. The malicious versions were live for roughly 3 hours before npm pulled them.&lt;/p&gt;

&lt;p&gt;This isn't a typosquat. This isn't a random package nobody uses. This is &lt;strong&gt;axios&lt;/strong&gt; — 100M+ weekly downloads, present in virtually every Node.js project that touches HTTP.&lt;/p&gt;




&lt;h2&gt;What happened&lt;/h2&gt;

&lt;p&gt;The attacker compromised the npm account of &lt;code&gt;jasonsaayman&lt;/code&gt;, the primary axios maintainer. They changed the account email to an anonymous ProtonMail address (&lt;code&gt;ifstap@proton.me&lt;/code&gt;) and published the poisoned packages &lt;strong&gt;manually via npm CLI&lt;/strong&gt;, completely bypassing the project's GitHub Actions CI/CD pipeline.&lt;/p&gt;

&lt;p&gt;The key forensic signal: every legitimate axios 1.x release is published via GitHub Actions with npm's OIDC Trusted Publisher mechanism — cryptographically tied to a verified workflow. &lt;code&gt;axios@1.14.1&lt;/code&gt; has no OIDC binding, no &lt;code&gt;gitHead&lt;/code&gt;, no corresponding GitHub commit or tag. It exists only on npm.&lt;/p&gt;

&lt;p&gt;The attacker likely obtained a &lt;strong&gt;long-lived classic npm access token&lt;/strong&gt;. The OIDC tokens used by legitimate releases are ephemeral and scoped — they can't be stolen in the traditional sense.&lt;/p&gt;

&lt;h2&gt;The attack chain&lt;/h2&gt;

&lt;p&gt;The attack was pre-staged 18 hours in advance. Here's the timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mar 30, 05:57 UTC&lt;/strong&gt; — &lt;code&gt;plain-crypto-js@4.2.0&lt;/code&gt; published (clean decoy, establishes npm history)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 30, 23:59 UTC&lt;/strong&gt; — &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; published (malicious payload added)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, 00:21 UTC&lt;/strong&gt; — &lt;code&gt;axios@1.14.1&lt;/code&gt; published via hijacked account&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, 01:00 UTC&lt;/strong&gt; — &lt;code&gt;axios@0.30.4&lt;/code&gt; published (legacy branch, 39 min later)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, ~03:15 UTC&lt;/strong&gt; — npm pulls both malicious axios versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both malicious axios versions add exactly one new dependency: &lt;code&gt;plain-crypto-js@^4.2.1&lt;/code&gt;. This package is &lt;strong&gt;never imported or required anywhere&lt;/strong&gt; in the axios source. Its sole purpose is to execute a &lt;code&gt;postinstall&lt;/code&gt; hook.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// axios@1.14.0 deps: follow-redirects, form-data, proxy-from-env&lt;/span&gt;
&lt;span class="c1"&gt;// axios@1.14.1 deps: follow-redirects, form-data, proxy-from-env, plain-crypto-js ← new&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dependency that exists in &lt;code&gt;package.json&lt;/code&gt; but has zero usage in the codebase is a high-confidence indicator of a compromised release.&lt;/p&gt;

&lt;h2&gt;Inside the dropper&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;setup.js&lt;/code&gt; file (4209 bytes, minified) uses a two-layer obfuscation scheme:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;XOR cipher&lt;/strong&gt; with a key derived from &lt;code&gt;"OrDeR_7077"&lt;/code&gt; — only the digits &lt;code&gt;7,0,7,7&lt;/code&gt; survive JavaScript's &lt;code&gt;Number()&lt;/code&gt; parsing; the rest become &lt;code&gt;NaN → 0&lt;/code&gt; in bitwise ops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reverse + base64 decode&lt;/strong&gt; as an outer layer&lt;/li&gt;
&lt;/ol&gt;
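&lt;p&gt;The &lt;code&gt;Number()&lt;/code&gt; detail is easy to verify in a REPL. Each key character goes through &lt;code&gt;Number()&lt;/code&gt;; non-digits yield &lt;code&gt;NaN&lt;/code&gt;, and &lt;code&gt;NaN&lt;/code&gt; collapses to &lt;code&gt;0&lt;/code&gt; the moment it hits a bitwise operator, which makes XOR at those positions a no-op:&lt;/p&gt;

```javascript
// How "OrDeR_7077" degrades to just its digits in bitwise math:
// Number("O") is NaN, and (NaN | 0) === 0, so XOR with those key
// bytes leaves the ciphertext byte untouched.
const key = "OrDeR_7077";
const effective = [...key].map((ch) => Number(ch) | 0);
console.log(effective); // [0, 0, 0, 0, 0, 0, 7, 0, 7, 7]

const byte = 0x41;
console.log(byte ^ 0); // 65 — a "dead" key position changes nothing
```

&lt;p&gt;So the ten-character key is effectively four small numbers and six no-ops — obfuscation theater more than cryptography.&lt;/p&gt;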

&lt;p&gt;Once decoded, it dynamically loads &lt;code&gt;child_process&lt;/code&gt;, &lt;code&gt;os&lt;/code&gt;, and &lt;code&gt;fs&lt;/code&gt; at runtime to evade static analysis, then branches by platform:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt; Writes an AppleScript that downloads a RAT binary to &lt;code&gt;/Library/Caches/com.apple.act.mond&lt;/code&gt; — a path mimicking an Apple system daemon. Executed via &lt;code&gt;osascript&lt;/code&gt;, then the script self-deletes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt; Copies PowerShell to &lt;code&gt;%PROGRAMDATA%\wt.exe&lt;/code&gt; (disguised as Windows Terminal), writes a VBScript that fetches and runs a hidden PowerShell RAT with &lt;code&gt;-ExecutionPolicy Bypass -WindowStyle Hidden&lt;/code&gt;. Both temp files self-delete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux:&lt;/strong&gt; Direct &lt;code&gt;curl&lt;/code&gt; to download a Python RAT to &lt;code&gt;/tmp/ld.py&lt;/code&gt;, executed via &lt;code&gt;nohup&lt;/code&gt; to detach from the process tree.&lt;/p&gt;

&lt;p&gt;All three payloads phone home to &lt;code&gt;sfrclak.com:8000&lt;/code&gt; with platform-specific POST bodies (&lt;code&gt;packages.npm.org/product0|1|2&lt;/code&gt;) — deliberately crafted to look like npm registry traffic in network logs.&lt;/p&gt;

&lt;h2&gt;The self-destruct sequence&lt;/h2&gt;

&lt;p&gt;After launching the payload, &lt;code&gt;setup.js&lt;/code&gt; performs three cleanup steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deletes itself (&lt;code&gt;fs.unlink(__filename)&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Deletes &lt;code&gt;package.json&lt;/code&gt; (which contains the &lt;code&gt;postinstall&lt;/code&gt; hook)&lt;/li&gt;
&lt;li&gt;Renames a pre-staged &lt;code&gt;package.md&lt;/code&gt; to &lt;code&gt;package.json&lt;/code&gt; — a clean manifest with no scripts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Post-infection, &lt;code&gt;node_modules/plain-crypto-js/&lt;/code&gt; looks completely clean. &lt;code&gt;npm audit&lt;/code&gt; won't flag it. Manual inspection won't catch it. But the &lt;strong&gt;existence of the directory itself&lt;/strong&gt; is proof the dropper ran — &lt;code&gt;plain-crypto-js&lt;/code&gt; is not a dependency of any legitimate axios version.&lt;/p&gt;

&lt;h2&gt;How to check if you're affected&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check lockfile for compromised versions&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"1&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;14&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;1|0&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;30&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;4"&lt;/span&gt; package-lock.json

&lt;span class="c"&gt;# Check for the malicious dependency&lt;/span&gt;
&lt;span class="nb"&gt;ls &lt;/span&gt;node_modules/plain-crypto-js 2&amp;gt;/dev/null &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"AFFECTED"&lt;/span&gt;

&lt;span class="c"&gt;# Check for RAT artifacts&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /Library/Caches/com.apple.act.mond 2&amp;gt;/dev/null  &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /tmp/ld.py 2&amp;gt;/dev/null                           &lt;span class="c"&gt;# Linux&lt;/span&gt;
&lt;span class="nb"&gt;dir&lt;/span&gt; &lt;span class="s2"&gt;"%PROGRAMDATA%&lt;/span&gt;&lt;span class="se"&gt;\w&lt;/span&gt;&lt;span class="s2"&gt;t.exe"&lt;/span&gt; 2&amp;gt;nul                        &lt;span class="c"&gt;# Windows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Remediation&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pin to safe versions:&lt;/strong&gt; &lt;code&gt;axios@1.14.0&lt;/code&gt; (1.x) or &lt;code&gt;axios@0.30.3&lt;/code&gt; (0.x)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove the malicious package:&lt;/strong&gt; &lt;code&gt;rm -rf node_modules/plain-crypto-js&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If RAT artifacts found:&lt;/strong&gt; assume full system compromise, rotate ALL credentials (npm tokens, SSH keys, cloud keys, CI/CD secrets, &lt;code&gt;.env&lt;/code&gt; values)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit CI/CD pipelines&lt;/strong&gt; for any runs that installed during the window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block the C2:&lt;/strong&gt; &lt;code&gt;sfrclak.com&lt;/code&gt; / &lt;code&gt;142.11.206.73&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The bigger picture&lt;/h2&gt;

&lt;p&gt;This is the same pattern we've seen accelerating throughout 2025-2026: maintainer account hijack → manual npm publish → phantom dependency → postinstall dropper. The Shai-Hulud worm, the Qix compromise, Chalk/Debug — all variations on the same playbook.&lt;/p&gt;

&lt;p&gt;The uncomfortable truth: &lt;strong&gt;npm's security model has a single-point-of-failure problem.&lt;/strong&gt; Long-lived tokens still exist. Email changes don't require additional verification. Manual CLI publishing can bypass every CI/CD safeguard a project has built. Trusted Publishing (OIDC) is available but not enforced.&lt;/p&gt;

&lt;p&gt;Some practical defenses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;npm ci --ignore-scripts&lt;/code&gt;&lt;/strong&gt; in all CI/CD pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;ignore-scripts=true&lt;/code&gt;&lt;/strong&gt; in &lt;code&gt;~/.npmrc&lt;/code&gt; for local dev (opt-in to postinstall only when needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use lockfiles religiously&lt;/strong&gt; and review diffs on dependency changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bun and pnpm&lt;/strong&gt; don't execute lifecycle scripts by default — worth considering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package cooldown policies&lt;/strong&gt; — most malicious packages are caught within 24 hours&lt;/li&gt;
&lt;/ul&gt;
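&lt;p&gt;The &lt;code&gt;ignore-scripts&lt;/code&gt; setup from that list is a one-line config change. A sketch of the &lt;code&gt;~/.npmrc&lt;/code&gt; fragment (the package name is a placeholder, and opt-in flag syntax can vary across npm versions — check your npm's docs):&lt;/p&gt;

```shell
# ~/.npmrc — never run lifecycle scripts automatically
ignore-scripts=true

# Opt back in only when a native dependency genuinely needs its build step:
#   npm rebuild esbuild --ignore-scripts=false   # one package, explicitly
#   npm ci --ignore-scripts                      # belt-and-suspenders in CI
```

&lt;p&gt;With this in place, a &lt;code&gt;postinstall&lt;/code&gt; dropper like &lt;code&gt;plain-crypto-js&lt;/code&gt;'s simply never executes.&lt;/p&gt;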

&lt;p&gt;The window of exposure was ~3 hours. The detection came from Socket and StepSecurity within minutes. But for a package with 100M+ weekly downloads, even 3 hours is a massive blast radius.&lt;/p&gt;

&lt;p&gt;Pin your versions. Audit your lockfiles. Don't trust &lt;code&gt;latest&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://www.stepsecurity.io/blog/axios-compromised-on-npm-malicious-versions-drop-remote-access-trojan" rel="noopener noreferrer"&gt;StepSecurity&lt;/a&gt;, &lt;a href="https://socket.dev/blog/axios-npm-package-compromised" rel="noopener noreferrer"&gt;Socket&lt;/a&gt;, &lt;a href="https://www.aikido.dev/blog/axios-npm-compromised-maintainer-hijacked-rat" rel="noopener noreferrer"&gt;Aikido&lt;/a&gt;, &lt;a href="https://github.com/axios/axios/issues/10604" rel="noopener noreferrer"&gt;axios GitHub issue #10604&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://x.com/lunovian" rel="noopener noreferrer"&gt;An&lt;/a&gt; — founder of &lt;a href="https://codepawl.com" rel="noopener noreferrer"&gt;Codepawl&lt;/a&gt;, building open-source developer tools from HCMC, Vietnam.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://codepawl.com" rel="noopener noreferrer"&gt;codepawl.com&lt;/a&gt; · &lt;a href="https://x.com/lunovian" rel="noopener noreferrer"&gt;X @lunovian&lt;/a&gt; · &lt;a href="https://x.com/codepawl" rel="noopener noreferrer"&gt;X @codepawl&lt;/a&gt; · &lt;a href="https://discord.gg/7fydHgK6kA" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>security</category>
      <category>javascript</category>
    </item>
    <item>
      <title>GPT-5, Claude, Gemini All Score Below 1% - ARC AGI 3 Just Broke Every Frontier Model</title>
      <dc:creator>CodePawl</dc:creator>
      <pubDate>Thu, 26 Mar 2026 05:40:20 +0000</pubDate>
      <link>https://forem.com/codepawl/gpt-5-claude-gemini-all-score-below-1-arc-agi-3-just-broke-every-frontier-model-5dbj</link>
      <guid>https://forem.com/codepawl/gpt-5-claude-gemini-all-score-below-1-arc-agi-3-just-broke-every-frontier-model-5dbj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpovch0xibriez6797l2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpovch0xibriez6797l2t.png" alt="Three system types compared on ARC-AGI: Reasoning Systems (models like o1/o3 tested at varying thinking levels, showing diminishing returns as reasoning time increases), Base LLMs (single-shot inference from standard models, no extended reasoning), and Kaggle Systems (competition submissions optimized under a $50 compute budget, purpose-built for ARC). Key insight: purpose-built Kaggle systems outperform both base LLMs and reasoning-augmented models despite far less compute." width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC-AGI-3, launched just yesterday on March 25, 2026, represents the most radical transformation of the ARC benchmark since François Chollet introduced it in 2019&lt;/strong&gt; — abandoning static grid puzzles entirely in favor of interactive, video-game-like environments where AI agents must discover rules, set goals, and solve problems with zero instructions.&lt;/p&gt;

&lt;p&gt;The competition offers over &lt;strong&gt;$2 million in prizes&lt;/strong&gt; across three tracks. Early preview results: &lt;strong&gt;frontier LLMs like GPT-5 and Claude score below 1%&lt;/strong&gt;, while simple CNN and graph-search approaches reach &lt;strong&gt;12.58%&lt;/strong&gt;. The gap between human performance (100%) and the best AI agent remains enormous.&lt;/p&gt;

&lt;h2&gt;From grid puzzles to game worlds: what changed&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx1gqvgojwhljy7satpk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx1gqvgojwhljy7satpk.png" alt="Can you build an agent to beat this game?" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ARC-AGI-3 is not an incremental difficulty upgrade — it is a fundamentally different benchmark. Previous versions (ARC-AGI-1 and ARC-AGI-2) presented static input-output grid pairs where systems inferred transformation rules and applied them. ARC-AGI-3 instead drops agents into &lt;strong&gt;turn-based game environments&lt;/strong&gt; with no stated rules, no instructions, and no win conditions. Agents observe a 64×64 grid with 16 colors, take actions (move, click, reset), and must figure out both &lt;em&gt;what to do&lt;/em&gt; and &lt;em&gt;how to do it&lt;/em&gt; through pure interaction.&lt;/p&gt;

&lt;p&gt;The benchmark comprises &lt;strong&gt;1,000+ levels across 150+ handcrafted environments&lt;/strong&gt;, each game containing 8–10 levels that progressively introduce new mechanics. Three preview games illustrate the range: &lt;code&gt;ls20&lt;/code&gt; requires navigating a map and transforming symbols, &lt;code&gt;ft09&lt;/code&gt; involves matching patterns across overlapping grids, and &lt;code&gt;vc33&lt;/code&gt; tasks agents with adjusting volumes to hit target heights. Scoring uses &lt;strong&gt;action efficiency&lt;/strong&gt; — how many actions the agent needs compared to a human baseline — rather than binary pass/fail. A perfect 100% means the AI matches human efficiency across all games.&lt;/p&gt;
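&lt;p&gt;The exact ARC-AGI-3 scoring formula isn't spelled out publicly, but the described idea — actions taken versus a human baseline, capped so you can't beat 100% — can be sketched in a few lines. Treat this as an illustrative guess, not the official metric:&lt;/p&gt;

```python
# Sketch of "action efficiency" scoring as described: fewer actions
# relative to the human baseline means a higher score, capped at 100%.
# The real ARC-AGI-3 formula is an assumption here, not published spec.

def efficiency(human_actions: int, agent_actions: int) -> float:
    """Per-level efficiency: 1.0 means the agent matched the human baseline."""
    if agent_actions == 0:  # level never solved
        return 0.0
    return min(1.0, human_actions / agent_actions)

def benchmark_score(levels: list[tuple[int, int]]) -> float:
    """Average efficiency over (human_actions, agent_actions) pairs, as a percent."""
    if not levels:
        return 0.0
    return 100 * sum(efficiency(h, a) for h, a in levels) / len(levels)
```

&lt;p&gt;Under this reading, an agent that solves half the levels at human efficiency and fails the rest lands around 50% — which makes the sub-1% LLM numbers all the more stark.&lt;/p&gt;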

&lt;p&gt;The evolution across versions tells a clear story of escalating challenge:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ARC-AGI-1 (2019)&lt;/th&gt;
&lt;th&gt;ARC-AGI-2 (2025)&lt;/th&gt;
&lt;th&gt;ARC-AGI-3 (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Format&lt;/td&gt;
&lt;td&gt;Static grid puzzles&lt;/td&gt;
&lt;td&gt;Static grid puzzles (harder)&lt;/td&gt;
&lt;td&gt;Interactive game environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instructions&lt;/td&gt;
&lt;td&gt;Input-output demo pairs&lt;/td&gt;
&lt;td&gt;Input-output demo pairs&lt;/td&gt;
&lt;td&gt;None — discover through interaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best AI score&lt;/td&gt;
&lt;td&gt;~90%+ (saturated)&lt;/td&gt;
&lt;td&gt;24% (competition)&lt;/td&gt;
&lt;td&gt;12.58% (preview)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human baseline&lt;/td&gt;
&lt;td&gt;~85%&lt;/td&gt;
&lt;td&gt;~60% average&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scoring&lt;/td&gt;
&lt;td&gt;Binary accuracy&lt;/td&gt;
&lt;td&gt;Accuracy + cost-per-task&lt;/td&gt;
&lt;td&gt;Action efficiency vs. humans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tasks&lt;/td&gt;
&lt;td&gt;~400 training + 100 eval&lt;/td&gt;
&lt;td&gt;1,000 training + 120 eval per split&lt;/td&gt;
&lt;td&gt;1,000+ levels, 150+ environments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ARC-AGI-1 became effectively saturated by 2025, with frontier models hitting 90%+ through brute-force engineering. ARC-AGI-2 introduced harder compositional tasks — symbolic interpretation, contextual rule application, multiple interacting rules — that dropped the best competition score to &lt;strong&gt;24%&lt;/strong&gt;. ARC-AGI-3 tests four entirely new capabilities: &lt;strong&gt;exploration&lt;/strong&gt; (actively gathering information), &lt;strong&gt;modeling&lt;/strong&gt; (building generalizable world models), &lt;strong&gt;goal-setting&lt;/strong&gt; (identifying objectives without instructions), and &lt;strong&gt;planning with execution&lt;/strong&gt; (strategic action with course-correction).&lt;/p&gt;

&lt;h2&gt;Preview leaderboard reveals LLMs' interactive reasoning gap&lt;/h2&gt;

&lt;p&gt;The competition launched only yesterday, so the official Kaggle leaderboard has no entries yet. However, a 30-day developer preview preceding the launch produced highly informative results from 12 submissions (8 tested on private games):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Levels Solved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1st&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;StochasticGoose&lt;/strong&gt; (Tufa Labs)&lt;/td&gt;
&lt;td&gt;CNN + RL action-learning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.58%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2nd&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Blind Squirrel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State graph exploration + ResNet18&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.71%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3rd&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Explore It Till You Solve It&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Training-free frame graph&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.64%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Best frontier LLM agent&lt;/td&gt;
&lt;td&gt;LLM-based&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2–3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Human players&lt;/td&gt;
&lt;td&gt;Human cognition&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;All three top systems used non-LLM approaches.&lt;/strong&gt; StochasticGoose, built by Dries Smit at Tufa Labs, employed a CNN-based action prediction model with simple reinforcement learning and sparse rewards (only level completion signals). It stored frame transitions in memory for off-policy training, used hash tables to avoid duplicate states, and iteratively retrained its model between levels. The team explicitly avoided LLMs because the observation complexity — hundreds of interaction steps — would generate millions of tokens.&lt;/p&gt;
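&lt;p&gt;The hash-table trick mentioned above is simple to sketch: hash each observed frame so revisited states are recognized, and only store novel transitions for off-policy replay. The details below are my own assumptions for illustration, not Tufa Labs' actual code:&lt;/p&gt;

```python
import hashlib

def frame_key(grid):
    """Stable hash of a grid of small color indices (tuple of tuples, values 0-15)."""
    flat = bytes(cell for row in grid for cell in row)
    return hashlib.sha1(flat).hexdigest()

class TransitionMemory:
    """Replay buffer that skips (state, action) pairs it has already seen."""
    def __init__(self):
        self.seen = set()
        self.transitions = []   # (state_key, action, next_key, level_done)

    def add(self, grid, action, next_grid, level_done):
        key = (frame_key(grid), action)
        if key in self.seen:
            return False        # duplicate state-action: nothing new to learn
        self.seen.add(key)
        self.transitions.append((key[0], action, frame_key(next_grid), level_done))
        return True
```

&lt;p&gt;Deduplicating on (state, action) keeps the buffer small even over hundreds of interaction steps — exactly the cost the team cited when ruling out token-hungry LLM loops.&lt;/p&gt;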

&lt;p&gt;The third-place system, documented in an arXiv paper (Rudakov et al., 2512.24156), used a completely training-free graph-based exploration method, building state graphs and systematically exploring them. It solved a median of 30 out of 52 levels across 6 games but was limited by computational scaling with state space size.&lt;/p&gt;

&lt;p&gt;Frontier LLMs' sub-1% performance is perhaps the most significant data point. The interactive format — requiring sustained sequential reasoning, state tracking across hundreds of steps, and learning from environmental feedback — exposes a fundamental limitation of current language models that static benchmarks never tested.&lt;/p&gt;

&lt;h2&gt;$2 million across three tracks with strict open-source requirements&lt;/h2&gt;

&lt;p&gt;The ARC Prize 2026 splits its prize pool across three parallel competition tracks, each hosted on Kaggle:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC-AGI-3 Track — $850,000 total:&lt;/strong&gt;&lt;br&gt;
The grand prize of &lt;strong&gt;$700,000&lt;/strong&gt; goes to the first agent scoring 100% on evaluation (carries over if unclaimed). A guaranteed &lt;strong&gt;$75,000 top-score award&lt;/strong&gt; distributes $40K/1st, $15K/2nd, $10K/3rd, and $5K each for 4th–5th. Two milestone prizes totaling &lt;strong&gt;$75,000&lt;/strong&gt; reward early progress: $25K/$10K/$2.5K at each milestone (June 30 and September 30).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC-AGI-2 Track — ~$1 million:&lt;/strong&gt; The $700K grand prize for scoring 85% on ARC-AGI-2 remains unclaimed from both 2024 and 2025, and continues into 2026 alongside separate score awards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paper Prize Track:&lt;/strong&gt; Awards for research papers advancing understanding of strong ARC-AGI performance.&lt;/p&gt;

&lt;p&gt;Critical competition constraints shape viable approaches. &lt;strong&gt;All winning solutions must be open-sourced&lt;/strong&gt; under permissive licenses (CC0 or MIT-0) &lt;em&gt;before&lt;/em&gt; receiving private evaluation scores. &lt;strong&gt;Kaggle evaluation runs with no internet access&lt;/strong&gt; — meaning no API calls to OpenAI, Anthropic, Google, or any cloud inference endpoint. Teams must either use open-weight models running locally or build entirely non-LLM systems. The ARC-AGI-3 toolkit is open-source (MIT license, &lt;code&gt;pip install arc-agi&lt;/code&gt;) and runs at 2,000+ FPS locally, but requires an API key from arcprize.org.&lt;/p&gt;

&lt;h2&gt;What approaches competitors are likely to pursue&lt;/h2&gt;

&lt;p&gt;The preview results and historical ARC competition patterns suggest several viable research directions for ARC-AGI-3:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement learning with lightweight neural networks&lt;/strong&gt; is the proven frontrunner. StochasticGoose's CNN + sparse RL approach dominated the preview. Simple action prediction models that learn which actions cause meaningful state changes, combined with systematic exploration, appear far more effective than sophisticated language understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph-based state exploration&lt;/strong&gt; offers a training-free alternative. Building explicit state graphs, pruning loops, and systematically mapping environment dynamics worked surprisingly well (6.71% for Blind Squirrel). This approach trades compute for algorithmic efficiency but scales poorly with state space size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta-learning and curiosity-driven RL&lt;/strong&gt; are natural fits given ARC-AGI-3's requirement for rapid adaptation to novel environments. Methods like BYOL-Hindsight and intrinsic motivation were discussed during the preview period but proved finicky with short timeframes and sparse rewards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;World models&lt;/strong&gt; (Dreamer family, latent dynamics models) could learn environment physics in imagination before acting, but are limited by ARC-AGI-3's sparse reward signal — only level completion provides feedback.&lt;/p&gt;

&lt;p&gt;For the continuing ARC-AGI-2 track, the dominant paradigm from 2025 was &lt;strong&gt;synthetic data generation combined with test-time training&lt;/strong&gt; — NVARC's winning approach used Qwen3-4B fine-tuned on 103K synthetic puzzles plus 3.2M augmented samples. Other strong directions include masked diffusion models (ARChitects), evolutionary program synthesis (SOAR), and minimum description length approaches (CompressARC).&lt;/p&gt;

&lt;h2&gt;Key dates and competition timeline&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Milestone&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;March 25, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Competition opens on Kaggle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;June 30, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ARC-AGI-3 Milestone #1 ($37,500 in prizes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;September 30, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ARC-AGI-3 Milestone #2 ($37,500 in prizes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;November 2, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All submissions due&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;November 8, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Paper track submissions due&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;December 4, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Results announced&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;During the competition, Kaggle leaderboard standings reflect scores on a semi-private dataset. Final rankings and prize eligibility use a separate private dataset, following the same anti-gaming structure as previous years. Human calibration data was collected from &lt;strong&gt;1,200+ players&lt;/strong&gt; across &lt;strong&gt;3,900+ games&lt;/strong&gt; during the preview, with a controlled study of 200+ participants establishing production baselines.&lt;/p&gt;

&lt;h2&gt;Try it yourself&lt;/h2&gt;

&lt;p&gt;The ARC-AGI-3 toolkit is open-source and runs locally:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install arc-agi&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You'll need an API key from &lt;a href="https://arcprize.org" rel="noopener noreferrer"&gt;arcprize.org&lt;/a&gt; to access the environments. The toolkit runs at 2,000+ FPS locally.&lt;/p&gt;

&lt;p&gt;Full competition details and submission: &lt;a href="https://www.kaggle.com/competitions/arc-prize-2026-arc-agi-3" rel="noopener noreferrer"&gt;ARC Prize 2026 on Kaggle&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;ARC-AGI-3 is not merely a harder test — it measures a fundamentally different kind of intelligence. The shift from static pattern recognition to interactive exploration and goal discovery exposes capabilities that current AI systems, including frontier LLMs, demonstrably lack. The preview data is unambiguous: &lt;strong&gt;simple RL and graph search at 12.58% versus frontier LLMs below 1%&lt;/strong&gt; suggests that the path to solving ARC-AGI-3 runs through novel algorithmic ideas rather than model scaling.&lt;/p&gt;

&lt;p&gt;With $850K on the line for the interactive track alone and milestone prizes creating incentives for early progress, the next eight months should produce significant advances in adaptive AI reasoning — all of which, by competition rules, will be open-sourced for the broader research community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you planning to compete? What approach would you try?&lt;/strong&gt; Drop your thoughts in the comments.&lt;/p&gt;




&lt;p&gt;We're &lt;a href="https://github.com/codepawl" rel="noopener noreferrer"&gt;CodePawl&lt;/a&gt; — an open-source-first firm building tools for developers. Follow us on &lt;a href="https://x.com/codepawl" rel="noopener noreferrer"&gt;X&lt;/a&gt; or join our &lt;a href="https://discord.gg/7fydHgK6kA" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kaggle</category>
      <category>llm</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
