<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: CodePawl</title>
    <description>The latest articles on Forem by CodePawl (@codepawl).</description>
    <link>https://forem.com/codepawl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843996%2F48508502-26e8-4fbd-94a1-62e082acbbc7.png</url>
      <title>Forem: CodePawl</title>
      <link>https://forem.com/codepawl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/codepawl"/>
    <language>en</language>
    <item>
      <title>Claude Code source code leaked again via npm source map — third time now</title>
      <dc:creator>CodePawl</dc:creator>
      <pubDate>Tue, 31 Mar 2026 09:12:36 +0000</pubDate>
      <link>https://forem.com/codepawl/claude-code-source-code-leaked-again-via-npm-source-map-third-time-now-2j55</link>
      <guid>https://forem.com/codepawl/claude-code-source-code-leaked-again-via-npm-source-map-third-time-now-2j55</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fervosrk5qzldvaznptwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fervosrk5qzldvaznptwq.png" alt=" " width="587" height="734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic shipped a 57MB &lt;code&gt;cli.js.map&lt;/code&gt; file in the latest Claude Code npm package. Again.&lt;/p&gt;

&lt;p&gt;The source map contains the full TypeScript source, extractable in seconds. The &lt;code&gt;src/&lt;/code&gt; directory includes everything: components, commands, tools, services, hooks, query engine, cost tracker, context handling, the works. 785K &lt;code&gt;main.tsx&lt;/code&gt;, 67K &lt;code&gt;query.ts&lt;/code&gt;, 46K &lt;code&gt;QueryEngine.ts&lt;/code&gt;, 29K &lt;code&gt;Tool.ts&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8slampqm6jbfew93lz6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8slampqm6jbfew93lz6.png" alt=" " width="800" height="995"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is at least the third time this has happened:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Feb 2025&lt;/strong&gt; — source maps shipped in the npm package. Anthropic rushed to yank the release and purge the npm cache; someone still recovered the source from a Sublime Text undo buffer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~Sep 2025&lt;/strong&gt; — leaked again via the same vector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, 2026&lt;/strong&gt; — today. 57MB map file, full source, still sitting in the npm registry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva1xbbneg2kzgx0e4jha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva1xbbneg2kzgx0e4jha.png" alt=" " width="428" height="679"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The irony: this happened on the same day as the axios supply chain attack, where a hijacked maintainer pushed malicious code through npm. npm's trust model is having a rough day.&lt;/p&gt;

&lt;p&gt;To be fair, source code leaking from an npm package isn't a security vulnerability in itself. The code is always technically recoverable from the minified bundle; source maps just make it trivial instead of painful. But shipping them three times suggests the build pipeline still doesn't strip them reliably.&lt;/p&gt;

&lt;p&gt;At this point Anthropic might as well just open source it. The code leaks every few months anyway.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://x.com/Fried_rice/status/2038894956459290963?s=20" rel="noopener noreferrer"&gt;Original post&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://x.com/lunovian" rel="noopener noreferrer"&gt;An&lt;/a&gt; — &lt;a href="https://codepawl.com" rel="noopener noreferrer"&gt;Codepawl&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>axios Got Hijacked Today: A Technical Breakdown of the Most Sophisticated npm Supply Chain Attack Yet</title>
      <dc:creator>CodePawl</dc:creator>
      <pubDate>Tue, 31 Mar 2026 08:15:59 +0000</pubDate>
      <link>https://forem.com/codepawl-org/axios-got-hijacked-today-a-technical-breakdown-of-the-most-sophisticated-npm-supply-chain-attack-2i82</link>
      <guid>https://forem.com/codepawl-org/axios-got-hijacked-today-a-technical-breakdown-of-the-most-sophisticated-npm-supply-chain-attack-2i82</guid>
      <description>&lt;p&gt;If you use axios — and statistically, you do — you need to read this.&lt;/p&gt;

&lt;p&gt;On March 31, 2026, two malicious versions of axios were published to npm: &lt;code&gt;1.14.1&lt;/code&gt; and &lt;code&gt;0.30.4&lt;/code&gt;. The attacker hijacked a lead maintainer's npm account, injected a hidden dependency that deploys a cross-platform RAT, and designed the entire payload to self-destruct after execution. The malicious versions were live for roughly 3 hours before npm pulled them.&lt;/p&gt;

&lt;p&gt;This isn't a typosquat. This isn't a random package nobody uses. This is &lt;strong&gt;axios&lt;/strong&gt; — 100M+ weekly downloads, present in virtually every Node.js project that touches HTTP.&lt;/p&gt;




&lt;h2&gt;What happened&lt;/h2&gt;

&lt;p&gt;The attacker compromised the npm account of &lt;code&gt;jasonsaayman&lt;/code&gt;, the primary axios maintainer. They changed the account email to an anonymous ProtonMail address (&lt;code&gt;ifstap@proton.me&lt;/code&gt;) and published the poisoned packages &lt;strong&gt;manually via npm CLI&lt;/strong&gt;, completely bypassing the project's GitHub Actions CI/CD pipeline.&lt;/p&gt;

&lt;p&gt;The key forensic signal: every legitimate axios 1.x release is published via GitHub Actions with npm's OIDC Trusted Publisher mechanism — cryptographically tied to a verified workflow. &lt;code&gt;axios@1.14.1&lt;/code&gt; has no OIDC binding, no &lt;code&gt;gitHead&lt;/code&gt;, no corresponding GitHub commit or tag. It exists only on npm.&lt;/p&gt;

&lt;p&gt;The attacker likely obtained a &lt;strong&gt;long-lived classic npm access token&lt;/strong&gt;. The OIDC tokens used by legitimate releases are ephemeral and scoped — they can't be stolen in the traditional sense.&lt;/p&gt;

&lt;h2&gt;The attack chain&lt;/h2&gt;

&lt;p&gt;The attack was pre-staged 18 hours in advance. Here's the timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mar 30, 05:57 UTC&lt;/strong&gt; — &lt;code&gt;plain-crypto-js@4.2.0&lt;/code&gt; published (clean decoy, establishes npm history)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 30, 23:59 UTC&lt;/strong&gt; — &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; published (malicious payload added)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, 00:21 UTC&lt;/strong&gt; — &lt;code&gt;axios@1.14.1&lt;/code&gt; published via hijacked account&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, 01:00 UTC&lt;/strong&gt; — &lt;code&gt;axios@0.30.4&lt;/code&gt; published (legacy branch, 39 min later)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mar 31, ~03:15 UTC&lt;/strong&gt; — npm pulls both malicious axios versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both malicious axios versions add exactly one new dependency: &lt;code&gt;plain-crypto-js@^4.2.1&lt;/code&gt;. This package is &lt;strong&gt;never imported or required anywhere&lt;/strong&gt; in the axios source. Its sole purpose is to execute a &lt;code&gt;postinstall&lt;/code&gt; hook.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// axios@1.14.0 deps: follow-redirects, form-data, proxy-from-env&lt;/span&gt;
&lt;span class="c1"&gt;// axios@1.14.1 deps: follow-redirects, form-data, proxy-from-env, plain-crypto-js ← new&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dependency that exists in &lt;code&gt;package.json&lt;/code&gt; but has zero usage in the codebase is a high-confidence indicator of a compromised release.&lt;/p&gt;

&lt;h2&gt;Inside the dropper&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;setup.js&lt;/code&gt; file (4209 bytes, minified) uses a two-layer obfuscation scheme:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;XOR cipher&lt;/strong&gt; with a key derived from &lt;code&gt;"OrDeR_7077"&lt;/code&gt; — only the digits &lt;code&gt;7,0,7,7&lt;/code&gt; survive JavaScript's &lt;code&gt;Number()&lt;/code&gt; parsing; the rest become &lt;code&gt;NaN → 0&lt;/code&gt; in bitwise ops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reverse + base64 decode&lt;/strong&gt; as an outer layer&lt;/li&gt;
&lt;/ol&gt;
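&lt;p&gt;The &lt;code&gt;Number()&lt;/code&gt; detail is easy to verify in a REPL. Each key character goes through &lt;code&gt;Number()&lt;/code&gt;; non-digits yield &lt;code&gt;NaN&lt;/code&gt;, and &lt;code&gt;NaN&lt;/code&gt; collapses to &lt;code&gt;0&lt;/code&gt; the moment it hits a bitwise operator, which makes XOR at those positions a no-op:&lt;/p&gt;

```javascript
// How "OrDeR_7077" degrades to just its digits in bitwise math:
// Number("O") is NaN, and (NaN | 0) === 0, so XOR with those key
// bytes leaves the ciphertext byte untouched.
const key = "OrDeR_7077";
const effective = [...key].map((ch) => Number(ch) | 0);
console.log(effective); // [0, 0, 0, 0, 0, 0, 7, 0, 7, 7]

const byte = 0x41;
console.log(byte ^ 0); // 65 — a "dead" key position changes nothing
```

&lt;p&gt;So the ten-character key is effectively four small numbers and six no-ops — obfuscation theater more than cryptography.&lt;/p&gt;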

&lt;p&gt;Once decoded, it dynamically loads &lt;code&gt;child_process&lt;/code&gt;, &lt;code&gt;os&lt;/code&gt;, and &lt;code&gt;fs&lt;/code&gt; at runtime to evade static analysis, then branches by platform:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt; Writes an AppleScript that downloads a RAT binary to &lt;code&gt;/Library/Caches/com.apple.act.mond&lt;/code&gt; — a path mimicking an Apple system daemon. Executed via &lt;code&gt;osascript&lt;/code&gt;, then the script self-deletes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt; Copies PowerShell to &lt;code&gt;%PROGRAMDATA%\wt.exe&lt;/code&gt; (disguised as Windows Terminal), writes a VBScript that fetches and runs a hidden PowerShell RAT with &lt;code&gt;-ExecutionPolicy Bypass -WindowStyle Hidden&lt;/code&gt;. Both temp files self-delete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux:&lt;/strong&gt; Direct &lt;code&gt;curl&lt;/code&gt; to download a Python RAT to &lt;code&gt;/tmp/ld.py&lt;/code&gt;, executed via &lt;code&gt;nohup&lt;/code&gt; to detach from the process tree.&lt;/p&gt;

&lt;p&gt;All three payloads phone home to &lt;code&gt;sfrclak.com:8000&lt;/code&gt; with platform-specific POST bodies (&lt;code&gt;packages.npm.org/product0|1|2&lt;/code&gt;) — deliberately crafted to look like npm registry traffic in network logs.&lt;/p&gt;

&lt;h2&gt;The self-destruct sequence&lt;/h2&gt;

&lt;p&gt;After launching the payload, &lt;code&gt;setup.js&lt;/code&gt; performs three cleanup steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deletes itself (&lt;code&gt;fs.unlink(__filename)&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Deletes &lt;code&gt;package.json&lt;/code&gt; (which contains the &lt;code&gt;postinstall&lt;/code&gt; hook)&lt;/li&gt;
&lt;li&gt;Renames a pre-staged &lt;code&gt;package.md&lt;/code&gt; to &lt;code&gt;package.json&lt;/code&gt; — a clean manifest with no scripts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Post-infection, &lt;code&gt;node_modules/plain-crypto-js/&lt;/code&gt; looks completely clean. &lt;code&gt;npm audit&lt;/code&gt; won't flag it. Manual inspection won't catch it. But the &lt;strong&gt;existence of the directory itself&lt;/strong&gt; is proof the dropper ran — &lt;code&gt;plain-crypto-js&lt;/code&gt; is not a dependency of any legitimate axios version.&lt;/p&gt;

&lt;h2&gt;How to check if you're affected&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check lockfile for compromised versions&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"1&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;14&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;1|0&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;30&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;4"&lt;/span&gt; package-lock.json

&lt;span class="c"&gt;# Check for the malicious dependency&lt;/span&gt;
&lt;span class="nb"&gt;ls &lt;/span&gt;node_modules/plain-crypto-js 2&amp;gt;/dev/null &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"AFFECTED"&lt;/span&gt;

&lt;span class="c"&gt;# Check for RAT artifacts&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /Library/Caches/com.apple.act.mond 2&amp;gt;/dev/null  &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /tmp/ld.py 2&amp;gt;/dev/null                           &lt;span class="c"&gt;# Linux&lt;/span&gt;
&lt;span class="nb"&gt;dir&lt;/span&gt; &lt;span class="s2"&gt;"%PROGRAMDATA%&lt;/span&gt;&lt;span class="se"&gt;\w&lt;/span&gt;&lt;span class="s2"&gt;t.exe"&lt;/span&gt; 2&amp;gt;nul                        &lt;span class="c"&gt;# Windows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Remediation&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pin to safe versions:&lt;/strong&gt; &lt;code&gt;axios@1.14.0&lt;/code&gt; (1.x) or &lt;code&gt;axios@0.30.3&lt;/code&gt; (0.x)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove the malicious package:&lt;/strong&gt; &lt;code&gt;rm -rf node_modules/plain-crypto-js&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If RAT artifacts found:&lt;/strong&gt; assume full system compromise, rotate ALL credentials (npm tokens, SSH keys, cloud keys, CI/CD secrets, &lt;code&gt;.env&lt;/code&gt; values)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit CI/CD pipelines&lt;/strong&gt; for any runs that installed during the window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block the C2:&lt;/strong&gt; &lt;code&gt;sfrclak.com&lt;/code&gt; / &lt;code&gt;142.11.206.73&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The bigger picture&lt;/h2&gt;

&lt;p&gt;This is the same pattern we've seen accelerating throughout 2025-2026: maintainer account hijack → manual npm publish → phantom dependency → postinstall dropper. The Shai-Hulud worm, the Qix compromise, Chalk/Debug — all variations on the same playbook.&lt;/p&gt;

&lt;p&gt;The uncomfortable truth: &lt;strong&gt;npm's security model has a single-point-of-failure problem.&lt;/strong&gt; Long-lived tokens still exist. Email changes don't require additional verification. Manual CLI publishing can bypass every CI/CD safeguard a project has built. Trusted Publishing (OIDC) is available but not enforced.&lt;/p&gt;

&lt;p&gt;Some practical defenses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;npm ci --ignore-scripts&lt;/code&gt;&lt;/strong&gt; in all CI/CD pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;ignore-scripts=true&lt;/code&gt;&lt;/strong&gt; in &lt;code&gt;~/.npmrc&lt;/code&gt; for local dev (opt-in to postinstall only when needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use lockfiles religiously&lt;/strong&gt; and review diffs on dependency changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bun and pnpm&lt;/strong&gt; don't execute lifecycle scripts by default — worth considering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package cooldown policies&lt;/strong&gt; — most malicious packages are caught within 24 hours&lt;/li&gt;
&lt;/ul&gt;
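&lt;p&gt;The &lt;code&gt;ignore-scripts&lt;/code&gt; setup from that list is a one-line config change. A sketch of the &lt;code&gt;~/.npmrc&lt;/code&gt; fragment (the package name is a placeholder, and opt-in flag syntax can vary across npm versions — check your npm's docs):&lt;/p&gt;

```shell
# ~/.npmrc — never run lifecycle scripts automatically
ignore-scripts=true

# Opt back in only when a native dependency genuinely needs its build step:
#   npm rebuild esbuild --ignore-scripts=false   # one package, explicitly
#   npm ci --ignore-scripts                      # belt-and-suspenders in CI
```

&lt;p&gt;With this in place, a &lt;code&gt;postinstall&lt;/code&gt; dropper like &lt;code&gt;plain-crypto-js&lt;/code&gt;'s simply never executes.&lt;/p&gt;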

&lt;p&gt;The window of exposure was ~3 hours. The detection came from Socket and StepSecurity within minutes. But for a package with 100M+ weekly downloads, even 3 hours is a massive blast radius.&lt;/p&gt;

&lt;p&gt;Pin your versions. Audit your lockfiles. Don't trust &lt;code&gt;latest&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://www.stepsecurity.io/blog/axios-compromised-on-npm-malicious-versions-drop-remote-access-trojan" rel="noopener noreferrer"&gt;StepSecurity&lt;/a&gt;, &lt;a href="https://socket.dev/blog/axios-npm-package-compromised" rel="noopener noreferrer"&gt;Socket&lt;/a&gt;, &lt;a href="https://www.aikido.dev/blog/axios-npm-compromised-maintainer-hijacked-rat" rel="noopener noreferrer"&gt;Aikido&lt;/a&gt;, &lt;a href="https://github.com/axios/axios/issues/10604" rel="noopener noreferrer"&gt;axios GitHub issue #10604&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://x.com/lunovian" rel="noopener noreferrer"&gt;An&lt;/a&gt; — founder of &lt;a href="https://codepawl.com" rel="noopener noreferrer"&gt;Codepawl&lt;/a&gt;, building open-source developer tools from HCMC, Vietnam.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://codepawl.com" rel="noopener noreferrer"&gt;codepawl.com&lt;/a&gt; · &lt;a href="https://x.com/lunovian" rel="noopener noreferrer"&gt;X @lunovian&lt;/a&gt; · &lt;a href="https://x.com/codepawl" rel="noopener noreferrer"&gt;X @codepawl&lt;/a&gt; · &lt;a href="https://discord.gg/7fydHgK6kA" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>security</category>
      <category>javascript</category>
    </item>
    <item>
      <title>GPT-5, Claude, Gemini All Score Below 1% - ARC AGI 3 Just Broke Every Frontier Model</title>
      <dc:creator>CodePawl</dc:creator>
      <pubDate>Thu, 26 Mar 2026 05:40:20 +0000</pubDate>
      <link>https://forem.com/codepawl/gpt-5-claude-gemini-all-score-below-1-arc-agi-3-just-broke-every-frontier-model-5dbj</link>
      <guid>https://forem.com/codepawl/gpt-5-claude-gemini-all-score-below-1-arc-agi-3-just-broke-every-frontier-model-5dbj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpovch0xibriez6797l2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpovch0xibriez6797l2t.png" alt="Three system types compared on ARC-AGI: Reasoning Systems (models like o1/o3 tested at varying thinking levels, showing diminishing returns as reasoning time increases), Base LLMs (single-shot inference from standard models, no extended reasoning), and Kaggle Systems (competition submissions optimized under a $50 compute budget, purpose-built for ARC). Key insight: purpose-built Kaggle systems outperform both base LLMs and reasoning-augmented models despite far less compute." width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC-AGI-3, launched just yesterday on March 25, 2026, represents the most radical transformation of the ARC benchmark since François Chollet introduced it in 2019&lt;/strong&gt; — abandoning static grid puzzles entirely in favor of interactive, video-game-like environments where AI agents must discover rules, set goals, and solve problems with zero instructions.&lt;/p&gt;

&lt;p&gt;The competition offers over &lt;strong&gt;$2 million in prizes&lt;/strong&gt; across three tracks. Early preview results: &lt;strong&gt;frontier LLMs like GPT-5 and Claude score below 1%&lt;/strong&gt;, while simple CNN and graph-search approaches reach &lt;strong&gt;12.58%&lt;/strong&gt;. The gap between human performance (100%) and the best AI agent remains enormous.&lt;/p&gt;

&lt;h2&gt;From grid puzzles to game worlds: what changed&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx1gqvgojwhljy7satpk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx1gqvgojwhljy7satpk.png" alt="Can you build an agent to beat this game?" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ARC-AGI-3 is not an incremental difficulty upgrade — it is a fundamentally different benchmark. Previous versions (ARC-AGI-1 and ARC-AGI-2) presented static input-output grid pairs where systems inferred transformation rules and applied them. ARC-AGI-3 instead drops agents into &lt;strong&gt;turn-based game environments&lt;/strong&gt; with no stated rules, no instructions, and no win conditions. Agents observe a 64×64 grid with 16 colors, take actions (move, click, reset), and must figure out both &lt;em&gt;what to do&lt;/em&gt; and &lt;em&gt;how to do it&lt;/em&gt; through pure interaction.&lt;/p&gt;

&lt;p&gt;The benchmark comprises &lt;strong&gt;1,000+ levels across 150+ handcrafted environments&lt;/strong&gt;, each game containing 8–10 levels that progressively introduce new mechanics. Three preview games illustrate the range: &lt;code&gt;ls20&lt;/code&gt; requires navigating a map and transforming symbols, &lt;code&gt;ft09&lt;/code&gt; involves matching patterns across overlapping grids, and &lt;code&gt;vc33&lt;/code&gt; tasks agents with adjusting volumes to hit target heights. Scoring uses &lt;strong&gt;action efficiency&lt;/strong&gt; — how many actions the agent needs compared to a human baseline — rather than binary pass/fail. A perfect 100% means the AI matches human efficiency across all games.&lt;/p&gt;
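&lt;p&gt;The exact ARC-AGI-3 scoring formula isn't spelled out publicly, but the described idea — actions taken versus a human baseline, capped so you can't beat 100% — can be sketched in a few lines. Treat this as an illustrative guess, not the official metric:&lt;/p&gt;

```python
# Sketch of "action efficiency" scoring as described: fewer actions
# relative to the human baseline means a higher score, capped at 100%.
# The real ARC-AGI-3 formula is an assumption here, not published spec.

def efficiency(human_actions: int, agent_actions: int) -> float:
    """Per-level efficiency: 1.0 means the agent matched the human baseline."""
    if agent_actions == 0:  # level never solved
        return 0.0
    return min(1.0, human_actions / agent_actions)

def benchmark_score(levels: list[tuple[int, int]]) -> float:
    """Average efficiency over (human_actions, agent_actions) pairs, as a percent."""
    if not levels:
        return 0.0
    return 100 * sum(efficiency(h, a) for h, a in levels) / len(levels)
```

&lt;p&gt;Under this reading, an agent that solves half the levels at human efficiency and fails the rest lands around 50% — which makes the sub-1% LLM numbers all the more stark.&lt;/p&gt;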

&lt;p&gt;The evolution across versions tells a clear story of escalating challenge:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ARC-AGI-1 (2019)&lt;/th&gt;
&lt;th&gt;ARC-AGI-2 (2025)&lt;/th&gt;
&lt;th&gt;ARC-AGI-3 (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Format&lt;/td&gt;
&lt;td&gt;Static grid puzzles&lt;/td&gt;
&lt;td&gt;Static grid puzzles (harder)&lt;/td&gt;
&lt;td&gt;Interactive game environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instructions&lt;/td&gt;
&lt;td&gt;Input-output demo pairs&lt;/td&gt;
&lt;td&gt;Input-output demo pairs&lt;/td&gt;
&lt;td&gt;None — discover through interaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best AI score&lt;/td&gt;
&lt;td&gt;~90%+ (saturated)&lt;/td&gt;
&lt;td&gt;24% (competition)&lt;/td&gt;
&lt;td&gt;12.58% (preview)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human baseline&lt;/td&gt;
&lt;td&gt;~85%&lt;/td&gt;
&lt;td&gt;~60% average&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scoring&lt;/td&gt;
&lt;td&gt;Binary accuracy&lt;/td&gt;
&lt;td&gt;Accuracy + cost-per-task&lt;/td&gt;
&lt;td&gt;Action efficiency vs. humans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tasks&lt;/td&gt;
&lt;td&gt;~400 training + 100 eval&lt;/td&gt;
&lt;td&gt;1,000 training + 120 eval per split&lt;/td&gt;
&lt;td&gt;1,000+ levels, 150+ environments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ARC-AGI-1 became effectively saturated by 2025, with frontier models hitting 90%+ through brute-force engineering. ARC-AGI-2 introduced harder compositional tasks — symbolic interpretation, contextual rule application, multiple interacting rules — that dropped the best competition score to &lt;strong&gt;24%&lt;/strong&gt;. ARC-AGI-3 tests four entirely new capabilities: &lt;strong&gt;exploration&lt;/strong&gt; (actively gathering information), &lt;strong&gt;modeling&lt;/strong&gt; (building generalizable world models), &lt;strong&gt;goal-setting&lt;/strong&gt; (identifying objectives without instructions), and &lt;strong&gt;planning with execution&lt;/strong&gt; (strategic action with course-correction).&lt;/p&gt;

&lt;h2&gt;Preview leaderboard reveals LLMs' interactive reasoning gap&lt;/h2&gt;

&lt;p&gt;The competition launched only yesterday, so the official Kaggle leaderboard has no entries yet. However, a 30-day developer preview preceding the launch produced highly informative results from 12 submissions (8 tested on private games):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Levels Solved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1st&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;StochasticGoose&lt;/strong&gt; (Tufa Labs)&lt;/td&gt;
&lt;td&gt;CNN + RL action-learning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.58%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2nd&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Blind Squirrel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State graph exploration + ResNet18&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.71%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3rd&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Explore It Till You Solve It&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Training-free frame graph&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.64%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Best frontier LLM agent&lt;/td&gt;
&lt;td&gt;LLM-based&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2–3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Human players&lt;/td&gt;
&lt;td&gt;Human cognition&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;All three top systems used non-LLM approaches.&lt;/strong&gt; StochasticGoose, built by Dries Smit at Tufa Labs, employed a CNN-based action prediction model with simple reinforcement learning and sparse rewards (only level completion signals). It stored frame transitions in memory for off-policy training, used hash tables to avoid duplicate states, and iteratively retrained its model between levels. The team explicitly avoided LLMs because the observation complexity — hundreds of interaction steps — would generate millions of tokens.&lt;/p&gt;
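&lt;p&gt;The hash-table trick mentioned above is simple to sketch: hash each observed frame so revisited states are recognized, and only store novel transitions for off-policy replay. The details below are my own assumptions for illustration, not Tufa Labs' actual code:&lt;/p&gt;

```python
import hashlib

def frame_key(grid):
    """Stable hash of a grid of small color indices (tuple of tuples, values 0-15)."""
    flat = bytes(cell for row in grid for cell in row)
    return hashlib.sha1(flat).hexdigest()

class TransitionMemory:
    """Replay buffer that skips (state, action) pairs it has already seen."""
    def __init__(self):
        self.seen = set()
        self.transitions = []   # (state_key, action, next_key, level_done)

    def add(self, grid, action, next_grid, level_done):
        key = (frame_key(grid), action)
        if key in self.seen:
            return False        # duplicate state-action: nothing new to learn
        self.seen.add(key)
        self.transitions.append((key[0], action, frame_key(next_grid), level_done))
        return True
```

&lt;p&gt;Deduplicating on (state, action) keeps the buffer small even over hundreds of interaction steps — exactly the cost the team cited when ruling out token-hungry LLM loops.&lt;/p&gt;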

&lt;p&gt;The third-place system, documented in an arXiv paper (Rudakov et al., 2512.24156), used a completely training-free graph-based exploration method, building state graphs and systematically exploring them. It solved a median of 30 out of 52 levels across 6 games but was limited by computational scaling with state space size.&lt;/p&gt;

&lt;p&gt;Frontier LLMs' sub-1% performance is perhaps the most significant data point. The interactive format — requiring sustained sequential reasoning, state tracking across hundreds of steps, and learning from environmental feedback — exposes a fundamental limitation of current language models that static benchmarks never tested.&lt;/p&gt;

&lt;h2&gt;$2 million across three tracks with strict open-source requirements&lt;/h2&gt;

&lt;p&gt;The ARC Prize 2026 splits its prize pool across three parallel competition tracks, each hosted on Kaggle:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC-AGI-3 Track — $850,000 total:&lt;/strong&gt;&lt;br&gt;
The grand prize of &lt;strong&gt;$700,000&lt;/strong&gt; goes to the first agent scoring 100% on evaluation (carries over if unclaimed). A guaranteed &lt;strong&gt;$75,000 top-score award&lt;/strong&gt; distributes $40K/1st, $15K/2nd, $10K/3rd, and $5K each for 4th–5th. Two milestone prizes totaling &lt;strong&gt;$75,000&lt;/strong&gt; reward early progress: $25K/$10K/$2.5K at each milestone (June 30 and September 30).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC-AGI-2 Track — ~$1 million:&lt;/strong&gt; The $700K grand prize for scoring 85% on ARC-AGI-2 remains unclaimed from both 2024 and 2025, and continues into 2026 alongside separate score awards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paper Prize Track:&lt;/strong&gt; Awards for research papers advancing understanding of strong ARC-AGI performance.&lt;/p&gt;

&lt;p&gt;Critical competition constraints shape viable approaches. &lt;strong&gt;All winning solutions must be open-sourced&lt;/strong&gt; under permissive licenses (CC0 or MIT-0) &lt;em&gt;before&lt;/em&gt; receiving private evaluation scores. &lt;strong&gt;Kaggle evaluation runs with no internet access&lt;/strong&gt; — meaning no API calls to OpenAI, Anthropic, Google, or any cloud inference endpoint. Teams must either use open-weight models running locally or build entirely non-LLM systems. The ARC-AGI-3 toolkit is open-source (MIT license, &lt;code&gt;pip install arc-agi&lt;/code&gt;) and runs at 2,000+ FPS locally, but requires an API key from arcprize.org.&lt;/p&gt;

&lt;h2&gt;What approaches competitors are likely to pursue&lt;/h2&gt;

&lt;p&gt;The preview results and historical ARC competition patterns suggest several viable research directions for ARC-AGI-3:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement learning with lightweight neural networks&lt;/strong&gt; is the proven frontrunner. StochasticGoose's CNN + sparse RL approach dominated the preview. Simple action prediction models that learn which actions cause meaningful state changes, combined with systematic exploration, appear far more effective than sophisticated language understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph-based state exploration&lt;/strong&gt; offers a training-free alternative. Building explicit state graphs, pruning loops, and systematically mapping environment dynamics worked surprisingly well (6.71% for Blind Squirrel). This approach trades compute for algorithmic efficiency but scales poorly with state space size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta-learning and curiosity-driven RL&lt;/strong&gt; are natural fits given ARC-AGI-3's requirement for rapid adaptation to novel environments. Methods like BYOL-Hindsight and intrinsic motivation were discussed during the preview period but proved finicky with short timeframes and sparse rewards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;World models&lt;/strong&gt; (Dreamer family, latent dynamics models) could learn environment physics in imagination before acting, but are limited by ARC-AGI-3's sparse reward signal — only level completion provides feedback.&lt;/p&gt;

&lt;p&gt;For the continuing ARC-AGI-2 track, the dominant paradigm from 2025 was &lt;strong&gt;synthetic data generation combined with test-time training&lt;/strong&gt; — NVARC's winning approach used Qwen3-4B fine-tuned on 103K synthetic puzzles plus 3.2M augmented samples. Other strong directions include masked diffusion models (ARChitects), evolutionary program synthesis (SOAR), and minimum description length approaches (CompressARC).&lt;/p&gt;

&lt;h2&gt;Key dates and competition timeline&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Milestone&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;March 25, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Competition opens on Kaggle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;June 30, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ARC-AGI-3 Milestone #1 ($37,500 in prizes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;September 30, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ARC-AGI-3 Milestone #2 ($37,500 in prizes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;November 2, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All submissions due&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;November 8, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Paper track submissions due&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;December 4, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Results announced&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;During the competition, Kaggle leaderboard standings reflect scores on a semi-private dataset. Final rankings and prize eligibility use a separate private dataset, following the same anti-gaming structure as previous years. Human calibration data was collected from &lt;strong&gt;1,200+ players&lt;/strong&gt; across &lt;strong&gt;3,900+ games&lt;/strong&gt; during the preview, with a controlled study of 200+ participants establishing production baselines.&lt;/p&gt;

&lt;h2&gt;Try it yourself&lt;/h2&gt;

&lt;p&gt;The ARC-AGI-3 toolkit is open-source and runs locally:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install arc-agi&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You'll need an API key from &lt;a href="https://arcprize.org" rel="noopener noreferrer"&gt;arcprize.org&lt;/a&gt; to access the environments. The toolkit runs at 2,000+ FPS locally.&lt;/p&gt;

&lt;p&gt;Full competition details and submission: &lt;a href="https://www.kaggle.com/competitions/arc-prize-2026-arc-agi-3" rel="noopener noreferrer"&gt;ARC Prize 2026 on Kaggle&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;ARC-AGI-3 is not merely a harder test — it measures a fundamentally different kind of intelligence. The shift from static pattern recognition to interactive exploration and goal discovery exposes capabilities that current AI systems, including frontier LLMs, demonstrably lack. The preview data is unambiguous: &lt;strong&gt;simple RL and graph search at 12.58% versus frontier LLMs below 1%&lt;/strong&gt; suggests that the path to solving ARC-AGI-3 runs through novel algorithmic ideas rather than model scaling.&lt;/p&gt;

&lt;p&gt;With $850K on the line for the interactive track alone and milestone prizes creating incentives for early progress, the next eight months should produce significant advances in adaptive AI reasoning — all of which, by competition rules, will be open-sourced for the broader research community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you planning to compete? What approach would you try?&lt;/strong&gt; Drop your thoughts in the comments.&lt;/p&gt;




&lt;p&gt;We're &lt;a href="https://github.com/codepawl" rel="noopener noreferrer"&gt;CodePawl&lt;/a&gt; — an open-source-first firm building tools for developers. Follow us on &lt;a href="https://x.com/codepawl" rel="noopener noreferrer"&gt;X&lt;/a&gt; or join our &lt;a href="https://discord.gg/7fydHgK6kA" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kaggle</category>
      <category>llm</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
