<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: 1p</title>
    <description>The latest articles on Forem by 1p (@onepizzateam).</description>
    <link>https://forem.com/onepizzateam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3887873%2F074b7157-d311-4778-aed4-c5ff870b2ab3.jpg</url>
      <title>Forem: 1p</title>
      <link>https://forem.com/onepizzateam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/onepizzateam"/>
    <language>en</language>
    <item>
      <title>Cursor 3 shipped parallel agents and the community can't agree on whether that's good</title>
      <dc:creator>1p</dc:creator>
      <pubDate>Thu, 30 Apr 2026 21:33:59 +0000</pubDate>
      <link>https://forem.com/onepizzateam/cursor-3-shipped-parallel-agents-and-the-community-cant-agree-on-whether-thats-good-1p3n</link>
      <guid>https://forem.com/onepizzateam/cursor-3-shipped-parallel-agents-and-the-community-cant-agree-on-whether-thats-good-1p3n</guid>
      <description>&lt;p&gt;Cursor rebuilt from scratch around managing fleets of AI agents instead of writing code. The demos look very convincing. The HN thread is a mess. And someone spent $2,000 in two days. Here's what actually matters.&lt;/p&gt;

&lt;p&gt;Quick context if you haven't been following the AI tooling space: Cursor is the VS Code fork built by Anysphere that became the de facto AI coding tool for a huge chunk of the dev community, hit $2B ARR earlier this year, and raised over $3 billion from NVIDIA, Google, and others. It's the tool people recommend when someone asks "should I just use Copilot?"&lt;/p&gt;

&lt;p&gt;On April 2, 2026 they shipped &lt;strong&gt;Cursor 3&lt;/strong&gt;, internally codenamed Glass. It's not a point release. They rebuilt the interface from scratch.&lt;/p&gt;

&lt;p&gt;The pitch: you are the architect, agents are the builders. The IDE is still there, but the default experience is now managing a fleet.&lt;/p&gt;

&lt;p&gt;Thirty minutes after the announcement hit Hacker News, the top comment wasn't about a feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually shipped
&lt;/h2&gt;

&lt;p&gt;The headline change is the &lt;strong&gt;Agents Window&lt;/strong&gt; -- a full-screen workspace running alongside the IDE where you manage multiple AI agents in parallel. Previously: one chat, one agent, one task at a time. Now you can run as many as you want across different repos, local machines, worktrees, SSH environments, and cloud VMs from one place.&lt;/p&gt;

&lt;p&gt;A few things worth knowing about specifically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud agent handoff&lt;/strong&gt; is the feature that makes the rest of it real. Start a session locally, hand it to a cloud VM, close your laptop, come back to a finished PR. This is the part that shifts "AI coding assistant" into something closer to "asynchronous engineering team." Whether that's what you want is a different question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composer 2&lt;/strong&gt; is Cursor's in-house coding model -- runs locally, no per-use cloud charges, higher usage limits. There's a story here about how they disclosed it (or didn't) that we'll get to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;/multitask&lt;/strong&gt;, shipped in 3.2 on April 24, breaks a large task into chunks and fires them at a fleet of subagents simultaneously. Cross-repo too. This is where the "agent execution runtime" framing starts to feel accurate rather than just aspirational -- and where Cursor starts looking less like an IDE and more like a CI/CD layer you interact with conversationally.&lt;/p&gt;
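
&lt;p&gt;The fan-out shape behind &lt;code&gt;/multitask&lt;/code&gt; is worth having a mental model for. Here's a minimal sketch in plain Python -- the task names and the &lt;code&gt;run_subagent&lt;/code&gt; function are hypothetical stand-ins for illustration, not Cursor's actual API:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task):
    # Hypothetical stand-in: in a real system this would dispatch one
    # bounded task to one agent and return something reviewable (a diff, a PR).
    return "PR for: " + task

def multitask(subtasks):
    # Fan out: each independent chunk goes to its own agent in parallel.
    # Fan in: a human reviews the results; nothing merges automatically.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run_subagent, subtasks))

results = multitask([
    "instrument auth-service",
    "instrument billing-service",
    "instrument api-gateway",
])
print(results)
```

&lt;p&gt;The load-bearing assumption is in the second comment: the fan-out only pays off when the subtasks are genuinely independent, and a human still has to review everything that comes back.&lt;/p&gt;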

&lt;p&gt;The MCP Marketplace rounds it out. Cursor is quietly becoming a platform. That matters for lock-in reasons as much as feature reasons.&lt;/p&gt;




&lt;h2&gt;
  
  
  The philosophy shift, and why half the community isn't happy about it
&lt;/h2&gt;

&lt;p&gt;Cursor's co-founders framed this release around "three eras of software development." Era one: you edit files manually. Era two: agents write most of the code while you direct. Era three: fleets of agents ship improvements autonomously while you review.&lt;/p&gt;

&lt;p&gt;They're betting we're in era two right now, and building toward three. The interface reflects that.&lt;/p&gt;

&lt;p&gt;The top HN comment the day it launched:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I wish they'd keep the old philosophy of letting the developer drive and the agent assist. I still want to code, not vibe my way through tickets."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A Cursor engineer responded within minutes -- the IDE still exists, the Agents Window is a separate surface, you can have both open simultaneously or ignore agents entirely. Both things are true. But they're not actually disagreeing about features, they're disagreeing about what the job is supposed to be.&lt;/p&gt;

&lt;p&gt;That disconnect is the real story here. Not what features shipped, but what Anysphere believes about where software development is heading, and whether developers agree with that framing. A lot of people who use Cursor are there precisely because they want to stay close to the code. The agent-first pitch reads to them as the tool choosing a direction they didn't ask for.&lt;/p&gt;

&lt;p&gt;And they're not wrong to push back -- because what Cursor 3 is really proposing isn't more automation on top of your existing job. It's a different job. Writing code and managing outputs from multiple semi-autonomous systems running in parallel are not the same skill. They use different mental models, different review instincts, different debugging approaches. One is authorship. The other is closer to code review at scale with partial information and no single source of truth.&lt;/p&gt;

&lt;p&gt;"Orchestrating a fleet" is not a more productive version of "writing systems software." It's a different mode of working. Cursor 3 has a strong opinion on which mode matters more. You might not share it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Composer 2 situation
&lt;/h2&gt;

&lt;p&gt;Cursor didn't disclose what model Composer 2 is built on in their initial announcement. An external developer spotted the identifier &lt;code&gt;kimi-k2p5-rl-0317-s515-fast&lt;/code&gt; in system responses and traced it back to Kimi K2.5 from Moonshot AI.&lt;/p&gt;

&lt;p&gt;Co-founder Aman Sanger called the omission "a miss" and said they'd disclose the base model upfront for future releases. Moonshot AI confirmed it was an authorized commercial partnership through Fireworks AI. About 75% of Composer 2's total compute came from Cursor's own continued pre-training and reinforcement learning on top of the base -- so it's not just a reskin. But the lack of upfront disclosure did not go over great.&lt;/p&gt;

&lt;p&gt;On benchmarks: Composer 2 scores 61.7 on Terminal-Bench 2.0 vs Opus 4.6's 58.0, with GPT-5.4 at 75.1. Google's Antigravity scores 76.2 on SWE-bench Verified -- a different benchmark, so not directly comparable, but it shows where the bar sits. Cursor is competitive but not leading -- which matters more now that they have an in-house model to defend.&lt;/p&gt;

&lt;p&gt;The upside is real though. Local execution, no per-use cloud charges, higher usage limits than routing everything to frontier models. For people who were burning through Claude credits in Cursor, it's a meaningful cost relief for standard tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost thing is not a footnote
&lt;/h2&gt;

&lt;p&gt;This is the part most people are going to ignore until they get billed.&lt;/p&gt;

&lt;p&gt;Cursor's pricing page lists four tiers: Free, Pro at $20/month, Pro+ at $60, Ultra at $200. Those numbers look fine. The issue is that cloud agents aren't metered the way the pricing page implies.&lt;/p&gt;

&lt;p&gt;Early adopters on Hacker News reported spending $2,000+ running cloud agents. Not $2,000/month. &lt;strong&gt;Two days.&lt;/strong&gt; One user switched from $1,800/month on Cursor to roughly $200/month on Claude Code, calling it "WAY better value for money." Another reported "$2k a week with premium models" before switching.&lt;/p&gt;

&lt;p&gt;The per-minute VM charges for cloud execution are not disclosed on the pricing page. You find out when the bill arrives.&lt;/p&gt;

&lt;p&gt;Compare: Claude Code Max runs at a flat $100-200/month with parallel execution via worktrees. If you're doing heavy agentic work, the math is not subtle.&lt;/p&gt;
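
&lt;p&gt;Back-of-the-envelope, using only the figures quoted above -- these are user-reported anecdotes, not published pricing, so treat the numbers as an extreme case rather than a typical bill:&lt;/p&gt;

```python
# Figures from the anecdotes above: ~$2,000 over two days of cloud agents,
# vs the top of Claude Code Max's flat $100-200/month range.
cloud_burn_two_days = 2000
cloud_burn_monthly = (cloud_burn_two_days / 2) * 22  # ~22 working days/month

flat_plan_ceiling = 200

print(cloud_burn_monthly)                      # projected monthly burn at that rate
print(cloud_burn_monthly / flat_plan_ceiling)  # multiple of the flat-plan ceiling
```

&lt;p&gt;Nobody sustains that burn rate for a month on purpose. But that's the point: with metered cloud execution, the variance is the product.&lt;/p&gt;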

&lt;p&gt;Local agents via Composer 2 have no per-use charges -- that's the intended use case for standard tasks. Cloud agents are where the real power is (overnight runs, mobile-triggered tasks, multi-repo parallelism) and that's also where the costs are opaque. Track your spend for a full week before assuming the listed tier is what you'll actually pay.&lt;/p&gt;

&lt;p&gt;The feature is real. The value is real for the right workload. But the pricing model is designed around the demos, not around what happens when you actually run it for a week.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where it sits in the landscape
&lt;/h2&gt;

&lt;p&gt;The AI IDE space consolidated fast this year. Three distinct philosophies, worth knowing the difference:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor 3&lt;/strong&gt; -- IDE-native, GUI-first, now agent-first. If you want visual tooling, parallel agents with a management UI, and the ability to annotate a browser and tell an agent to fix that exact thing, Cursor is where that workflow is most mature. Cost: $20/month listed, variable in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; -- terminal-native, stays out of your way. No GUI, runs in your existing terminal, integrates with whatever editor you already use. Still ahead on fully autonomous agentic work for people who don't want an IDE wrapper around everything. Flat $100-200/month at Max tier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Antigravity&lt;/strong&gt; -- the wildcard. Built from scratch (not a VS Code fork) by the team Google acquired for $2.4B, shipped free in November 2025, and scores 76.2% on SWE-bench Verified, which is one of the highest published numbers for a coding agent right now. Worth a weekend if you haven't looked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ForgeCode&lt;/strong&gt; -- open source, terminal-based, bring your own API keys, topped Terminal-Bench 2.0 at 81.8%. Their blog post about hitting number one is titled "benchmarks don't matter," which is either a good sign or a bad sign depending on your priors. Worth a weekend too.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this actually means
&lt;/h2&gt;

&lt;p&gt;The "you're the architect, agents are the builders" framing is going to keep coming up. Cursor 3 is the most explicit statement of that direction from a major tool yet, but it's not the only one heading there. Antigravity, Claude Code, Codex -- they're all converging on the same mental model.&lt;/p&gt;

&lt;p&gt;The question worth sitting with if you build systems software, CLI tools, or anything requiring you to stay close to the metal: does agent orchestration actually help that workflow, or does it mostly help the "generate a CRUD app from a prompt" workflow and kind of work for everything else as a side effect?&lt;/p&gt;

&lt;p&gt;My honest read: parallel agents are genuinely useful for tasks with clear boundaries and independent surface area. Spin up three agents on three separate features, review the PRs, merge what works. That's real. For deep systems work where the whole point is that you're carefully reasoning through one gnarly problem -- handing that to a fleet isn't faster, it's noisier. You spend the time you saved writing code on reviewing agent output that's plausible-looking but wrong in ways that only show up later.&lt;/p&gt;

&lt;p&gt;It'll get there. The benchmarks are moving fast enough that "this doesn't work for systems work" is probably a 2026 statement, not a permanent one. But right now, parallel agents are mostly useful for bounded tasks where correctness is verifiable and the problem decomposes cleanly. That's a real category of work. It's just not all the work.&lt;/p&gt;

&lt;p&gt;The more interesting shift is the one underneath all of this. Cursor 3 isn't really about parallel agents as a feature. It's about what the tooling assumes the job looks like. And if the tools all converge on "you manage agents, you don't write code," the developers who push back aren't being resistant to change -- they're noticing that nobody asked whether that's actually the job they signed up for.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to dig in
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Cmd+Shift+P -&amp;gt; Agents Window&lt;/code&gt; -- try the parallel agents UI in Cursor 3&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/changelog" rel="noopener noreferrer"&gt;cursor.com/changelog&lt;/a&gt; -- they ship fast, worth following&lt;/li&gt;
&lt;li&gt;ForgeCode on GitHub -- bring-your-own-keys, open source, worth a look if you're skeptical of the closed tooling direction&lt;/li&gt;
&lt;li&gt;Google Antigravity -- free, agent-first, no VS Code fork baggage&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;What's your current setup? Cursor, Claude Code, something else entirely? And if you've actually run parallel agents in production -- how'd the costs shake out? Drop it in the comments, genuinely curious where people land on this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>discuss</category>
      <category>news</category>
    </item>
    <item>
      <title>Yann LeCun thinks the whole industry is building the wrong thing, and now he has $1B to prove it</title>
      <dc:creator>1p</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:28:10 +0000</pubDate>
      <link>https://forem.com/onepizzateam/yann-lecun-thinks-the-whole-industry-is-building-the-wrong-thing-and-now-he-has-1b-to-prove-it-2f9c</link>
      <guid>https://forem.com/onepizzateam/yann-lecun-thinks-the-whole-industry-is-building-the-wrong-thing-and-now-he-has-1b-to-prove-it-2f9c</guid>
      <description>&lt;p&gt;LeCun left Meta, started AMI Labs, and is betting world models beat LLMs for real AI. Here's what that actually means, what the research shows, and why it matters for where AI tooling goes next.&lt;/p&gt;




&lt;p&gt;Quick context if you haven't been following: Yann LeCun is one of the three "godfathers of deep learning" (the Turing Award crew alongside Hinton and Bengio), spent 12 years running Meta's AI research lab FAIR, and has been publicly, loudly skeptical of LLMs basically the entire time they've been the dominant paradigm. Think of him as the guy in your Discord who keeps saying "yeah but have you actually read the architecture paper" -- except he's usually right, and now he's raised a billion dollars.&lt;/p&gt;

&lt;p&gt;In November 2025 he left Meta. By March 2026, his new lab &lt;strong&gt;AMI Labs&lt;/strong&gt; (pronounced &lt;em&gt;ah-mee&lt;/em&gt;, French for friend, cute) closed a &lt;strong&gt;$1.03B seed round at a $3.5B valuation&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Largest seed round in European startup history. &lt;/p&gt;

&lt;p&gt;Backers include Bezos Expeditions, NVIDIA, Samsung, Toyota Ventures, and Tim Berners-Lee personally. That's not a hype round. That's serious people making a serious bet.&lt;/p&gt;

&lt;p&gt;The bet being: &lt;strong&gt;world models&lt;/strong&gt; are the actual path to useful AI, and LLMs are a dead end for anything involving the physical world.&lt;/p&gt;

&lt;p&gt;Let me break that down for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with autocomplete at scale
&lt;/h2&gt;

&lt;p&gt;LLMs do one thing: predict what token comes next, over and over, trained on enough text that the predictions become eerily good. That's genuinely impressive engineering. But there's a structural ceiling.&lt;/p&gt;

&lt;p&gt;Here's a concrete way to see it. If you ask GPT-anything to help you write a Rust CLI tool, it does pretty well. Ask it to debug a memory layout issue where the problem only shows up under a specific CPU cache behavior and it starts hallucinating plausible-sounding nonsense. Not because it's dumb, but because it never &lt;em&gt;learned&lt;/em&gt; the underlying model of how memory and CPUs actually interact. It learned the language people use to &lt;em&gt;talk about&lt;/em&gt; those things. Different thing.&lt;/p&gt;

&lt;p&gt;LeCun's framing: LLMs are trained on "the dried crust of human knowledge" -- text written after the thinking was done. They don't have access to the reasoning process, the failed experiments, the physical intuition that produced that text. They get the output, not the computation.&lt;/p&gt;

&lt;p&gt;The symptoms we all know: hallucinations, no real planning, zero common sense about physical cause and effect. A model that can write a paper about gravity can't predict that a ball will fall if you drop it, not from first principles anyway.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's a world model, actually
&lt;/h2&gt;

&lt;p&gt;The term gets thrown around loosely so let me be precise about what LeCun means.&lt;/p&gt;

&lt;p&gt;A world model is an internal simulation an agent builds of how its environment behaves -- not pattern-matching on surface features, but a learned model of the &lt;em&gt;rules&lt;/em&gt; that generate those patterns. Babies do this before they can talk. You've got a world model running right now: you know a coffee cup will fall if it's too close to the table edge, you know roughly how far you can lean a chair before it tips, you know that if someone looks over your shoulder they can read your screen. None of that came from reading text.&lt;/p&gt;

&lt;p&gt;The goal is AI that builds that same kind of model from observation — watching video, interacting with environments — and can then reason forward from it. "If I do X, Y will probably happen, and that means Z becomes possible."&lt;/p&gt;

&lt;p&gt;This is a pretty different problem from next-token prediction.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture: JEPA
&lt;/h2&gt;

&lt;p&gt;LeCun's technical answer is called &lt;strong&gt;JEPA&lt;/strong&gt; (Joint Embedding Predictive Architecture), which he first proposed in a 2022 paper while still at Meta.&lt;/p&gt;

&lt;p&gt;The core idea is this: instead of predicting the raw pixels of what a future video frame will look like (which is nearly impossible, too much irrelevant detail), JEPA learns an &lt;em&gt;abstract representation&lt;/em&gt; of what's happening and makes predictions in that space.&lt;/p&gt;

&lt;p&gt;Imagine watching someone reach toward a coffee mug. You don't mentally render every photon bounce in 4K. You just know: "they're picking that up." JEPA learns that level of abstraction, ignoring unpredictable low-level noise, keeping the meaningful structure.&lt;/p&gt;

&lt;p&gt;In technical terms, it operates in &lt;em&gt;latent space&lt;/em&gt; rather than &lt;em&gt;pixel space&lt;/em&gt; or &lt;em&gt;token space&lt;/em&gt;. You're predicting compressed representations of reality, not reality itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And it's not generative.&lt;/strong&gt; It's not trying to generate the next frame. It's learning the underlying dynamics, more like a physics engine than a video renderer.&lt;/p&gt;
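
&lt;p&gt;A toy way to see why predicting in latent space changes the objective. This is a numpy sketch of the idea only -- real JEPA learns the encoder and predictor jointly, while here both are hard-coded so the contrast is visible:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def make_frame(state):
    # Toy "frame": a little meaningful structure plus a lot of
    # unpredictable pixel-level noise.
    signal = np.full(3, state)
    noise = rng.normal(0, 1.0, 1000)
    return np.concatenate([signal, noise])

def encode(frame):
    # Hard-coded stand-in encoder: keep the structured part, drop the noise.
    # JEPA learns this mapping; the point here is only what it enables.
    return frame[:3]

f_now, f_next = make_frame(1.0), make_frame(2.0)

# Generative objective: predict the next frame pixel-for-pixel.
# The noise keeps this loss large no matter how well you model the dynamics.
pixel_loss = np.mean((f_next - f_now) ** 2)

# JEPA-style objective: predict the next latent from the current one,
# assuming we learned the dynamics "state increases by 1".
predicted_latent = encode(f_now) + 1.0
latent_loss = np.mean((encode(f_next) - predicted_latent) ** 2)

print(pixel_loss, latent_loss)  # latent loss is exactly 0 here
```

&lt;p&gt;The generative objective forces the model to spend capacity on noise it can never predict; the latent objective lets it ignore that noise and model only the structure. That's the whole pitch, in two loss terms.&lt;/p&gt;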




&lt;h2&gt;
  
  
  Results that are already out
&lt;/h2&gt;

&lt;p&gt;This isn't pure theory waiting on a 10-year timeline. There's published research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V-JEPA&lt;/strong&gt; (the video version, released by Meta's FAIR team) was trained on internet video and showed solid performance on motion understanding tasks. Then came &lt;strong&gt;V-JEPA 2&lt;/strong&gt; in June 2025, a 1.2B parameter model trained on over a million hours of video. The wild part: it was fine-tuned on just ~62 hours of real robot interaction data and could do zero-shot robotic planning, up to 30x faster than Nvidia's Cosmos.&lt;/p&gt;

&lt;p&gt;Zero-shot. Meaning the robot had never seen those specific objects or environments during training. It generalized from its world model.&lt;/p&gt;

&lt;p&gt;On the Something-Something v2 benchmark for motion understanding it hit 77.3% top-1 accuracy, and on Epic-Kitchens-100 for human action anticipation it reached 39.7 recall-at-5, beating previous task-specific models. These are hard benchmarks. Task-specific models train specifically for these tasks and still got beaten by a general world model.&lt;/p&gt;

&lt;p&gt;Then &lt;strong&gt;VL-JEPA&lt;/strong&gt; (vision-language, late 2025), with just 1.6B parameters, matched or exceeded larger generative VLMs like InstructBLIP and QwenVL on benchmarks like GQA and POPE, using 50% fewer trainable parameters.&lt;/p&gt;

&lt;p&gt;Half the parameters. Same or better results. That's not an incremental improvement. That's a signal the architecture is doing something smarter, not just brute-forcing scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  What happened when LeCun left Meta
&lt;/h2&gt;

&lt;p&gt;The split from Meta is interesting because it wasn't dramatic: no blowup, no public feud. LeCun told MIT Tech Review he "kind of hated being a director" and that he disagreed with some of Zuckerberg's calls (letting the robotics group go at FAIR was his specific example). Meta doubled down on LLMs and scaling Llama. LeCun thought that was the wrong mountain.&lt;/p&gt;

&lt;p&gt;So he left and started AMI in Paris. Conveniently, AMI is also the abbreviation for Advanced Machine Intelligence — the exact research program he was running at FAIR. He's just continuing it without the corporate overhead. :)&lt;/p&gt;

&lt;p&gt;The funding round brought in some interesting names beyond the usual VC suspects: co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, with individuals including Tim and Rosemary Berners-Lee, Jim Breyer, Mark Cuban, and Eric Schmidt. Also NVIDIA and Samsung on the strategic side. These are people who understand what a long-bet fundamental research play looks like.&lt;/p&gt;

&lt;p&gt;The first disclosed partner is Nabla, a healthcare AI company, chosen specifically because hallucinations in medical AI are a genuine patient safety problem and world models are being explored as the fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the community actually reacted
&lt;/h2&gt;

&lt;p&gt;Honestly, split. And that's kind of healthy.&lt;/p&gt;

&lt;p&gt;On the skeptical side: Elon Musk posted that LeCun "thinks if he can't do it, no one can." Figure's Brett Adcock told him to "get his hands dirty" (Figure makes humanoid robots using end-to-end learned approaches LeCun thinks are fundamentally limited). Some Hacker News comments were blunt; one called the whole wave "science experiments rewarded with VC money."&lt;/p&gt;

&lt;p&gt;LeCun's reply to Musk was basically: "I know I can do it and I know how to do it. Just not with the techniques everyone is currently betting on."&lt;/p&gt;

&lt;p&gt;On the believer side: Goldman Sachs published a report calling the world model "the missing link" in AI, arguing that solving it represents the next decisive leap in artificial intelligence. Fei-Fei Li launched World Labs around spatial intelligence (closely related). DeepMind's Demis Hassabis has said he thinks language is limited for robotics and is working on world models through Genie and SIMA. Even if the labs won't say "LeCun was right," they're quietly working on the same problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this is relevant if you build devtools or systems software
&lt;/h2&gt;

&lt;p&gt;OK so here's where I'd normally make the leap to "this changes everything for developers" — but let me be more specific than that.&lt;/p&gt;

&lt;p&gt;If you're building CLI tools, systems software, Rust stuff, anything in the devtools space: the near-term impact of world models is mostly in what &lt;em&gt;AI assistants for developers&lt;/em&gt; can eventually become.&lt;/p&gt;

&lt;p&gt;Right now, an AI coding assistant is essentially autocomplete plus a very large lookup table. It works surprisingly well because a lot of coding is pattern-matching. But the failure modes are specific: it doesn't model your &lt;em&gt;system&lt;/em&gt; — your runtime, your memory layout, your dependency graph behavior under load. It models the syntax of talking about those things.&lt;/p&gt;

&lt;p&gt;A world model-based assistant could potentially build an actual simulation of your codebase — understand that this function causes that behavior, that this allocation pattern leads to this cache behavior, that this interface contract breaks under these conditions. Not by having read a million Stack Overflow answers about it. By actually modeling the system.&lt;/p&gt;

&lt;p&gt;That's still a few years out from AMI. But it's the direction.&lt;/p&gt;

&lt;p&gt;More concretely right now: the robotics and industrial automation track is moving fast. AMI Labs is targeting healthcare, robotics, wearables, and industrial automation as its first commercial applications. World models for physical systems — factory automation, autonomous vehicles, drones — are where the early deployment is happening. V-JEPA 2 doing zero-shot robot planning is the proof of concept.&lt;/p&gt;




&lt;h2&gt;
  
  
  One honest caveat
&lt;/h2&gt;

&lt;p&gt;AMI's CEO was direct about this: it's "not your typical applied AI startup that can release a product in three months." This is long-horizon fundamental research. Think years, not quarters.&lt;/p&gt;

&lt;p&gt;LeCun himself said it plainly: we're going to get AI systems with human-level intelligence, but not built on LLMs, and "not next year or two years from now." There are conceptual breakthroughs still needed.&lt;/p&gt;

&lt;p&gt;So this isn't "LLMs are dead, pivot your stack." LLMs are the best general-purpose AI tool available today. But the research direction is clearly shifting, and the architecture questions being asked now will shape what AI-powered devtools look like in 5 years.&lt;/p&gt;

&lt;p&gt;It's worth understanding what JEPA is before it's everywhere. Kind of like understanding attention mechanisms before transformers became unavoidable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to go deeper
&lt;/h2&gt;

&lt;p&gt;If you want to actually read the work rather than just follow the funding drama:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LeCun's 2022 position paper: &lt;em&gt;"A Path Towards Autonomous Machine Intelligence"&lt;/em&gt; (arxiv) — this is the foundational thing&lt;/li&gt;
&lt;li&gt;V-JEPA 2 paper (arXiv:2506.09985) — concrete results on physical reasoning&lt;/li&gt;
&lt;li&gt;VL-JEPA paper (arXiv:2512.10942) — the vision-language results&lt;/li&gt;
&lt;li&gt;AMI Labs site: amilabs.xyz — pretty sparse still but confirms the research direction&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Are you watching the world model space, or does it feel too far out to care about right now? Curious what people building real systems software think about where AI tooling is headed — drop a comment, always looking to explore new perspectives.&lt;/p&gt;

&lt;p&gt;And if you're already playing with JEPA or any of the open-source research outputs, I'd genuinely love to know what you've built with it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
