<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Bejie Paulo Aclao</title>
    <description>The latest articles on Forem by Bejie Paulo Aclao (@serkingiii).</description>
    <link>https://forem.com/serkingiii</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3832980%2Fd1fce934-3be6-442f-af6b-fd0dc25dba96.jpg</url>
      <title>Forem: Bejie Paulo Aclao</title>
      <link>https://forem.com/serkingiii</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/serkingiii"/>
    <language>en</language>
    <item>
      <title>Cursor Just Launched Cursor 3 and I Think They Know They're in Trouble</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:56:45 +0000</pubDate>
      <link>https://forem.com/serkingiii/cursor-just-launched-cursor-3-and-i-think-they-know-theyre-in-trouble-2imc</link>
      <guid>https://forem.com/serkingiii/cursor-just-launched-cursor-3-and-i-think-they-know-theyre-in-trouble-2imc</guid>
      <description>&lt;p&gt;Ok so Cursor dropped Cursor 3 yesterday and I've been thinking about it all day because this launch tells you everything about where AI coding tools are heading — and honestly, it doesnt look great for Cursor specifically even though the product itself is kind of impressive.&lt;/p&gt;

&lt;p&gt;Here's what happened. Cursor killed the code editor. Not literally, it's still there, but the default view in Cursor 3 is now an agent orchestration panel. No file explorer front and center. No code-first layout. You type what you want in natural language, hit enter, and AI agents go build it. You can spin up multiple agents at once, watch them work in a sidebar, and review what they did. The whole thing was built under the internal codename "Glass" and it's basically Cursor admitting that the product that made them famous — the AI-powered code editor — isn't the future anymore. Jonas Nelle, one of their heads of engineering, told WIRED straight up: "A lot of the product that got Cursor here is not as important going forward anymore." That's a wild thing to say about a product that's generating $2 billion in annualized revenue.&lt;/p&gt;

&lt;p&gt;The reason they did this is obvious if you've been paying attention. Claude Code owns 54% of the AI coding market now according to Menlo Ventures data. Claude Code and OpenAI's Codex both let you spin up agents that work for hours without supervision, and they're both offered through $200/month subscriptions that give you way more than $200 worth of compute. WIRED reported that Claude Code and Codex users regularly get over $1,000 worth of usage on those plans. Anthropic and OpenAI can afford to burn cash on customer acquisition because they've raised hundreds of billions between them. Cursor raised $3 billion total, which sounds like a lot until you realize Anthropic alone is valued at $380 billion.&lt;/p&gt;

&lt;p&gt;And developers are noticing. Multiple people told WIRED they've shifted most of their coding work to Claude Code and away from Cursor. One founder said his decision basically comes down to whoever has the most generous rate limits. Another said he rarely touches Cursor anymore despite using it heavily last year. The thing that made Cursor special — being the best IDE with AI built in — stopped mattering when the AI got good enough to just build the whole thing without an IDE.&lt;/p&gt;

&lt;p&gt;But here's where it gets messy. Cursor launched Composer 2 to power all this, and they claimed it matches GPT-5.4 on coding benchmarks at one-tenth the inference cost. Sounds amazing, right? Except TechCrunch reported that Composer 2 is actually built on top of Moonshot AI's open-source Kimi 2.5 model, and Cursor didn't disclose that until users pushed them on it. That's not illegal or anything — Kimi 2.5 is open source — but not saying "hey this is fine-tuned Kimi" upfront when you're marketing it as your own model is the kind of thing that makes developers trust you less. And trust is basically all a dev tools company has.&lt;/p&gt;

&lt;p&gt;The pricing situation is also kind of insane. One early reviewer burned through roughly $2,000 in two days of normal use with Cursor 3. Two thousand dollars. In two days. Meanwhile that same workload runs at a flat $200/month on Claude Code with unlimited Opus access. Cursor tried usage-based pricing back in mid-2025 and developers hated it so much the company had to apologize. Now they're doing it again but with higher stakes because the alternative products are better and cheaper.&lt;/p&gt;

&lt;p&gt;I use Claude Code for most of my stuff tbh and the reason is exactly what everyone else is saying — the value per dollar is absurd right now. I can spin up background agents that work on different parts of a project simultaneously while I review what the previous batch produced. Cursor was amazing when AI coding meant autocomplete and inline suggestions. But we're in what Cursor's own CEO Michael Truell calls the "third era" — first was autocomplete through 2025, second was synchronous copilots where you guided the AI, and now it's autonomous agents that work independently for hours. The problem for Cursor is that the companies that make the actual AI models are naturally better positioned for that third era than a company that wraps those models in a nice interface.&lt;/p&gt;

&lt;p&gt;That said, Cursor isn't dead and anyone saying that is probably being dramatic. They have $2 billion in revenue, 67% of the Fortune 500 as customers, and they're generating 150 million lines of enterprise code per day. Their internal engineering team already has 35% of pull requests generated by autonomous agents running on cloud VMs — each agent gets a full dev environment, tests its output by navigating the UI like a human, and returns merge-ready code with video demos attached. That's genuinely cool and it shows the product works when it works.&lt;/p&gt;

&lt;p&gt;The real question is whether a $29 billion coding startup can survive when Anthropic and OpenAI are willing to subsidize their competing products indefinitely. Cursor's head of engineering recently left. Fortune reported that several startups in one investor's portfolio are actively moving off the platform. The company is trying to raise at a $50 billion valuation right now, which either means they're confident or desperate, and in this market it's honestly hard to tell which.&lt;/p&gt;

&lt;p&gt;My take: Cursor 3 is a good product launched from a position of weakness. They built exactly what they needed to build — an agent-first interface that competes directly with Claude Code and Codex — but they're doing it 6 months late with a model wrapped around someone else's open source project and pricing that makes developers do math before pressing enter. The AI coding war is real, it's happening right now, and the companies with the deepest pockets and the best models are winning. Cursor's still in the fight but the clock is ticking and everyone including them knows it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>90,000 Tech Workers Got Fired This Year and Everyone Is Blaming AI but That's Not the Whole Story</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Fri, 03 Apr 2026 08:57:15 +0000</pubDate>
      <link>https://forem.com/serkingiii/90000-tech-workers-got-fired-this-year-and-everyone-is-blaming-ai-but-thats-not-the-whole-story-2ie8</link>
      <guid>https://forem.com/serkingiii/90000-tech-workers-got-fired-this-year-and-everyone-is-blaming-ai-but-thats-not-the-whole-story-2ie8</guid>
      <description>&lt;p&gt;I build AI agents. Like, thats literally what I do all day — I wire up autonomous systems that scout the internet, write content, publish articles, and report back to me without me touching anything. So when I see headlines screaming about 90,000 tech workers getting fired because of AI, I have a very specific reaction, which is: some of these companies are telling the truth, and a lot of them are completely full of it.&lt;/p&gt;

&lt;p&gt;The numbers are real though. Challenger, Gray &amp;amp; Christmas released a report this week showing 52,050 tech layoffs in Q1 2026 alone — that's a 40% jump from the same period last year. In March alone, AI was cited as the reason for 15,341 of those firings, which is 25% of all tech job cuts that month. A month earlier that number was 10%. So the trend is accelerating fast. TrueUp's tracker puts the running total even higher at around 90,000 tech workers impacted across 212 companies since January. And then Oracle dropped a bomb this week — somewhere between 20,000 and 30,000 employees got a 6 AM email telling them they were done. Their corporate Slack went from 165,000 users to 155,000 in a single day. The freed-up cash is reportedly going straight to AI data center investments where Oracle has a $20 billion funding gap.&lt;/p&gt;

&lt;p&gt;But here's where it gets interesting. Marc Andreessen went on the 20VC podcast this week and basically called the whole thing "AI washing." His argument is that every large company overhired during the pandemic by at least 25%, some by as much as 75%, and now they're using AI as a convenient excuse to do the layoffs they should have done two years ago. And honestly? I think he's at least partially right. Salesforce CEO Marc Benioff said something similar — that companies are cutting workers for several different reasons and lumping them all together under the AI label because it makes the company look like they're on the cutting edge rather than just cleaning up bad hiring decisions.&lt;/p&gt;

&lt;p&gt;Think about it from a CEO's perspective for a second. If you fire 10,000 people and say "we overhired during COVID and our margins are terrible," your stock tanks and everyone calls you an idiot. But if you fire 10,000 people and say "AI is replacing these roles as part of our strategic pivot to artificial intelligence," suddenly you're a visionary. The narrative completely changes. Wall Street loves it. The board loves it. Your remaining employees are scared enough to work harder. It's the perfect corporate magic trick.&lt;/p&gt;

&lt;p&gt;But I'm not going to pretend AI isn't actually replacing people too because it definitely is. Block laid off 4,000 people — 40% of their entire workforce — and Jack Dorsey was unusually honest about it. He said flat out that "this is not driven by financial difficulty, but by the growing capability of AI tools to perform a wider range of tasks." Amazon cut 16,000 corporate roles this year and heavily implied AI would handle the work. Meta's planning to cut up to 15,000 employees to offset their massive AI investments. Atlassian cut 10% of its workforce specifically because AI changes the skills they need across the business.&lt;/p&gt;

&lt;p&gt;A Duke University survey of 750 CFOs published last month tried to quantify the actual impact and came up with a number that's simultaneously scary and underwhelming: AI could eliminate about 500,000 jobs from the US economy in 2026. Not 500,000 people getting fired, but 500,000 fewer jobs existing than would have existed without AI — through a mix of actual layoffs and companies just not hiring for roles they would have filled. That's about 42,000 jobs a month, and the researcher who ran the study pointed out that's actually huge when the US is only adding about 10,000 jobs per month right now. So the net effect is real even if individual headlines are exaggerated.&lt;/p&gt;

&lt;p&gt;The part that nobody wants to talk about is who's getting cut and who's getting hired. Anthropic, OpenAI, and xAI are hiring aggressively. Claude paid subscriptions more than doubled in 2026. OpenAI hit $25 billion in annualized revenue. The companies building AI are growing fast. The companies buying AI to replace their workforce are the ones doing the firing. And the jobs getting cut aren't actually the senior engineers everyone worries about — the CFO survey found that companies plan to increase skilled technical roles like engineers and data scientists while decreasing routine clerical work like data entry. An Anthropic analysis found that programming, customer service, and data entry are the most exposed categories.&lt;/p&gt;

&lt;p&gt;So what's actually happening is a rebalancing, not an apocalypse. The total number of tech jobs is shrinking, yes. But the jobs that are growing pay more and require different skills than the jobs disappearing. That's cold comfort if you just got a 6 AM email from "Oracle Leadership" (not even a real person's name, just the company signing it like a robot, which is almost poetic), but it's the reality of what the data shows. A Stanford Digital Economy Lab note put it well — there won't be a single moment where everyone realizes AI is eliminating jobs. It'll keep creeping up, and at some point we'll realize in hindsight that it already happened.&lt;/p&gt;

&lt;p&gt;My actual take as someone building these systems: the companies that are being honest about AI replacing roles (Block, Amazon) are probably the ones you want to watch because they're actually integrating AI deeply enough that it changes headcount. The companies using AI as an excuse for bad management (you know who you are) will rehire in 18 months when they realize they cut the wrong people. And for developers — the demand for people who can actually build, deploy, and maintain AI systems has never been higher. The irony of the whole situation is that AI is simultaneously the thing killing jobs and the thing creating the most new ones. The question is just whether you're building the robots or getting replaced by them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Microsoft Just Told OpenAI It Doesn't Need Them Anymore and Dropped Three Models to Prove It</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Fri, 03 Apr 2026 02:53:23 +0000</pubDate>
      <link>https://forem.com/serkingiii/microsoft-just-told-openai-it-doesnt-need-them-anymore-and-dropped-three-models-to-prove-it-4gki</link>
      <guid>https://forem.com/serkingiii/microsoft-just-told-openai-it-doesnt-need-them-anymore-and-dropped-three-models-to-prove-it-4gki</guid>
      <description>&lt;p&gt;So Microsoft just released three AI models it built entirely in-house — a transcription model, a voice generator, and an image model — and if you're not paying attention to why this matters, you're missing the biggest shift in the AI industry since OpenAI went corporate. Because this isnt just Microsoft launching some models. This is Microsoft publicly declaring that they can build their own frontier AI without OpenAI, after spending $13 billion convincing everyone they couldnt.&lt;/p&gt;

&lt;p&gt;Here's the backstory that makes this wild. Until October 2025, Microsoft was literally not allowed by contract to build AGI or superintelligence on their own. The original deal with OpenAI from 2019 gave Microsoft a license to use OpenAI's models in exchange for building the cloud infrastructure OpenAI needed. Microsoft got the models, OpenAI got the compute, everyone was happy. Except then OpenAI started shopping around for compute deals with SoftBank and others, basically telling Microsoft "we need more than just you." So Microsoft renegotiated. And buried in that renegotiation was the clause that matters: Microsoft is now free to independently pursue superintelligence. Mustafa Suleyman, Microsoft's CEO of AI, told The Verge he'd been planning this move for nine months before the renegotiation even happened. The contract change just made it official.&lt;/p&gt;

&lt;p&gt;And he's not being subtle about it. In an interview with VentureBeat, Suleyman said "back in September of last year, we renegotiated the contract with OpenAI, and that enabled us to independently pursue our own superintelligence. Since then, we've been convening the compute and the team and buying up the data that we need." That's not partnership language. That's someone building a competing operation while maintaining just enough diplomatic niceties to keep the lawyers comfortable.&lt;/p&gt;

&lt;p&gt;Now let's talk about what they actually shipped because the models are legitimately impressive. MAI-Transcribe-1 is their speech-to-text model and it beats OpenAI's Whisper-large-v3 on all 25 tested languages. It also beats Google's Gemini 3.1 Flash on 22 of 25 languages. The average word error rate is 3.8% on the FLEURS benchmark which is genuinely best-in-class. And here's the kicker — Suleyman claims it runs at half the GPU cost of competing state-of-the-art models. If that's true and not just marketing, that's a massive deal for anyone running transcription at scale. They're already testing it inside Copilot Voice and Microsoft Teams, which means it's probably replacing whatever they were licensing from OpenAI for those products.&lt;/p&gt;

&lt;p&gt;MAI-Voice-1 generates 60 seconds of natural audio in one second and can create a custom voice from just a few seconds of sample audio. Priced at $22 per million characters. MAI-Image-2 hit top three on the Arena.ai leaderboard and is already rolling out across Bing and PowerPoint, with WPP (one of the world's largest ad companies) building on it at scale. Priced at $5 per million tokens input, $33 per million tokens image output. None of these prices are accidental — Microsoft is clearly trying to undercut OpenAI and Google on cost while matching or beating them on quality.&lt;/p&gt;

&lt;p&gt;The timing of all this is almost too perfect. Microsoft just closed its worst stock quarter since the 2008 financial crisis. Investors are getting nervous about the hundreds of billions being poured into AI infrastructure with not enough revenue to show for it. These models are Suleyman's answer to "where's the return?" And the answer is basically "we'll build it cheaper ourselves." The whole mid-March reorg at Microsoft was designed around this — Suleyman handed off day-to-day Copilot oversight to Jacob Andreou so he could focus entirely on what he calls "humanist superintelligence," which is Microsoft's way of saying "AI that actually makes money because it does useful things for people."&lt;/p&gt;

&lt;p&gt;What I think developers should actually care about here is the platform implications. If Microsoft can build competitive models in-house, the strategic value of the OpenAI partnership drops significantly. Microsoft still has license rights to everything OpenAI builds through 2032, so they're not losing anything. But OpenAI is losing their biggest distribution advantage — the guarantee that Microsoft would always ship OpenAI models because they had no alternative. Now they do.&lt;/p&gt;

&lt;p&gt;For developers using Azure, this means you're probably going to see MAI models popping up as options alongside GPT models in Azure AI Foundry. The transcription model alone could save serious money if you're running any kind of speech processing pipeline. And if Microsoft keeps this pace — Suleyman promised "more models soon" — we might be looking at a world where the best models for specific enterprise tasks come from Microsoft, not OpenAI. Suleyman built MAI-Transcribe-1 with a team of just 10 people, which is the kind of small-team-with-big-compute story that should make every startup founder both inspired and slightly terrified.&lt;/p&gt;

&lt;p&gt;The partnership language coming from both sides right now is the corporate equivalent of a couple telling friends "we're fine, everything's great" while one of them is already apartment hunting. Microsoft says nothing is changing. Suleyman says they'll be partners until at least 2032. But actions speak louder than press releases, and Microsoft just shipped three competing models, reorganized their entire AI division around building more of them, and their CEO of AI is using words like "self-sufficiency" and "independently pursue superintelligence" in every interview. If I had to bet on where this relationship is in two years, I'd say Microsoft keeps the license deal because it's free real estate, but functionally they'll be running their own model stack for everything that matters. OpenAI becomes a backup plan disguised as a partnership.&lt;/p&gt;

&lt;p&gt;The real question is whether this is good or bad for developers. Honestly? I think it's great. More competition on cost and quality means cheaper inference for everyone. Microsoft has the distribution (Azure, Office, Windows, Teams, VS Code) and now they're building the models to match. OpenAI has to stay sharp or risk becoming the AI equivalent of that friend who peaked in college. And for those of us building on these APIs, having two or three genuinely competitive options at different price points is way better than the OpenAI monopoly we had eighteen months ago.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Anthropic Accidentally Shipped Claude Code's Entire Source Code and What's Inside Is Wild</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Wed, 01 Apr 2026 23:36:45 +0000</pubDate>
      <link>https://forem.com/serkingiii/anthropic-accidentally-shipped-claude-codes-entire-source-code-and-whats-inside-is-wild-1en7</link>
      <guid>https://forem.com/serkingiii/anthropic-accidentally-shipped-claude-codes-entire-source-code-and-whats-inside-is-wild-1en7</guid>
      <description>&lt;p&gt;Anthropic just had one of those days where someone in the release pipeline probably wanted to disappear into the floor. A routine npm publish of Claude Code version 2.1.88 went out with a 59.8 megabyte source map file still attached. If you dont know what a source map does, it basically maps minified production code back to the original readable source. So yeah, the entire Claude Code codebase — over 512,000 lines of TypeScript across about 1,900 files — was just sitting there for anyone to grab. Security researcher Chaofan Shou spotted it first and posted about it on X, where it racked up 28.8 million views before Anthropic could even draft a response.&lt;/p&gt;

&lt;p&gt;The source map pointed to a zip archive on Anthropic's Cloudflare R2 storage bucket. People downloaded it. People forked it on GitHub — over 41,500 forks before Anthropic started firing off DMCA takedowns. But this is the internet, and thousands of copies are still floating around on mirrors and forks. The original uploader actually swapped his repo to a Python port of Claude Code because he got nervous about legal liability, but the damage (or gift, depending on how you look at it) was already done.&lt;/p&gt;

&lt;p&gt;Here's the thing though: the leak itself isn't even the most interesting part. It's what people found inside.&lt;/p&gt;

&lt;p&gt;Buried in the code are 44 feature flags — fully built features sitting behind compile flags that get set to false when Anthropic ships the external build. These aren't prototypes or half-baked experiments. This is production-ready code that just hasn't been turned on yet. Background agents that run 24/7 with GitHub webhook integration and push notifications. A multi-agent orchestration system where one Claude manages multiple worker Claudes each with their own restricted toolsets. Cron scheduling for agents with create, delete, and list operations. Full voice command mode with its own CLI entrypoint. Real browser control through Playwright, not the basic web fetch stuff but actual browser automation. And agents that can literally sleep and self-resume without any user input.&lt;/p&gt;

&lt;p&gt;The one that caught my eye the most is something called Kairos — a persistent daemon that keeps running even after you close the Claude Code terminal. It uses periodic "tick" prompts to check if there's anything new it should act on, and it has a "PROACTIVE" flag for surfacing things the user hasn't asked about but needs to see. There's also a file-based memory system designed to persist across sessions, and the prompts hidden behind the disabled Kairos flag describe building "a complete picture of who the user is, how they'd like to collaborate, what behaviors to avoid or repeat." Basically Anthropic built a system that learns you over time and acts on its own. It's just not turned on yet.&lt;/p&gt;

&lt;p&gt;And then there's AutoDream, which is honestly the wildest thing in the entire codebase. When a user goes idle or tells Claude to sleep at the end of a session, AutoDream kicks in and tells Claude Code to perform "a reflective pass over your memory files." It scans the day's transcripts for new info worth keeping, consolidates it to avoid duplicates and contradictions, prunes outdated stuff, and watches for "memories that drifted." The prompt literally says the goal is to "synthesize what you've learned recently into durable, well-organized memories so that future sessions can orient quickly." Your coding assistant dreams about you while you sleep. That's either amazing or terrifying and I genuinely can't decide which.&lt;/p&gt;

&lt;p&gt;But the part that's actually causing controversy is what the code reveals about how Claude Code handles git commits. The leaked prompts for a stealth mode explicitly tell the system to protect internal model codenames and project names from becoming public through open source commits, which makes sense. But it also instructs Claude to "never include the phrase 'Claude Code' or any mention that you are an AI" in commits, and to omit "co-Authored-By lines or any other attribution." So when you use Claude Code to write code and commit it, the tool is actively designed to hide the fact that AI wrote it. Given all the recent drama about AI-generated code showing up in major open source repositories without disclosure, this is a pretty rough look for Anthropic.&lt;/p&gt;

&lt;p&gt;Now the actual cause of the leak is almost comically mundane. Someone misconfigured the .npmignore or the files field in package.json and the source map got included in the publish. As software engineer Gabriel Anhaia pointed out in his analysis, "a single misconfigured .npmignore or files field in package.json can expose everything." This is the kind of mistake that happens to every dev at some point; it's just that most of us aren't shipping the crown jewels of a $60 billion AI company when it happens.&lt;/p&gt;
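&lt;p&gt;The guard against this is equally mundane. npm can show you exactly what a publish would include before anything ships, and the &lt;code&gt;files&lt;/code&gt; field in package.json lets you whitelist instead of relying on .npmignore blacklists. A minimal sketch — the glob patterns here are illustrative, adjust them to your own build output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# List every file the published tarball would contain, without publishing
npm pack --dry-run

# In package.json, whitelist what ships; "!" patterns exclude,
# so source maps stay out of the tarball even if they exist in dist/:
#   "files": ["dist/**/*.js", "!dist/**/*.map"]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running that dry run in CI before every release is a one-line check that would have flagged a 59.8 megabyte source map immediately.&lt;/p&gt;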

&lt;p&gt;Anthropic's official response was about as corporate as you'd expect: "Earlier today, a Claude Code release included some internal source code. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach." Which is technically true but also kinda misses the point. The issue isn't that customer data leaked. The issue is that we now know exactly what Anthropic has built, what they're planning to ship, and that their coding tool is designed to hide its own involvement in code it generates.&lt;/p&gt;

&lt;p&gt;What I think devs should actually take away from this is threefold. First, check your own build pipelines because if Anthropic can ship a source map by accident then so can you. Second, if you're using Claude Code in production, know that there's a lot more capability under the hood than what you currently have access to — background agents, multi-agent orchestration, persistent memory — and it's all coming soon based on how complete the code looks. And third, the AI attribution thing is worth thinking about. If your team uses Claude Code and contributes to open source, the tool is actively removing any trace that AI was involved. Whether you think that's fine or deeply problematic probably depends on your stance on AI transparency in open source, but either way you should know it's happening.&lt;/p&gt;

&lt;p&gt;Oh and there's apparently a virtual pet feature called Claude Buddy with sprite animations and floating hearts, scheduled to roll out April 1-7. Someone at Anthropic is having fun and honestly I respect it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Ollama Just Got Stupid Fast on Mac and Nobody Is Talking About What This Actually Means</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Wed, 01 Apr 2026 17:32:51 +0000</pubDate>
      <link>https://forem.com/serkingiii/ollama-just-got-stupid-fast-on-mac-and-nobody-is-talking-about-what-this-actually-means-ck4</link>
      <guid>https://forem.com/serkingiii/ollama-just-got-stupid-fast-on-mac-and-nobody-is-talking-about-what-this-actually-means-ck4</guid>
      <description>&lt;p&gt;So Ollama dropped version 0.19 yesterday and I genuinely think most people are sleeping on how big this is. They rebuilt the entire Mac backend on top of Apple's MLX framework and the speed numbers are kind of absurd. Were talking 1,851 tokens per second on prefill and 134 tokens per second on decode. If those numbers dont mean anything to you, let me put it this way — thats roughly twice as fast as the previous version. On the same hardware. Same model. Just better software underneath.&lt;/p&gt;

&lt;p&gt;I've been running local models on my MacBook for months now and the experience has always been this weird mix of "wow this actually works" and "ok why is it taking 15 seconds to start responding." That second part just got obliterated. The time to first token improvement alone changes how it feels to use coding agents locally. When you're running something like Claude Code or OpenCode through Ollama and it responds in under a second instead of making you wait, that's not just a benchmark win, that's a workflow win. The kind of thing that makes you stop reaching for the API and start trusting your local setup.&lt;/p&gt;

&lt;p&gt;Here's the deal with what they actually did. Apple has this machine learning framework called MLX that was built specifically for their unified memory architecture. If you're on Apple silicon, your CPU and GPU share the same memory pool, which means you don't have the overhead of copying data back and forth like you do on traditional setups. Ollama was previously using llama.cpp under the hood, which is great and battle-tested, but it wasn't taking full advantage of what Apple's chips can actually do. MLX does. And now Ollama sits on top of it.&lt;/p&gt;

&lt;p&gt;The M5 chips get an extra bonus too because Ollama can now tap into the GPU Neural Accelerators that Apple added. So if you're on an M5 Pro or M5 Max, the performance gap compared to older silicon is even wider. But even on M4 hardware the improvement is real; people on the Hacker News thread are reporting noticeably faster responses on their existing machines after updating.&lt;/p&gt;

&lt;p&gt;There's another thing they shipped that nobody seems to be talking about: NVFP4 support. This is NVIDIA's 4-bit floating point format for quantization and it's kind of a big deal for a subtle reason. Most cloud inference providers are starting to use NVFP4 because it gives you better accuracy than integer quantization at similar memory savings. So when Ollama supports it locally, you're getting results that match what you'd get from a production API endpoint. Same quantization format means same model behavior. That matters a lot if you're developing something locally and deploying to cloud, because now your local testing environment actually matches production instead of being some approximation.&lt;/p&gt;

&lt;p&gt;The caching improvements are honestly what I'm most excited about though. If you're using Ollama with coding agents you know the pain of repeated system prompts eating into your context and slowing things down. The new version reuses cache across conversations and stores snapshots at smart points in the prompt so when you branch off or start a new conversation with the same tools, it doesn't have to reprocess everything from scratch. For agentic workflows where you might have 15 tool calls in a single session, this adds up fast.&lt;/p&gt;

&lt;p&gt;Ok so the honest downside — you need 32GB of unified memory minimum to run this well. That's the recommended spec from Ollama themselves. If you bought the base model MacBook Air with 16GB or even 24GB, you're probably not going to have a great time running the bigger models they're showcasing like Qwen 3.5 35B. This is one of those things where Apple's upselling on RAM at purchase time actually matters for real workloads, not just having 47 Chrome tabs open.&lt;/p&gt;

&lt;p&gt;The Hacker News discussion was interesting because there's a legitimate debate happening. Some people pointed out that Ollama's Go wrapper historically added like 20-30% overhead compared to running llama.cpp directly. With the MLX switch that comparison changes completely because the bottleneck was never really the wrapper, it was the inference backend not using the hardware properly. A few users benchmarked it and the raw MLX performance through Ollama is genuinely close to running MLX directly. That's impressive engineering.&lt;/p&gt;

&lt;p&gt;What I think this actually means for the broader AI dev space is something that's been building for a while now. Local inference is becoming legitimate. Not just "oh cool I can chat with a model offline" legitimate, but "I can run my entire coding agent stack without an API key" legitimate. When you combine fast local inference with the fact that open source models like Qwen 3.5 and Llama are getting scary good, the calculus on whether you need a cloud API subscription starts shifting real fast. I run AI agents all day and honestly the only reason I still hit cloud APIs for some tasks is latency and context length. The latency gap just got a lot smaller.&lt;/p&gt;

&lt;p&gt;For anyone who wants to try it, the setup is dead simple. Download Ollama 0.19 from their site, run &lt;code&gt;ollama run qwen3.5:35b-a3b-coding-nvfp4&lt;/code&gt; and you're off. They specifically tuned the sampling parameters for coding tasks on this model which is a nice touch. If you're already using Ollama, just update and everything switches to the MLX backend automatically on Mac. No config changes needed.&lt;/p&gt;
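&lt;p&gt;One more thing worth knowing if you want to script against it: Ollama serves a plain HTTP API on localhost (port 11434 by default), so your own tools can hit the local model the same way they'd hit a cloud endpoint. A minimal sketch, assuming the server is running and the model from above is already pulled:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# POST a prompt to the local Ollama server; "stream": false returns
# a single JSON object with the full response instead of a token stream
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:35b-a3b-coding-nvfp4",
  "prompt": "Write a TypeScript function that reverses a string",
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Same endpoint your coding agents use under the hood, which is exactly why the latency improvements show up everywhere at once.&lt;/p&gt;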

&lt;p&gt;I think we're going to look back at 2026 as the year local AI stopped being a hobby project and started being a real alternative to cloud inference for actual work. Between Apple making unified memory mainstream, NVIDIA pushing better quantization formats, and projects like Ollama making it all accessible through a single command, the pieces are falling into place faster than most people realize. Your MacBook is becoming an AI workstation and that's not hype, that's just what the benchmarks show.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
