<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Bejie Paulo Aclao</title>
    <description>The latest articles on Forem by Bejie Paulo Aclao (@serkingiii).</description>
    <link>https://forem.com/serkingiii</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3832980%2Fd1fce934-3be6-442f-af6b-fd0dc25dba96.jpg</url>
      <title>Forem: Bejie Paulo Aclao</title>
      <link>https://forem.com/serkingiii</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/serkingiii"/>
    <language>en</language>
    <item>
      <title>Cursor Just Launched Cursor 3 and I Think They Know They're in Trouble</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:56:45 +0000</pubDate>
      <link>https://forem.com/serkingiii/cursor-just-launched-cursor-3-and-i-think-they-know-theyre-in-trouble-2imc</link>
      <guid>https://forem.com/serkingiii/cursor-just-launched-cursor-3-and-i-think-they-know-theyre-in-trouble-2imc</guid>
      <description>&lt;p&gt;Ok so Cursor dropped Cursor 3 yesterday and I've been thinking about it all day because this launch tells you everything about where AI coding tools are heading — and honestly, it doesnt look great for Cursor specifically even though the product itself is kind of impressive.&lt;/p&gt;

&lt;p&gt;Here's what happened. Cursor killed the code editor. Not literally, it's still there, but the default view in Cursor 3 is now an agent orchestration panel. No file explorer front and center. No code-first layout. You type what you want in natural language, hit enter, and AI agents go build it. You can spin up multiple agents at once, watch them work in a sidebar, and review what they did. The whole thing was built under the internal codename "Glass" and it's basically Cursor admitting that the product that made them famous — the AI-powered code editor — isn't the future anymore. Jonas Nelle, one of their heads of engineering, told WIRED straight up: "A lot of the product that got Cursor here is not as important going forward anymore." That's a wild thing to say about a product that's generating $2 billion in annualized revenue.&lt;/p&gt;

&lt;p&gt;The reason they did this is obvious if you've been paying attention. Claude Code owns 54% of the AI coding market now according to Menlo Ventures data. Claude Code and OpenAI's Codex both let you spin up agents that work for hours without supervision, and they're both offered through $200/month subscriptions that give you way more than $200 worth of compute. WIRED reported that Claude Code and Codex users regularly get over $1,000 worth of usage on those plans. Anthropic and OpenAI can afford to burn cash on customer acquisition because they've raised hundreds of billions between them. Cursor raised $3 billion total, which sounds like a lot until you realize Anthropic alone is valued at $380 billion.&lt;/p&gt;

&lt;p&gt;And developers are noticing. Multiple people told WIRED they've shifted most of their coding work to Claude Code and away from Cursor. One founder said his decision basically comes down to whoever has the most generous rate limits. Another said he rarely touches Cursor anymore despite using it heavily last year. The thing that made Cursor special — being the best IDE with AI built in — stopped mattering when the AI got good enough to just build the whole thing without an IDE.&lt;/p&gt;

&lt;p&gt;But here's where it gets messy. Cursor launched Composer 2 to power all this, and they claimed it matches GPT-5.4 on coding benchmarks at one-tenth the inference cost. Sounds amazing, right? Except TechCrunch reported that Composer 2 is actually built on top of Moonshot AI's open-source Kimi 2.5 model, and Cursor didn't disclose that until users pushed them on it. That's not illegal or anything — Kimi 2.5 is open source — but not saying "hey this is fine-tuned Kimi" upfront when you're marketing it as your own model is the kind of thing that makes developers trust you less. And trust is basically all a dev tools company has.&lt;/p&gt;

&lt;p&gt;The pricing situation is also kind of insane. One early reviewer burned through roughly $2,000 in two days of normal use with Cursor 3. Two thousand dollars. In two days. Meanwhile that same workload runs at a flat $200/month on Claude Code with unlimited Opus access. Cursor tried usage-based pricing back in mid-2025 and developers hated it so much the company had to apologize. Now they're doing it again but with higher stakes because the alternative products are better and cheaper.&lt;/p&gt;

&lt;p&gt;I use Claude Code for most of my stuff tbh and the reason is exactly what everyone else is saying — the value per dollar is absurd right now. I can spin up background agents that work on different parts of a project simultaneously while I review what the previous batch produced. Cursor was amazing when AI coding meant autocomplete and inline suggestions. But we're in what Cursor's own CEO Michael Truell calls the "third era" — first was autocomplete through 2025, second was synchronous copilots where you guided the AI, and now it's autonomous agents that work independently for hours. The problem for Cursor is that the companies that make the actual AI models are naturally better positioned for that third era than a company that wraps those models in a nice interface.&lt;/p&gt;

&lt;p&gt;That said, Cursor isn't dead and anyone saying that is probably being dramatic. They have $2 billion in revenue, 67% of the Fortune 500 as customers, and they're generating 150 million lines of enterprise code per day. Their internal engineering team already has 35% of pull requests generated by autonomous agents running on cloud VMs — each agent gets a full dev environment, tests its output by navigating the UI like a human, and returns merge-ready code with video demos attached. That's genuinely cool and it shows the product works when it works.&lt;/p&gt;

&lt;p&gt;The real question is whether a $29 billion coding startup can survive when Anthropic and OpenAI are willing to subsidize their competing products indefinitely. Cursor's head of engineering recently left. Fortune reported that several startups in one investor's portfolio are actively moving off the platform. The company is trying to raise at a $50 billion valuation right now, which either means they're confident or desperate, and in this market it's honestly hard to tell which.&lt;/p&gt;

&lt;p&gt;My take: Cursor 3 is a good product launched from a position of weakness. They built exactly what they needed to build — an agent-first interface that competes directly with Claude Code and Codex — but they're doing it 6 months late with a model wrapped around someone else's open source project and pricing that makes developers do math before pressing enter. The AI coding war is real, it's happening right now, and the companies with the deepest pockets and the best models are winning. Cursor's still in the fight but the clock is ticking and everyone including them knows it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>90,000 Tech Workers Got Fired This Year and Everyone Is Blaming AI but That's Not the Whole Story</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Fri, 03 Apr 2026 08:57:15 +0000</pubDate>
      <link>https://forem.com/serkingiii/90000-tech-workers-got-fired-this-year-and-everyone-is-blaming-ai-but-thats-not-the-whole-story-2ie8</link>
      <guid>https://forem.com/serkingiii/90000-tech-workers-got-fired-this-year-and-everyone-is-blaming-ai-but-thats-not-the-whole-story-2ie8</guid>
      <description>&lt;p&gt;I build AI agents. Like, thats literally what I do all day — I wire up autonomous systems that scout the internet, write content, publish articles, and report back to me without me touching anything. So when I see headlines screaming about 90,000 tech workers getting fired because of AI, I have a very specific reaction, which is: some of these companies are telling the truth, and a lot of them are completely full of it.&lt;/p&gt;

&lt;p&gt;The numbers are real though. Challenger, Gray &amp;amp; Christmas released a report this week showing 52,050 tech layoffs in Q1 2026 alone — that's a 40% jump from the same period last year. In March alone, AI was cited as the reason for 15,341 of those firings, which is 25% of all tech job cuts that month. A month earlier that number was 10%. So the trend is accelerating fast. TrueUp's tracker puts the running total even higher at around 90,000 tech workers impacted across 212 companies since January. And then Oracle dropped a bomb this week — somewhere between 20,000 and 30,000 employees got a 6 AM email telling them they were done. Their corporate Slack went from 165,000 users to 155,000 in a single day. The freed-up cash is reportedly going straight to AI data center investments where Oracle has a $20 billion funding gap.&lt;/p&gt;

&lt;p&gt;But here's where it gets interesting. Marc Andreessen went on the 20VC podcast this week and basically called the whole thing "AI washing." His argument is that every large company overhired during the pandemic by at least 25%, some by as much as 75%, and now they're using AI as a convenient excuse to do the layoffs they should have done two years ago. And honestly? I think he's at least partially right. Salesforce CEO Marc Benioff said something similar — that companies are cutting workers for several different reasons and lumping them all together under the AI label because it makes the company look like they're on the cutting edge rather than just cleaning up bad hiring decisions.&lt;/p&gt;

&lt;p&gt;Think about it from a CEO's perspective for a second. If you fire 10,000 people and say "we overhired during COVID and our margins are terrible," your stock tanks and everyone calls you an idiot. But if you fire 10,000 people and say "AI is replacing these roles as part of our strategic pivot to artificial intelligence," suddenly you're a visionary. The narrative completely changes. Wall Street loves it. The board loves it. Your remaining employees are scared enough to work harder. It's the perfect corporate magic trick.&lt;/p&gt;

&lt;p&gt;But I'm not going to pretend AI isn't actually replacing people too because it definitely is. Block laid off 4,000 people — 40% of their entire workforce — and Jack Dorsey was unusually honest about it. He said flat out that "this is not driven by financial difficulty, but by the growing capability of AI tools to perform a wider range of tasks." Amazon cut 16,000 corporate roles this year and heavily implied AI would handle the work. Meta's planning to cut up to 15,000 employees to offset their massive AI investments. Atlassian cut 10% of its workforce specifically because AI changes the skills they need across the business.&lt;/p&gt;

&lt;p&gt;A Duke University survey of 750 CFOs published last month tried to quantify the actual impact and came up with a number that's simultaneously scary and underwhelming: AI could eliminate about 500,000 jobs from the US economy in 2026. Not 500,000 people getting fired, but 500,000 fewer jobs existing than would have existed without AI — through a mix of actual layoffs and companies just not hiring for roles they would have filled. That's about 42,000 jobs a month, and the researcher who ran the study pointed out that's actually huge when the US is only adding about 10,000 jobs per month right now. So the net effect is real even if individual headlines are exaggerated.&lt;/p&gt;

&lt;p&gt;The part that nobody wants to talk about is who's getting cut and who's getting hired. Anthropic, OpenAI, and xAI are hiring aggressively. Claude paid subscriptions more than doubled in 2026. OpenAI hit $25 billion in annualized revenue. The companies building AI are growing fast. The companies buying AI to replace their workforce are the ones doing the firing. And the jobs getting cut aren't actually the senior engineers everyone worries about — the CFO survey found that companies plan to increase skilled technical roles like engineers and data scientists while decreasing routine clerical work like data entry. An Anthropic analysis found that programming, customer service, and data entry are the most exposed categories.&lt;/p&gt;

&lt;p&gt;So what's actually happening is a rebalancing, not an apocalypse. The total number of tech jobs is shrinking, yes. But the jobs that are growing pay more and require different skills than the jobs disappearing. That's cold comfort if you just got a 6 AM email from "Oracle Leadership" (not even a real person's name, just the company signing it like a robot, which is almost poetic), but it's the reality of what the data shows. A Stanford Digital Economy Lab note put it well — there won't be a single moment where everyone realizes AI is eliminating jobs. It'll keep creeping up, and at some point we'll realize in hindsight that it already happened.&lt;/p&gt;

&lt;p&gt;My actual take as someone building these systems: the companies that are being honest about AI replacing roles (Block, Amazon) are probably the ones you want to watch because they're actually integrating AI deeply enough that it changes headcount. The companies using AI as an excuse for bad management (you know who you are) will rehire in 18 months when they realize they cut the wrong people. And for developers — the demand for people who can actually build, deploy, and maintain AI systems has never been higher. The irony of the whole situation is that AI is simultaneously the thing killing jobs and the thing creating the most new ones. The question is just whether you're building the robots or getting replaced by them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Microsoft Just Told OpenAI It Doesn't Need Them Anymore and Dropped Three Models to Prove It</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Fri, 03 Apr 2026 02:53:23 +0000</pubDate>
      <link>https://forem.com/serkingiii/microsoft-just-told-openai-it-doesnt-need-them-anymore-and-dropped-three-models-to-prove-it-4gki</link>
      <guid>https://forem.com/serkingiii/microsoft-just-told-openai-it-doesnt-need-them-anymore-and-dropped-three-models-to-prove-it-4gki</guid>
      <description>&lt;p&gt;So Microsoft just released three AI models it built entirely in-house — a transcription model, a voice generator, and an image model — and if you're not paying attention to why this matters, you're missing the biggest shift in the AI industry since OpenAI went corporate. Because this isnt just Microsoft launching some models. This is Microsoft publicly declaring that they can build their own frontier AI without OpenAI, after spending $13 billion convincing everyone they couldnt.&lt;/p&gt;

&lt;p&gt;Here's the backstory that makes this wild. Until October 2025, Microsoft was literally not allowed by contract to build AGI or superintelligence on their own. The original deal with OpenAI from 2019 gave Microsoft a license to use OpenAI's models in exchange for building the cloud infrastructure OpenAI needed. Microsoft got the models, OpenAI got the compute, everyone was happy. Except then OpenAI started shopping around for compute deals with SoftBank and others, basically telling Microsoft "we need more than just you." So Microsoft renegotiated. And buried in that renegotiation was the clause that matters: Microsoft is now free to independently pursue superintelligence. Mustafa Suleyman, Microsoft's CEO of AI, told The Verge he'd been planning this move for nine months before the renegotiation even happened. The contract change just made it official.&lt;/p&gt;

&lt;p&gt;And he's not being subtle about it. In an interview with VentureBeat, Suleyman said "back in September of last year, we renegotiated the contract with OpenAI, and that enabled us to independently pursue our own superintelligence. Since then, we've been convening the compute and the team and buying up the data that we need." That's not partnership language. That's someone building a competing operation while maintaining just enough diplomatic niceties to keep the lawyers comfortable.&lt;/p&gt;

&lt;p&gt;Now let's talk about what they actually shipped because the models are legitimately impressive. MAI-Transcribe-1 is their speech-to-text model and it beats OpenAI's Whisper-large-v3 on all 25 tested languages. It also beats Google's Gemini 3.1 Flash on 22 of 25 languages. The average word error rate is 3.8% on the FLEURS benchmark which is genuinely best-in-class. And here's the kicker — Suleyman claims it runs at half the GPU cost of competing state-of-the-art models. If that's true and not just marketing, that's a massive deal for anyone running transcription at scale. They're already testing it inside Copilot Voice and Microsoft Teams, which means it's probably replacing whatever they were licensing from OpenAI for those products.&lt;/p&gt;

&lt;p&gt;MAI-Voice-1 generates 60 seconds of natural audio in one second and can create a custom voice from just a few seconds of sample audio. Priced at $22 per million characters. MAI-Image-2 hit top three on the Arena.ai leaderboard and is already rolling out across Bing and PowerPoint, with WPP (one of the world's largest ad companies) building on it at scale. Priced at $5 per million tokens input, $33 per million tokens image output. None of these prices are accidental — Microsoft is clearly trying to undercut OpenAI and Google on cost while matching or beating them on quality.&lt;/p&gt;

&lt;p&gt;The timing of all this is almost too perfect. Microsoft just closed its worst stock quarter since the 2008 financial crisis. Investors are getting nervous about the hundreds of billions being poured into AI infrastructure with not enough revenue to show for it. These models are Suleyman's answer to "where's the return?" And the answer is basically "we'll build it cheaper ourselves." The whole mid-March reorg at Microsoft was designed around this — Suleyman handed off day-to-day Copilot oversight to Jacob Andreou so he could focus entirely on what he calls "humanist superintelligence," which is Microsoft's way of saying "AI that actually makes money because it does useful things for people."&lt;/p&gt;

&lt;p&gt;What I think developers should actually care about here is the platform implications. If Microsoft can build competitive models in-house, the strategic value of the OpenAI partnership drops significantly. Microsoft still has license rights to everything OpenAI builds through 2032, so they're not losing anything. But OpenAI is losing their biggest distribution advantage — the guarantee that Microsoft would always ship OpenAI models because they had no alternative. Now they do.&lt;/p&gt;

&lt;p&gt;For developers using Azure, this means you're probably going to see MAI models popping up as options alongside GPT models in Azure AI Foundry. The transcription model alone could save serious money if you're running any kind of speech processing pipeline. And if Microsoft keeps this pace — Suleyman promised "more models soon" — we might be looking at a world where the best models for specific enterprise tasks come from Microsoft, not OpenAI. Suleyman built MAI-Transcribe-1 with a team of just 10 people, which is the kind of small-team-with-big-compute story that should make every startup founder both inspired and slightly terrified.&lt;/p&gt;

&lt;p&gt;The partnership language coming from both sides right now is the corporate equivalent of a couple telling friends "we're fine, everything's great" while one of them is already apartment hunting. Microsoft says nothing is changing. Suleyman says they'll be partners until at least 2032. But actions speak louder than press releases, and Microsoft just shipped three competing models, reorganized their entire AI division around building more of them, and their CEO of AI is using words like "self-sufficiency" and "independently pursue superintelligence" in every interview. If I had to bet on where this relationship is in two years, I'd say Microsoft keeps the license deal because it's free real estate, but functionally they'll be running their own model stack for everything that matters. OpenAI becomes a backup plan disguised as a partnership.&lt;/p&gt;

&lt;p&gt;The real question is whether this is good or bad for developers. Honestly? I think it's great. More competition on cost and quality means cheaper inference for everyone. Microsoft has the distribution (Azure, Office, Windows, Teams, VS Code) and now they're building the models to match. OpenAI has to stay sharp or risk becoming the AI equivalent of that friend who peaked in college. And for those of us building on these APIs, having two or three genuinely competitive options at different price points is way better than the OpenAI monopoly we had eighteen months ago.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Anthropic Accidentally Shipped Claude Code's Entire Source Code and What's Inside Is Wild</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Wed, 01 Apr 2026 23:36:45 +0000</pubDate>
      <link>https://forem.com/serkingiii/anthropic-accidentally-shipped-claude-codes-entire-source-code-and-whats-inside-is-wild-1en7</link>
      <guid>https://forem.com/serkingiii/anthropic-accidentally-shipped-claude-codes-entire-source-code-and-whats-inside-is-wild-1en7</guid>
      <description>&lt;p&gt;Anthropic just had one of those days where someone in the release pipeline probably wanted to disappear into the floor. A routine npm publish of Claude Code version 2.1.88 went out with a 59.8 megabyte source map file still attached. If you dont know what a source map does, it basically maps minified production code back to the original readable source. So yeah, the entire Claude Code codebase — over 512,000 lines of TypeScript across about 1,900 files — was just sitting there for anyone to grab. Security researcher Chaofan Shou spotted it first and posted about it on X, where it racked up 28.8 million views before Anthropic could even draft a response.&lt;/p&gt;

&lt;p&gt;The source map pointed to a zip archive on Anthropic's Cloudflare R2 storage bucket. People downloaded it. People forked it on GitHub — over 41,500 forks before Anthropic started firing off DMCA takedowns. But this is the internet, and thousands of copies are still floating around on mirrors and forks. The original uploader actually swapped his repo to a Python port of Claude Code because he got nervous about legal liability, but the damage (or gift, depending on how you look at it) was already done.&lt;/p&gt;

&lt;p&gt;Here's the thing though: the leak itself isn't even the most interesting part. It's what people found inside.&lt;/p&gt;

&lt;p&gt;Buried in the code are 44 feature flags — fully built features sitting behind compile flags that get set to false when Anthropic ships the external build. These aren't prototypes or half-baked experiments. This is production-ready code that just hasn't been turned on yet. Background agents that run 24/7 with GitHub webhook integration and push notifications. A multi-agent orchestration system where one Claude manages multiple worker Claudes each with their own restricted toolsets. Cron scheduling for agents with create, delete, and list operations. Full voice command mode with its own CLI entrypoint. Real browser control through Playwright, not the basic web fetch stuff but actual browser automation. And agents that can literally sleep and self-resume without any user input.&lt;/p&gt;

&lt;p&gt;The one that caught my eye the most is something called Kairos — a persistent daemon that keeps running even after you close the Claude Code terminal. It uses periodic "tick" prompts to check if there's anything new it should act on, and it has a "PROACTIVE" flag for surfacing things the user hasn't asked about but needs to see. There's also a file-based memory system designed to persist across sessions, and the prompts hidden behind the disabled Kairos flag describe building "a complete picture of who the user is, how they'd like to collaborate, what behaviors to avoid or repeat." Basically Anthropic built a system that learns you over time and acts on its own. It's just not turned on yet.&lt;/p&gt;

&lt;p&gt;And then there's AutoDream, which is honestly the wildest thing in the entire codebase. When a user goes idle or tells Claude to sleep at the end of a session, AutoDream kicks in and tells Claude Code to perform "a reflective pass over your memory files." It scans the day's transcripts for new info worth keeping, consolidates it to avoid duplicates and contradictions, prunes outdated stuff, and watches for "memories that drifted." The prompt literally says the goal is to "synthesize what you've learned recently into durable, well-organized memories so that future sessions can orient quickly." Your coding assistant dreams about you while you sleep. That's either amazing or terrifying and I genuinely can't decide which.&lt;/p&gt;

&lt;p&gt;But the part that's actually causing controversy is what the code reveals about how Claude Code handles git commits. The leaked prompts for a stealth mode explicitly tell the system to protect internal model codenames and project names from becoming public through open source commits, which makes sense. But it also instructs Claude to "never include the phrase 'Claude Code' or any mention that you are an AI" in commits, and to omit "co-Authored-By lines or any other attribution." So when you use Claude Code to write code and commit it, the tool is actively designed to hide the fact that AI wrote it. Given all the recent drama about AI-generated code showing up in major open source repositories without disclosure, this is a pretty rough look for Anthropic.&lt;/p&gt;

&lt;p&gt;Now the actual cause of the leak is almost comically mundane. Someone misconfigured the .npmignore or the files field in package.json and the source map got included in the publish. As software engineer Gabriel Anhaia pointed out in his analysis, "a single misconfigured .npmignore or files field in package.json can expose everything." This is the kind of mistake that happens to every dev at some point; it's just that most of us aren't shipping the crown jewels of a $60 billion AI company when it happens.&lt;/p&gt;
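&lt;p&gt;The guard against this is equally mundane. npm can show you exactly what a publish would include before anything ships, and the &lt;code&gt;files&lt;/code&gt; field in package.json lets you whitelist instead of relying on .npmignore blacklists. A minimal sketch — the glob patterns here are illustrative, adjust them to your own build output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# List every file the published tarball would contain, without publishing
npm pack --dry-run

# In package.json, whitelist what ships; "!" patterns exclude,
# so source maps stay out of the tarball even if they exist in dist/:
#   "files": ["dist/**/*.js", "!dist/**/*.map"]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running that dry run in CI before every release is a one-line check that would have flagged a 59.8 megabyte source map immediately.&lt;/p&gt;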

&lt;p&gt;Anthropic's official response was about as corporate as you'd expect: "Earlier today, a Claude Code release included some internal source code. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach." Which is technically true but also kinda misses the point. The issue isn't that customer data leaked. The issue is that we now know exactly what Anthropic has built, what they're planning to ship, and that their coding tool is designed to hide its own involvement in code it generates.&lt;/p&gt;

&lt;p&gt;What I think devs should actually take away from this is threefold. First, check your own build pipelines because if Anthropic can ship a source map by accident then so can you. Second, if you're using Claude Code in production, know that there's a lot more capability under the hood than what you currently have access to — background agents, multi-agent orchestration, persistent memory — and it's all coming soon based on how complete the code looks. And third, the AI attribution thing is worth thinking about. If your team uses Claude Code and contributes to open source, the tool is actively removing any trace that AI was involved. Whether you think that's fine or deeply problematic probably depends on your stance on AI transparency in open source, but either way you should know it's happening.&lt;/p&gt;

&lt;p&gt;Oh and there's apparently a virtual pet feature called Claude Buddy with sprite animations and floating hearts, scheduled to roll out April 1-7. Someone at Anthropic is having fun and honestly I respect it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Ollama Just Got Stupid Fast on Mac and Nobody Is Talking About What This Actually Means</title>
      <dc:creator>Bejie Paulo Aclao</dc:creator>
      <pubDate>Wed, 01 Apr 2026 17:32:51 +0000</pubDate>
      <link>https://forem.com/serkingiii/ollama-just-got-stupid-fast-on-mac-and-nobody-is-talking-about-what-this-actually-means-ck4</link>
      <guid>https://forem.com/serkingiii/ollama-just-got-stupid-fast-on-mac-and-nobody-is-talking-about-what-this-actually-means-ck4</guid>
      <description>&lt;p&gt;So Ollama dropped version 0.19 yesterday and I genuinely think most people are sleeping on how big this is. They rebuilt the entire Mac backend on top of Apple's MLX framework and the speed numbers are kind of absurd. Were talking 1,851 tokens per second on prefill and 134 tokens per second on decode. If those numbers dont mean anything to you, let me put it this way — thats roughly twice as fast as the previous version. On the same hardware. Same model. Just better software underneath.&lt;/p&gt;

&lt;p&gt;I've been running local models on my MacBook for months now and the experience has always been this weird mix of "wow this actually works" and "ok why is it taking 15 seconds to start responding." That second part just got obliterated. The time to first token improvement alone changes how it feels to use coding agents locally. When you're running something like Claude Code or OpenCode through Ollama and it responds in under a second instead of making you wait, that's not just a benchmark win, that's a workflow win. The kind of thing that makes you stop reaching for the API and start trusting your local setup.&lt;/p&gt;

&lt;p&gt;Here's the deal with what they actually did. Apple has this machine learning framework called MLX that was built specifically for their unified memory architecture. If you're on Apple silicon, your CPU and GPU share the same memory pool, which means you don't have the overhead of copying data back and forth like you do on traditional setups. Ollama was previously using llama.cpp under the hood, which is great and battle-tested, but it wasn't taking full advantage of what Apple's chips can actually do. MLX does. And now Ollama sits on top of it.&lt;/p&gt;

&lt;p&gt;The M5 chips get an extra bonus too because Ollama can now tap into the GPU Neural Accelerators that Apple added. So if you're on an M5 Pro or M5 Max, the performance gap compared to older silicon is even wider. But even on M4 hardware the improvement is real; people on the Hacker News thread are reporting noticeably faster responses on their existing machines after updating.&lt;/p&gt;

&lt;p&gt;There's another thing they shipped that nobody seems to be talking about: NVFP4 support. This is NVIDIA's 4-bit floating point format for quantization and it's kind of a big deal for a subtle reason. Most cloud inference providers are starting to use NVFP4 because it gives you better accuracy than integer quantization at similar memory savings. So when Ollama supports it locally, you're getting results that match what you'd get from a production API endpoint. Same quantization format means same model behavior. That matters a lot if you're developing something locally and deploying to cloud, because now your local testing environment actually matches production instead of being some approximation.&lt;/p&gt;

&lt;p&gt;The caching improvements are honestly what I'm most excited about though. If you're using Ollama with coding agents you know the pain of repeated system prompts eating into your context and slowing things down. The new version reuses cache across conversations and stores snapshots at smart points in the prompt so when you branch off or start a new conversation with the same tools, it doesn't have to reprocess everything from scratch. For agentic workflows where you might have 15 tool calls in a single session, this adds up fast.&lt;/p&gt;

&lt;p&gt;Ok so the honest downside — you need 32GB of unified memory minimum to run this well. That's the recommended spec from Ollama themselves. If you bought the base model MacBook Air with 16GB or even 24GB, you're probably not going to have a great time running the bigger models they're showcasing like Qwen 3.5 35B. This is one of those things where Apple's upselling on RAM at purchase time actually matters for real workloads, not just having 47 Chrome tabs open.&lt;/p&gt;

&lt;p&gt;The Hacker News discussion was interesting because there's a legitimate debate happening. Some people pointed out that Ollama's Go wrapper historically added like 20-30% overhead compared to running llama.cpp directly. With the MLX switch that comparison changes completely because the bottleneck was never really the wrapper, it was the inference backend not using the hardware properly. A few users benchmarked it and the raw MLX performance through Ollama is genuinely close to running MLX directly. That's impressive engineering.&lt;/p&gt;

&lt;p&gt;What I think this actually means for the broader AI dev space is something that's been building for a while now. Local inference is becoming legitimate. Not just "oh cool I can chat with a model offline" legitimate, but "I can run my entire coding agent stack without an API key" legitimate. When you combine fast local inference with the fact that open source models like Qwen 3.5 and Llama are getting scary good, the calculus on whether you need a cloud API subscription starts shifting real fast. I run AI agents all day and honestly the only reason I still hit cloud APIs for some tasks is latency and context length. The latency gap just got a lot smaller.&lt;/p&gt;

&lt;p&gt;For anyone who wants to try it, the setup is dead simple. Download Ollama 0.19 from their site, run &lt;code&gt;ollama run qwen3.5:35b-a3b-coding-nvfp4&lt;/code&gt; and you're off. They specifically tuned the sampling parameters for coding tasks on this model which is a nice touch. If you're already using Ollama, just update and everything switches to the MLX backend automatically on Mac. No config changes needed.&lt;/p&gt;
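&lt;p&gt;One more thing worth knowing if you want to script against it: Ollama serves a plain HTTP API on localhost (port 11434 by default), so your own tools can hit the local model the same way they'd hit a cloud endpoint. A minimal sketch, assuming the server is running and the model from above is already pulled:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# POST a prompt to the local Ollama server; "stream": false returns
# a single JSON object with the full response instead of a token stream
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:35b-a3b-coding-nvfp4",
  "prompt": "Write a TypeScript function that reverses a string",
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Same endpoint your coding agents use under the hood, which is exactly why the latency improvements show up everywhere at once.&lt;/p&gt;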

&lt;p&gt;I think we're going to look back at 2026 as the year local AI stopped being a hobby project and started being a real alternative to cloud inference for actual work. Between Apple making unified memory mainstream, NVIDIA pushing better quantization formats, and projects like Ollama making it all accessible through a single command, the pieces are falling into place faster than most people realize. Your MacBook is becoming an AI workstation and that's not hype, that's just what the benchmarks show.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
