<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Void</title>
    <description>The latest articles on Forem by Void (@uncaughtex).</description>
    <link>https://forem.com/uncaughtex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3792577%2Fe28a243c-924f-46c0-bc44-36212593701a.jpg</url>
      <title>Forem: Void</title>
      <link>https://forem.com/uncaughtex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/uncaughtex"/>
    <language>en</language>
    <item>
      <title>So I Wrote a GTC Preview Post. Then Jensen Did His Thing. Here's What I Didn't Expect</title>
      <dc:creator>Void</dc:creator>
      <pubDate>Tue, 17 Mar 2026 04:15:54 +0000</pubDate>
      <link>https://forem.com/uncaughtex/so-i-wrote-a-gtc-preview-post-then-jensen-did-his-thing-heres-what-i-didnt-expect-dg4</link>
      <guid>https://forem.com/uncaughtex/so-i-wrote-a-gtc-preview-post-then-jensen-did-his-thing-heres-what-i-didnt-expect-dg4</guid>
      <description>&lt;p&gt;A couple days ago I wrote a post &lt;a href="https://dev.to/uncaughtex/nvidia-gtc-starts-monday-heres-what-actually-matters-if-you-write-code-for-a-living-1k75"&gt;Nvidia GTC Starts Monday...&lt;/a&gt; about what I was watching for at Nvidia GTC as a dev. If you read it, cool. If not, the short version: I had five things I cared about Groq integration, NemoClaw, Vera Rubin chips, open models, and a speculative ARM CPU thing.&lt;/p&gt;

&lt;p&gt;Jensen just finished the keynote. Two hours. He lifted server racks on stage, brought out a walking Olaf robot from Frozen, and closed the whole show with AI-generated robots singing country music around a campfire.&lt;/p&gt;

&lt;p&gt;I need to talk about what actually happened.&lt;/p&gt;

&lt;h3&gt;The Groq thing landed harder than I expected&lt;/h3&gt;

&lt;p&gt;In my preview post I wrote that the main question was how Groq's tech would fit into Nvidia's ecosystem. Would it be a side feature? A rebrand? Something bolted on?&lt;/p&gt;

&lt;p&gt;Turns out it's built in rather than bolted on, to a degree I didn't imagine. They announced the Groq 3 LPU, shipping Q3 this year, sitting inside the Vera Rubin platform as a dedicated token accelerator alongside the GPUs. Jensen threw out a number: 35x more throughput per megawatt when you pair them together.&lt;/p&gt;

&lt;p&gt;I don't know about you, but when I think about whether my side projects can afford real inference costs, a 35x efficiency jump changes that conversation entirely. I was hoping for "significantly cheaper." This is more like a different cost universe. Whether Nvidia actually passes those savings on to developers or just enjoys fatter margins is a different question, but the hardware capability is real.&lt;/p&gt;
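
&lt;p&gt;To make that concrete, here's a toy back-of-envelope in Python. Every number in it is made up (Nvidia announced a throughput multiplier, not pricing), so treat it as a sketch of the shape of the math, not a forecast:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy math: what a 35x tokens-per-megawatt jump does to energy cost per token.
# All inputs are illustrative guesses; Nvidia published a multiplier, not prices.
POWER_PRICE_PER_MWH = 80.0     # dollars per megawatt-hour, made up
BASELINE_TOKENS_PER_MWH = 1e9  # assumed GPU-only baseline, also made up

for label, mult in [("GPU only", 1), ("GPU + Groq LPU (claimed)", 35)]:
    tokens_per_mwh = BASELINE_TOKENS_PER_MWH * mult
    usd_per_million = POWER_PRICE_PER_MWH / tokens_per_mwh * 1e6
    print(f"{label}: ${usd_per_million:.4f} in energy per million tokens")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Same electricity bill, 35x the tokens. The absolute numbers are fake; the ratio is the announcement.&lt;/p&gt;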

&lt;h3&gt;NemoClaw isn't what I thought it was (and that's fine)&lt;/h3&gt;

&lt;p&gt;I assumed &lt;a href="//nemoclaw.bot"&gt;NemoClaw&lt;/a&gt; would be yet another competing agent framework. Nope. It's an enterprise security layer for &lt;a href="//openclaw.ai/"&gt;OpenClaw&lt;/a&gt;, the agentic AI platform that's been blowing up lately.&lt;/p&gt;

&lt;p&gt;If you've been living under a rock or in a cave, you may not have heard of OpenClaw: it was built by an Austrian dev named Peter Steinberger, went massively viral, and its creator joined OpenAI last month. It lets you build AI agents that actually do things: communicate via Telegram, execute tasks, operate on their own.&lt;/p&gt;

&lt;p&gt;Jensen called it one of the most important open-source developments in 30 years. He compared it to an operating system. NemoClaw is Nvidia's way of making it safe for companies: a one-command install, secured models, and protections against agents leaking sensitive data.&lt;/p&gt;

&lt;p&gt;Honestly, this is smarter than building a competing framework. Nvidia doesn't want to own the agent layer. They want to be the platform everything runs on. That's been their playbook since CUDA and it keeps working.&lt;/p&gt;

&lt;p&gt;He also dropped a new acronym: &lt;strong&gt;AGaaS&lt;/strong&gt;, &lt;u&gt;Agents as a Service&lt;/u&gt;. Said every company needs an agentic system strategy and that IT is shifting from SaaS to AGaaS. I'm filing that under "probably right but definitely too early to put on a slide deck at work without getting laughed at."&lt;/p&gt;

&lt;h3&gt;Vera Rubin is not what I was picturing&lt;/h3&gt;

&lt;p&gt;I was thinking about this as a next-gen GPU. Jensen presented it as a next-gen everything. Seven chips, five rack-scale systems, one supercomputer. A new CPU designed from scratch for agent workloads. A reinvented storage architecture. The Groq LPU plugged in as a token accelerator. The whole thing purpose-built as one vertically integrated system.&lt;/p&gt;

&lt;p&gt;10x more performance per watt compared to the previous generation. Samples already shipping. Broader availability second half of this year.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The thing that stuck with me:&lt;/em&gt; Jensen specifically described how agents "pound on memory really hard" and need completely different storage systems than traditional workloads. He's not designing hardware for chatbots. He's designing hardware for autonomous systems that run for days, hammering databases and calling tools continuously. The entire platform is architected around that use case.&lt;/p&gt;

&lt;p&gt;He also announced Vera Rubin Ultra, which connects 144 GPUs, and a Space-1 module for putting data centers in orbit. I wrote in my preview post that I might "look silly for bringing up" speculative stuff. I did not, however, anticipate needing to have an opinion about space data centers. I do not have one. Moving on.&lt;/p&gt;

&lt;h3&gt;The open models play was exactly what I expected&lt;/h3&gt;

&lt;p&gt;I wrote that Nvidia would position itself as the open ecosystem's best friend, especially with Meta's Avocado situation making the future of Llama uncertain. That's exactly what happened.&lt;/p&gt;

&lt;p&gt;Jensen announced the Nemotron Coalition: Perplexity, Mistral, Black Forest Labs, Cursor, and others working together on open frontier models. He said Nemotron 3 Ultra will be "the best base model in the world."&lt;/p&gt;

&lt;p&gt;He didn't mention Meta once. Didn't need to. The whole announcement was "open models matter and here's who's making sure they keep existing." Nvidia sells more GPUs when more people train and run open models. Their incentives are perfectly aligned with this ecosystem. If you're worried about Meta pulling back on open-source AI (and you should be paying attention to that), the Nemotron Coalition is meaningful.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There's also an open models panel on Wednesday with the LangChain, Cursor, and A16Z folks. That might be more interesting for developers than today's keynote was.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;The stuff I got wrong&lt;/h3&gt;

&lt;p&gt;I said gaming wouldn't show up at GTC. Then Jensen announced DLSS 5, which Nvidia is calling the biggest graphics breakthrough since the original DLSS in 2018. It combines traditional rendering with generative AI to produce photorealistic lighting and materials in real time. Shipping this fall.&lt;/p&gt;

&lt;p&gt;I'm not a game dev so I can't evaluate the claims, but the demo looked genuinely impressive and the underlying tech has implications beyond gaming for anyone working with real-time visualization.&lt;/p&gt;

&lt;p&gt;I also speculated about ARM CPUs for consumer laptops and desktops. That didn't happen. The Vera CPU is ARM-based but purely for data centers. No consumer chip reveal. Speculation is speculation, and sometimes it's just wrong.&lt;/p&gt;

&lt;h4&gt;The one thing I keep coming back to&lt;/h4&gt;

&lt;p&gt;Jensen said he expects purchase orders for Blackwell and Vera Rubin to hit $1 trillion through 2027. Last year that number was $500 billion. It doubled in a year.&lt;/p&gt;

&lt;p&gt;AI is now 60% of Nvidia's revenue. This isn't a company that does AI on the side. This is an AI company that also does other things.&lt;/p&gt;

&lt;p&gt;Here's what I wrote in my preview: "The hardware dictates the economics. And the economics dictate what we can build." After today, the economics are shifting. Inference is getting radically cheaper. Agent workloads are getting purpose-built hardware. Open models are getting institutional backing.&lt;/p&gt;

&lt;p&gt;Whether that translates into better tools for people like us or just bigger numbers on Nvidia's earnings call is something we'll have to wait and see.&lt;/p&gt;

&lt;p&gt;For now, I need to process the fact that the keynote ended with robots singing country music about tokens and open-source software. That happened. In real life. In 2026. At a professional conference.&lt;/p&gt;

&lt;p&gt;We live in interesting times.&lt;/p&gt;

&lt;p&gt;This is a follow-up to my preview post from Saturday.&lt;br&gt;
Find me at &lt;a class="mentioned-user" href="https://dev.to/uncaughtex"&gt;@uncaughtex&lt;/a&gt; on X. I catch the exceptions nobody handles.&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>openclaw</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Nvidia GTC Starts Monday. Here's What Actually Matters If You Write Code for a Living</title>
      <dc:creator>Void</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:53:16 +0000</pubDate>
      <link>https://forem.com/uncaughtex/nvidia-gtc-starts-monday-heres-what-actually-matters-if-you-write-code-for-a-living-1k75</link>
      <guid>https://forem.com/uncaughtex/nvidia-gtc-starts-monday-heres-what-actually-matters-if-you-write-code-for-a-living-1k75</guid>
      <description>&lt;p&gt;Nvidia's GTC conference starts Monday. Jensen Huang takes the stage at 11 AM Pacific, does his leather jacket thing for two hours, and by Tuesday morning your Twitter/X, LinkedIn feeds will be drowning in hot takes from people who weren't there and didn't watch it.&lt;/p&gt;

&lt;p&gt;I figured I'd do something a little different. Instead of waiting for the recap, I wanted to write down what I'm personally watching for as someone who doesn't trade $NVDA stock (can't even if I wanted to) but does actually use GPUs and AI tools to get work done.&lt;/p&gt;

&lt;p&gt;Because here's the thing: GTC has become the Super Bowl of AI infrastructure. 30,000 attendees from 190 countries. It runs March 16-19 across ten venues in downtown San Jose. There are over 1,000 sessions. Jensen will talk about chips, software, models, robots, and probably world peace by the end of it (the way things are going).&lt;/p&gt;

&lt;p&gt;But if you're a regular developer like the rest of us (someone building apps, running inference, maybe self-hosting models, or just trying to understand where all this is going), 90% of that is noise. Here's what I consider my priorities.&lt;/p&gt;

&lt;h3&gt;The inference chip thing&lt;/h3&gt;

&lt;p&gt;This is probably the biggest deal for anyone actually deploying AI, whether in prod, at home, or anywhere else.&lt;/p&gt;

&lt;p&gt;Training a model is expensive. Running a trained model (inference) is what you do millions of times a day. Right now Nvidia dominates training with something like 80% market share. But inference is a different game. Google, Amazon, and others are building custom chips specifically for inference, trying to eat into Nvidia's lead.&lt;/p&gt;

&lt;p&gt;Nvidia reportedly bought Groq last year for $20 billion. If you've used Groq's API, you know the speed is insane. Hundreds of tokens per second. They use a completely different chip architecture that's built specifically for inference rather than the general-purpose GPU approach.&lt;/p&gt;
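
&lt;p&gt;If you've never measured that speed yourself, it takes about a dozen lines. Here's a minimal sketch with Groq's Python SDK (assumes &lt;code&gt;pip install groq&lt;/code&gt;, a &lt;code&gt;GROQ_API_KEY&lt;/code&gt; in your environment, and a model name from their console; counting stream chunks is only a rough proxy for tokens):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import time
from groq import Groq

client = Groq()  # picks up GROQ_API_KEY from the environment

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # placeholder; use whatever they list
    messages=[{"role": "user", "content": "Explain LPUs in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        chunks += 1  # each content chunk is roughly one token
elapsed = time.perf_counter() - start
print(f"roughly {chunks / elapsed:.0f} tokens/sec")
&lt;/code&gt;&lt;/pre&gt;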

&lt;p&gt;&lt;strong&gt;What I want to know:&lt;/strong&gt; how does Groq's tech integrate with Nvidia's ecosystem? If Nvidia can offer both training AND inference at stupid fast speeds, that changes the math for anyone choosing between self-hosting and API calls. The cost of running your own models could drop significantly. Or Nvidia could just jack up prices because they own both sides. We'll see.&lt;/p&gt;

&lt;h3&gt;NemoClaw - an open-source AI agent framework&lt;/h3&gt;

&lt;p&gt;There's a rumored announcement of something called &lt;a href="https://developer.nvidia.com/nemo/agents" rel="noopener noreferrer"&gt;NemoClaw&lt;/a&gt;. It's apparently an open-source platform for building enterprise AI agents.&lt;/p&gt;

&lt;p&gt;Now, "AI agent framework" is one of those terms that makes me immediately suspicious because everyone and their dog has one now and the list is long and growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But Nvidia releasing one is different for a specific reason:&lt;/strong&gt; hardware integration. Most agent frameworks are model-agnostic, which is great for flexibility but means they can't really optimize for the hardware they're running on. An Nvidia-built framework could be tightly coupled with their GPUs, CUDA ecosystem, and TensorRT optimizations in ways that third-party tools can't easily match.&lt;/p&gt;

&lt;p&gt;If you're building anything where AI agents need to run fast and locally (think on-device assistants, enterprise tools that can't ship data to the cloud, or anything latency-sensitive), this is worth paying attention to.&lt;/p&gt;

&lt;h3&gt;The Rubin architecture&lt;/h3&gt;

&lt;p&gt;Nvidia's next-gen GPU architecture is called &lt;a href="https://nvidianews.nvidia.com/news/nvidia-rubin-architecture-unveiled" rel="noopener noreferrer"&gt;Rubin&lt;/a&gt;. Reportedly it packs up to 288GB of HBM4 memory with a massive performance leap over the current Blackwell generation. Numbers like "five times the dense floating-point performance" are being thrown around.&lt;/p&gt;

&lt;p&gt;I'm not going to pretend I fully understand the differences between HBM3e and HBM4 at the physics level. I don't. What I do understand is what more memory means in practice: bigger models can fit on fewer GPUs. If you're self-hosting a 70B-parameter model right now, you probably need multiple GPUs. With Rubin's memory capacity, that might change. And that directly affects whether it makes sense to self-host or keep paying per-token API fees.&lt;/p&gt;
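
&lt;p&gt;Here's the napkin math I mean, as a quick Python sketch. The 288GB figure is the rumor above; the KV-cache allowance is my own guess:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Back-of-envelope: does a 70B-parameter model fit on one rumored 288 GB GPU?
# Weights only tell part of the story; KV cache and activations need room too.
PARAMS = 70e9
HBM_GB = 288      # rumored Rubin capacity
KV_CACHE_GB = 40  # rough allowance for long-context serving, pure guess

for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    headroom_gb = HBM_GB - weights_gb - KV_CACHE_GB
    print(f"{precision}: weights {weights_gb:.0f} GB, headroom {headroom_gb:+.0f} GB")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At fp16 a 70B model is about 140GB of weights, which today means sharding across two or more cards. On a single 288GB part it fits with room to spare, and that's the whole self-hosting argument in three lines of arithmetic.&lt;/p&gt;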

&lt;p&gt;&lt;strong&gt;The practical question:&lt;/strong&gt; when do these actually ship, and at what price? If Rubin is a 2027 product, it's interesting but academic. If it starts showing up in cloud instances by late 2026, that changes planning for anyone running AI workloads.&lt;/p&gt;

&lt;h3&gt;The open models panel&lt;/h3&gt;

&lt;p&gt;On Wednesday, Jensen is personally moderating a panel about open frontier models. The guest list includes Harrison Chase (&lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;), plus leaders from A16Z, AI2, Cursor, and Thinking Machines Lab.&lt;/p&gt;

&lt;p&gt;This is interesting timing. The Meta Avocado situation just happened (I'll probably write about that after getting some of my facts straight, STAY TUNED!!!): their new model got delayed, and there are real questions about whether Meta will keep releasing competitive open-weight models or shift toward closed source. If there was ever a moment to have a serious conversation about who's going to carry the open model torch, it's right now.&lt;/p&gt;

&lt;p&gt;I don't expect Jensen to badmouth Meta directly. But I would not be surprised if Nvidia positions itself as the open ecosystem's best friend. They sell more GPUs when more people are training and running open models. Their incentives are aligned with keeping models free and accessible.&lt;/p&gt;

&lt;h3&gt;The ARM CPU (rumor)&lt;/h3&gt;

&lt;p&gt;This one's more speculative, but there are rumors that Nvidia might show ARM-based processors for PCs. They've been doing ARM chips for data centers (Grace), and the question has always been: when does that come to laptops and desktops?&lt;/p&gt;

&lt;p&gt;Apple proved with M-series chips that ARM can absolutely compete with x86 for developer workloads. If Nvidia enters that space with integrated GPU capabilities, it could be a big deal for developers who want to run local AI models on a laptop without carrying an external GPU.&lt;/p&gt;

&lt;p&gt;Or it could be a data-center-only announcement. Only time will tell. Such is life.&lt;/p&gt;

&lt;h3&gt;What I'm NOT watching for&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Gaming GPUs&lt;/em&gt;. GTC has historically been the enterprise/AI event, not the consumer one. If you're hoping for RTX 5090 Ti pricing, this probably isn't the place.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Stock predictions&lt;/em&gt;. I genuinely don't care and I'm not qualified. If you want that, there are plenty of finance-bro blogs to read.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"AI will change everything" platitudes&lt;/em&gt;. Jensen will say inspiring things about AI being essential infrastructure. He says this every year. It's always partly right and partly marketing. I'm filtering for the specific product announcements, not the philosophy.&lt;/p&gt;

&lt;h3&gt;How to actually watch&lt;/h3&gt;

&lt;p&gt;The keynote streams free at &lt;a href="https://www.nvidia.com/gtc/" rel="noopener noreferrer"&gt;nvidia.com&lt;/a&gt; on Monday, March 16 at 11 AM Pacific / 2 PM Eastern / 11:30 PM IST. No registration needed for the keynote stream. The full conference runs through the 19th, and there's a &lt;a href="https://www.nvidia.com/gtc/gtc-live/" rel="noopener noreferrer"&gt;pre-show&lt;/a&gt; starting at 8 AM Pacific with analysts and founders.&lt;/p&gt;

&lt;p&gt;The pre-show guests include CEOs from &lt;a href="https://www.perplexity.ai/" rel="noopener noreferrer"&gt;Perplexity&lt;/a&gt;, LangChain, &lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI&lt;/a&gt;, and a bunch of AI infrastructure companies. Honestly, the pre-show panel might be more interesting for developers than the keynote itself, since keynotes tend to lean heavy on enterprise partnerships and CEO-to-CEO handshakes.&lt;/p&gt;

&lt;p&gt;I'll probably watch the keynote, skim the pre-show, and then cherry-pick sessions over the rest of the week based on what actually gets announced. If anything wild drops, I might write a follow-up.&lt;/p&gt;

&lt;h4&gt;Why a software engineer cares about a hardware conference&lt;/h4&gt;

&lt;p&gt;Because the hardware dictates the economics. And the economics dictate what we can build.&lt;/p&gt;

&lt;p&gt;Two years ago, running a 7B model locally was a novelty. Now it's normal. That happened because hardware got cheaper and more accessible. The announcements at GTC will determine whether self-hosting a 70B+ model becomes normal too and how fast.&lt;/p&gt;

&lt;p&gt;If inference gets 5x cheaper because of Groq integration, that changes which projects are viable. If Rubin chips make local inference on bigger models practical, that shifts the build-vs-buy calculation. If NemoClaw gives us an agent framework that actually runs well on commodity hardware, that unlocks use cases that are currently too expensive or too slow.&lt;/p&gt;
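
&lt;p&gt;The build-vs-buy calculation itself is dead simple, which is why hardware announcements move it so much. A sketch with placeholder numbers (swap in your own quotes and throughput):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Rough build-vs-buy break-even: per-token API fees vs. an amortized GPU rental.
# Every input is a placeholder; plug in real quotes for your workload.
API_USD_PER_M_TOKENS = 0.60     # hypothetical API price
GPU_USD_PER_HOUR = 2.50         # hypothetical cloud GPU rate
SELF_HOST_TOKENS_PER_SEC = 400  # hypothetical sustained throughput

self_host_usd_per_m = GPU_USD_PER_HOUR / (SELF_HOST_TOKENS_PER_SEC * 3600) * 1e6
print(f"self-host: ${self_host_usd_per_m:.2f} per million tokens")
print(f"API now:   ${API_USD_PER_M_TOKENS:.2f} per million tokens")
print(f"API at 5x cheaper: ${API_USD_PER_M_TOKENS / 5:.2f} per million tokens")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With those made-up numbers the API already wins, and a 5x price cut buries the self-host case unless your throughput or utilization is far better. That's the lever these chips pull.&lt;/p&gt;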

&lt;p&gt;None of that is abstract to me. It's the difference between a side project being a toy and a side project being a product. And that's why I'm watching.&lt;/p&gt;

&lt;p&gt;See you Monday.&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>gpu</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
