<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Harry Floyd</title>
    <description>The latest articles on Forem by Harry Floyd (@harryfloyd).</description>
    <link>https://forem.com/harryfloyd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3933548%2Fa644e757-fdc0-4213-a2d0-37774cbe6730.png</url>
      <title>Forem: Harry Floyd</title>
      <link>https://forem.com/harryfloyd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/harryfloyd"/>
    <language>en</language>
    <item>
      <title>NVIDIA Q1 FY2027 Earnings Preview — 5 Signals the Market May Be Missing</title>
      <dc:creator>Harry Floyd</dc:creator>
      <pubDate>Sun, 17 May 2026 11:17:20 +0000</pubDate>
      <link>https://forem.com/harryfloyd/nvidia-q1-fy2027-earnings-preview-5-signals-the-market-may-be-missing-2j79</link>
      <guid>https://forem.com/harryfloyd/nvidia-q1-fy2027-earnings-preview-5-signals-the-market-may-be-missing-2j79</guid>
      <description>&lt;p&gt;NVIDIA reports Q1 FY2027 earnings on &lt;strong&gt;May 20, 2026&lt;/strong&gt;, after market close. The consensus expects approximately &lt;strong&gt;$78.1-78.8 billion in revenue and $1.74 EPS&lt;/strong&gt;, with Citi and Wells Fargo running slightly higher at ~$80B and $1.79 respectively.&lt;/p&gt;

&lt;p&gt;The stock closed at ~$225 on the most recent trading day, roughly a 27x forward P/E on FY2027 estimates. This is not a distressed entry point. It is a thesis-testing moment.&lt;/p&gt;

&lt;p&gt;Below are 5 signals to watch that go beyond the headline beat-or-miss narrative. Each maps to a specific structural claim about NVIDIA's position in the AI infrastructure stack, and each has a falsification trigger that would challenge that claim.&lt;/p&gt;




&lt;h2&gt;1. Purchase Commitments: Are They Still Rising?&lt;/h2&gt;

&lt;p&gt;NVIDIA's supply-related purchase commitments rose from &lt;strong&gt;$50.3 billion to $95.2 billion&lt;/strong&gt; between Q3 and Q4 FY2026, nearly doubling in a single quarter. This is not optional inventory building. It is NVIDIA aggressively locking in component supply for constraints it believes are structural.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch:&lt;/strong&gt; If commitments rise further in Q1, NVIDIA is deepening its supply chain lock-in through at least 2027. If they flatten, either supply is easing (bullish for margins) or suppliers have hit allocation limits (bearish for revenue growth).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Falsification trigger:&lt;/strong&gt; A flat or declining commitment trajectory would suggest NVIDIA sees peak demand behind it.&lt;/p&gt;




&lt;h2&gt;2. Optical Interconnect Mentions&lt;/h2&gt;

&lt;p&gt;On May 6, NVIDIA announced a &lt;strong&gt;$500 million partnership with Corning (GLW)&lt;/strong&gt; — three new US optical factories, 10x capacity increase, and a warrant structure giving NVIDIA up to 15 million shares at $180. This follows &lt;strong&gt;$2B+ purchase commitments each to Lumentum (LITE)&lt;/strong&gt; (which reported $808M revenue, +90% YoY on May 5) and &lt;strong&gt;Coherent (COHR)&lt;/strong&gt; ($1.81B, +21% YoY on May 6).&lt;/p&gt;

&lt;p&gt;Combined, more than &lt;strong&gt;$4.7B was committed to the optical supply chain in 10 weeks&lt;/strong&gt; across three independent layers: passive fiber (Corning), active components (LITE, COHR), and co-packaged optics (Ayar Labs, ~$155M NVIDIA portion).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch:&lt;/strong&gt; If management references Corning, Lumentum, or Coherent by name on the call, it validates the thesis that optical interconnect is the next binding constraint beyond HBM memory. Silence on optical supply is a missed signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Falsification trigger:&lt;/strong&gt; If optical supply is described as "secured" or "no longer a concern," the bottleneck thesis for the photonics layer weakens meaningfully.&lt;/p&gt;




&lt;h2&gt;3. Blackwell Margin Trajectory&lt;/h2&gt;

&lt;p&gt;The Street is fixated on gross margins during the Blackwell ramp. The concern is that the more complex B200/B300 packaging (CoWoS-L) compresses margins compared to the simpler H100/H200 designs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch:&lt;/strong&gt; The directional trajectory matters more than the absolute number. Sequential margin expansion indicates the ramp is absorbing complexity costs. Compression suggests the packaging premium is permanent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The deeper signal:&lt;/strong&gt; NVIDIA's gross margin has been the most-watched metric for 8 consecutive quarters. The market has already priced in margin compression, so an in-line or better margin print removes a major overhang. A miss amplifies the ASIC-competition narrative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; NVIDIA pre-booked approximately 60% of TSMC's total 2026 CoWoS output (per Morgan Stanley), with demand of ~700,000 wafers. CoWoS is the packaging constraint. If margins hold despite this capacity scramble, the GPU economics thesis is intact.&lt;/p&gt;




&lt;h2&gt;4. Inference Mix and Agent Workload Commentary&lt;/h2&gt;

&lt;p&gt;The AI market narrative shifted in 2026 from "training is everything" to "inference is the growth vector." Jensen Huang's GTC keynote emphasized &lt;strong&gt;AI factories&lt;/strong&gt; as long-running inference infrastructure, not just training clusters. Agentic workloads -- autonomous systems that chain multiple model calls per task -- compound inference demand beyond what chat-era projections captured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch:&lt;/strong&gt; Any qualitative commentary about inference workload growth, token demand trajectories, or agentic infrastructure spend. If management quantifies inference as a growing share of data center revenue, it supports the thesis that AI compute demand has structural legs beyond model training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Falsification trigger:&lt;/strong&gt; If inference growth is described as "migrating to edge devices" or "handled by CPU-based systems," the GPU-inference thesis weakens. If there is silence on this topic entirely, the market may be overestimating inference demand relative to what the company sees.&lt;/p&gt;




&lt;h2&gt;5. ASIC Competition Framing&lt;/h2&gt;

&lt;p&gt;The most credible competitive threat to NVIDIA is custom hyperscaler ASICs: Google's &lt;strong&gt;TPU 8t/8i&lt;/strong&gt; (April 22 launch, Anthropic committed to 1M TPU v7 chips), Amazon's &lt;strong&gt;Trainium 2&lt;/strong&gt;, and Meta's &lt;strong&gt;MTIA&lt;/strong&gt; (accelerating with Broadcom through 2029). Industry analysts project custom ASIC shipments growing significantly faster than GPU shipments in 2026 as hyperscalers vertically integrate.&lt;/p&gt;

&lt;p&gt;However, every ASIC still needs &lt;strong&gt;HBM memory, optical interconnect, and CoWoS packaging&lt;/strong&gt; -- all of which NVIDIA has pre-booked at scale. ASIC growth at the margin does not necessarily mean NVIDIA loses revenue. It means the total AI compute pie is growing, and NVIDIA captures the GPU slice while participating in the broader ecosystem via supply chain positioning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch:&lt;/strong&gt; How management frames ASIC competition. If they dismiss it as irrelevant, that would suggest they are not tracking the custom silicon trend. If they acknowledge it and frame NVIDIA's counter-position (CUDA moat, NVLink, ecosystem), that signals clear-eyed strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Falsification trigger:&lt;/strong&gt; A hyperscaler announcing that its internal chip now handles more than 50% of its own AI workloads would signal structural erosion. Not expected this quarter.&lt;/p&gt;




&lt;h2&gt;Putting It Together&lt;/h2&gt;

&lt;p&gt;The standard earnings framework (beat, miss, guide, P/E) tells you how the market feels about NVIDIA today. The 5 signals above test whether NVIDIA's structural position is &lt;strong&gt;improving or eroding.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What It Tests&lt;/th&gt;
&lt;th&gt;Bullish&lt;/th&gt;
&lt;th&gt;Bearish&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Purchase commitments&lt;/td&gt;
&lt;td&gt;Supply chain conviction&lt;/td&gt;
&lt;td&gt;Rising&lt;/td&gt;
&lt;td&gt;Flat/falling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optical mentions&lt;/td&gt;
&lt;td&gt;Bottleneck migration thesis&lt;/td&gt;
&lt;td&gt;Named by management&lt;/td&gt;
&lt;td&gt;Not discussed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blackwell margins&lt;/td&gt;
&lt;td&gt;Ramp economics&lt;/td&gt;
&lt;td&gt;Expanding&lt;/td&gt;
&lt;td&gt;Compressing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference mix&lt;/td&gt;
&lt;td&gt;Demand durability&lt;/td&gt;
&lt;td&gt;Quantified growth&lt;/td&gt;
&lt;td&gt;Silent/edge-focused&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ASIC framing&lt;/td&gt;
&lt;td&gt;Competitive awareness&lt;/td&gt;
&lt;td&gt;Acknowledged + countered&lt;/td&gt;
&lt;td&gt;Dismissed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Durability Curve framework&lt;/strong&gt; rates NVIDIA as a Law I (Bottleneck Migration) and Law II (Difficulty Is Load-Bearing) play. The falsification triggers above test both laws. Through this lens, May 20 is not about whether NVIDIA beats by $1B. It is about whether the structural evidence supports or challenges the durability thesis.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This analysis is derived from the &lt;strong&gt;Durability Curve&lt;/strong&gt; research framework, a systematic approach to identifying AI infrastructure bottlenecks before they are priced. The full 36-page NVIDIA Q1 FY2027 earnings research report with detailed falsification triggers, supply chain signal verification across all 5 layers, and options positioning framework is available at:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://harryfloyd.gumroad.com/l/nvda-q1-fy2027-earnings-research-report?utm_source=devto" rel="noopener noreferrer"&gt;📄 NVIDIA Q1 FY2027 Earnings Research Report (36 pages, £9)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow &lt;a href="https://techhub.social/@durabilitycurve" rel="noopener noreferrer"&gt;@durabilitycurve&lt;/a&gt; on Mastodon for real-time signal monitoring during the earnings call. Free weekly analysis at &lt;a href="https://harryfloyd.substack.com?utm_source=devto" rel="noopener noreferrer"&gt;harryfloyd.substack.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Not financial advice. All data points verified against public sources as of May 17, 2026. Verify independently before making investment decisions.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>investing</category>
      <category>analysis</category>
    </item>
    <item>
      <title>A Developer's Guide to AI Inference Costs in 2026</title>
      <dc:creator>Harry Floyd</dc:creator>
      <pubDate>Sat, 16 May 2026 21:45:26 +0000</pubDate>
      <link>https://forem.com/harryfloyd/a-developers-guide-to-ai-inference-costs-in-2026-1h6g</link>
      <guid>https://forem.com/harryfloyd/a-developers-guide-to-ai-inference-costs-in-2026-1h6g</guid>
      <description>&lt;p&gt;If you're building AI features in 2026, your gross margin depends on a question most developers don't have a good answer to: &lt;strong&gt;what does one inference actually cost?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer isn't in the model card. It's in the physical infrastructure chain that runs from a fab in Taiwan to a data centre in Virginia. Here's how to estimate it.&lt;/p&gt;

&lt;h2&gt;The easy part: API pricing&lt;/h2&gt;

&lt;p&gt;If you're using an API (OpenAI, Anthropic, Together, Groq), your per-token cost is known. The hidden variable is &lt;strong&gt;cache-hit rate&lt;/strong&gt;. Prompt caching drops cost by 2-10x depending on how much of your system prompt is shared across requests. If you haven't measured your cache-hit ratio, you don't know your true cost.&lt;/p&gt;

&lt;p&gt;Most teams I've seen get 30-50% cache hits on well-structured prompts and close to 0% on dynamic ones. That's a 2x difference in effective cost hiding in plain sight.&lt;/p&gt;
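
&lt;p&gt;To make that concrete, here is a minimal sketch of the blended-cost arithmetic. The 10% rate for cached input tokens and the $3 per million input tokens are illustrative assumptions; providers price cache reads differently, so substitute your own numbers.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch: blended input-token cost as a function of cache-hit ratio.
# Assumes cached tokens bill at 10% of the base rate; providers differ, so
# substitute your provider's actual cached-token pricing.

def effective_input_price(base_price_per_m, cache_hit_ratio, cached_discount=0.10):
    """Blended $ per 1M input tokens at a given cache-hit ratio."""
    return base_price_per_m * ((1 - cache_hit_ratio) + cache_hit_ratio * cached_discount)

# Illustrative: $3 per 1M input tokens, 50% vs 0% cache hits.
print(effective_input_price(3.0, 0.50))   # ~1.65, roughly half the uncached cost
print(effective_input_price(3.0, 0.00))   # 3.0
&lt;/code&gt;&lt;/pre&gt;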

&lt;h2&gt;The harder part: self-hosted inference&lt;/h2&gt;

&lt;p&gt;Running your own models means paying for GPU time whether you're using it or not. The number that matters: &lt;strong&gt;utilization rate&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A single H100 at $2-3/hour needs to be generating tokens &amp;gt;60% of the time to beat API pricing at scale. Below 30% utilization, you'd have been better off on an API. Most self-hosted deployments I see run at 15-25% because of traffic spikes and idle standby capacity.&lt;/p&gt;

&lt;p&gt;The break-even math (a rough sketch in code follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API: pay per token, zero fixed cost&lt;/li&gt;
&lt;li&gt;Self-hosted: ~$2k/month per GPU all-in; the first ~2M tokens are effectively "paying off the fixed cost"&lt;/li&gt;
&lt;li&gt;Breakeven: ~4-5M tokens/month per GPU, assuming 60% utilization&lt;/li&gt;
&lt;/ul&gt;
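
&lt;p&gt;A minimal sketch of the arithmetic behind those bullets, assuming you know two numbers: your all-in monthly GPU cost and the blended per-million-token price of the API you would otherwise call. The function names are mine, not a standard API, and the result is entirely driven by those inputs plus your sustained throughput.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Rough break-even helpers. Inputs are assumptions you supply yourself:
# (a) all-in monthly cost of one GPU, (b) blended API $ per 1M tokens.

def breakeven_tokens_per_month(gpu_monthly_cost, api_price_per_million_tokens):
    """Monthly token volume at which one self-hosted GPU matches API spend."""
    return gpu_monthly_cost / api_price_per_million_tokens * 1_000_000

def utilization_needed(tokens_per_month, tokens_per_second_when_busy, hours_per_month=730):
    """Fraction of the month the GPU must be busy to serve that volume."""
    capacity = tokens_per_second_when_busy * 3600 * hours_per_month
    return tokens_per_month / capacity
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run it with your own traffic, throughput, and utilization numbers before committing to hardware; the rule-of-thumb figures above move a lot depending on which API price you compare against.&lt;/p&gt;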

&lt;h2&gt;The hidden constraint: hardware availability&lt;/h2&gt;

&lt;p&gt;This is the part most infrastructure analyses miss. In 2026, GPU lead times are still 12-18 months for new deployments. H200s are shipping but allocated. The secondary market for A100s is active but prices haven't dropped as much as expected — because demand from inference workloads has replaced training demand.&lt;/p&gt;

&lt;p&gt;What this means for your deployment plan: &lt;strong&gt;if you need GPUs in a quarter, you're renting&lt;/strong&gt;. If you're renting, you can't amortize hardware cost. If you can't amortize, you're at the mercy of spot pricing — which has swung 40% in a single month twice this year already.&lt;/p&gt;

&lt;h2&gt;The one number to track&lt;/h2&gt;

&lt;p&gt;For any AI feature, track &lt;strong&gt;cost per completed interaction&lt;/strong&gt; — not cost per token. Token counts hide the real metric: how many tokens does your average user interaction consume?&lt;/p&gt;

&lt;p&gt;A chatbot using Claude Sonnet 4.6 ($3/M input, $15/M output) averaging 2,000 tokens per conversation with a typical 70/30 input/output split costs roughly &lt;strong&gt;$0.013 per conversation&lt;/strong&gt;. At 100k conversations/month, that's $1,300 — significant enough that a 10% improvement in token efficiency pays for an engineer's time.&lt;/p&gt;
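
&lt;p&gt;A quick sketch of that arithmetic with the same illustrative figures ($3/M input, $15/M output, 2,000 tokens per conversation, 70/30 split); the helper name is mine, not a library call.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Cost per completed interaction using the illustrative pricing above.

INPUT_PRICE_PER_TOKEN = 3.0 / 1_000_000    # $3 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 15.0 / 1_000_000  # $15 per 1M output tokens

def cost_per_interaction(total_tokens, input_share=0.70):
    """Dollar cost of one interaction given total tokens and input/output split."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

per_conversation = cost_per_interaction(2_000)
print(round(per_conversation, 4))            # 0.0132 per conversation
print(round(per_conversation * 100_000))     # ~1320 per month at 100k conversations
&lt;/code&gt;&lt;/pre&gt;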

&lt;p&gt;Most teams don't know their average cost per interaction. That's the first number worth instrumenting — without it, you can't tell whether optimisation matters or not.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Measure your cache-hit ratio (30-50% is typical; anything below 20% means expensive redundant computation)&lt;/li&gt;
&lt;li&gt;Track cost per completed interaction, not per token&lt;/li&gt;
&lt;li&gt;Know your self-host breakeven point (~4M tokens/month per GPU)&lt;/li&gt;
&lt;li&gt;Assume 12-month lead times for hardware — plan accordingly&lt;/li&gt;
&lt;li&gt;Spot GPU pricing can swing 40% in a month; don't build on spot for production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The infrastructure layer is the part of AI most developers treat as someone else's problem. It isn't. The teams that understand their cost-per-interaction will build features that survive the margin compression that's coming.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>cloud</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Model Is Not the Moat</title>
      <dc:creator>Harry Floyd</dc:creator>
      <pubDate>Fri, 15 May 2026 17:00:32 +0000</pubDate>
      <link>https://forem.com/harryfloyd/the-model-is-not-the-moat-3b4d</link>
      <guid>https://forem.com/harryfloyd/the-model-is-not-the-moat-3b4d</guid>
      <description>&lt;p&gt;I keep hearing the same assumption underneath AI strategy talk: the winner will be whoever has the strongest model. Smarter model wins. Everything else is secondary.&lt;/p&gt;

&lt;p&gt;That sounds plausible right up until you look at how people actually choose tools in real life.&lt;/p&gt;

&lt;p&gt;If scaling laws keep holding, local models probably won't beat the frontier on raw intelligence. But users don't adopt "raw intelligence." They adopt something that fits into their day: something fast enough, private enough, legible enough, reliable enough, cheap enough, and integrated enough that using it becomes natural.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The benchmark measures the visible product. The moat forms one layer out, in the surrounding package.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Product and the Seat&lt;/h2&gt;

&lt;p&gt;People compare models as if users experience them as isolated intelligence engines. Most of the time they don't. They experience a bundle: interface, defaults, permissions, speed, memory, privacy, cost, and how much the tool asks them to rearrange their behaviour.&lt;/p&gt;

&lt;p&gt;That bundle is where trust accumulates. It's also where switching costs quietly form.&lt;/p&gt;

&lt;p&gt;A frontier model can win the benchmark and still lose the seat.&lt;/p&gt;

&lt;p&gt;By "seat" I mean the position a product earns inside a user's actual workflow. The place where their context lives. The place they trust not to embarrass them, leak data, slow them down, or force them to relearn everything.&lt;/p&gt;

&lt;p&gt;The product is the thing they evaluate. The seat is the thing they get used to living in. These are not the same asset.&lt;/p&gt;

&lt;h2&gt;You Can Already See It in Coding Tools&lt;/h2&gt;

&lt;p&gt;A model that is slightly worse on a public benchmark can still be the one people prefer if it lives inside the editor, sees the repo, responds instantly, keeps sensitive code local, and fits the way they already work.&lt;/p&gt;

&lt;p&gt;This is also why "just use the best model" is often bad product advice. The best model according to a leaderboard may carry the wrong latency, the wrong privacy posture, the wrong integration burden, or the wrong failure mode for the actual job.&lt;/p&gt;

&lt;h2&gt;How Moats Actually Form&lt;/h2&gt;

&lt;p&gt;Apple is the familiar version of this dynamic outside AI. The moat was never just one visible feature. It was the package: ecosystem fit, convenience, defaults, identity, and the low-grade friction of leaving. The product got attention. The package became hard to leave.&lt;/p&gt;

&lt;p&gt;I think a lot of AI products will work the same way. Moats usually form in the residue, not in the headline claim. In habit. In muscle memory. In stored context. In predictable behaviour. In the feeling that this tool understands how you work and doesn't make you pay a tax every time you use it.&lt;/p&gt;

&lt;h2&gt;What This Means for Builders&lt;/h2&gt;

&lt;p&gt;Local models don't need to win the intelligence race to matter. They can win a different race entirely: trust, control, governance, latency, privacy, and workflow fit.&lt;/p&gt;

&lt;p&gt;Once a tool becomes the place where your context lives, your defaults settle, and your work starts to flow, a better benchmark somewhere else is not enough to dislodge it.&lt;/p&gt;

&lt;p&gt;If I were building in this market, I'd treat that as a design rule. Don't ask only, "How do we make the model look stronger?" Ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where does the user feel risk right now?&lt;/li&gt;
&lt;li&gt;What part of the workflow still feels awkward or fragile?&lt;/li&gt;
&lt;li&gt;What context would make the tool more useful after 30 days than on day 1?&lt;/li&gt;
&lt;li&gt;What would make leaving this product feel expensive in a good way?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is how you build the seat. Not by winning one benchmark snapshot, but by creating a package that compounds through use.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>analysis</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
