Marcus Chen

Senior ML Engineer based in Austin. I write about ML evaluation, fine-tuning, and why your evals are probably lying to you. The model is the easy part.

Prefix caching in vLLM under multi-tenant agent traffic

4 min read
We Audited Our Agent Tool-Call Traces. Half Our Eval Data Was Garbage.

4 min read
Why Your LLM Eval Harness Is Lying to You (And How to Fix It)

4 min read
Measuring AI Gateway Failover: 30 Days of Production Data

3 min read
What Gemma 4's multi-token prediction head actually means for your eval pipeline

3 comments
5 min read
Mastering Local AI Agents for Everyday Programming in 2026

2 min read