Forem

# benchmarks

Posts

We Published That Our Premium Tier Failed on 60% of Tasks. Then We Fixed It.
3 min read
28 Real Tasks Reveal What AI Leaderboards Miss
10 min read
Why I Wouldn't Act on SkillsBench
5 min read
Komilion Balanced Tier Beats Opus 4.6 on 6 of 10 Developer Tasks at Half the Cost
4 min read
How to Run an AI Benchmark That Doesn't Lie to You
4 min read
SurrealDB 3.0 benchmarks: a new foundation for performance
36 min read
We Benchmarked 4 AI API Strategies With Real Money — The Results Changed How We Think About Model Selection
4 min read
How Do You Actually Compare LLMs? (The Battle Nobody's Talking About)
5 min read