Forem

# inference

Posts

BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090
3 min read
RAM Coffers: NUMA-Aware LLM Inference — Why Hardware Topology Still Matters
2 min read
Your AI speed benchmark is measuring the one workload you don't run
3 min read
Async Batching Is the Real Latency Win Nobody's Talking About
3 min read
ReFlect: Training-Free Error Recovery for Long-Horizon LLM Reasoning
4 min read
Why Most Browser AI Demos Fail on Real Hardware
4 min read
The Inference Inversion
7 min read
First Confirmed Directional Move on the AI Inference Frontier Index in 2026
4 min read
Tutorial: This AI Now Tells You if a Meeting Could Be an Email
8 min read
Tutorial: Build a Cost-Aware AI Support Triage API
13 min read
Muse Spark beats Llama 4 with 10x less compute. Here's how.
7 min read
First Words: LLM Inference on RISC-V
9 min read
Gaussian Process Regression: The Bayesian Approach to Curve Fitting
13 min read
Google Dropped TurboQuant Two Weeks Ago. The Community Already Made It Usable.
6 min read
Hierarchical Bayesian Regression with PyMC: When Groups Share Strength
13 min read