Inference

Thousand Miles AI

May 23

BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090

#ai #llm #inference #opensource

3 min read

BossChaos

May 22

RAM Coffers: NUMA-Aware LLM Inference — Why Hardware Topology Still Matters

#ai #hardware #inference

2 min read

Thousand Miles AI

May 19

Your AI speed benchmark is measuring the one workload you don't run

#discuss #ai #llm #inference

3 min read

Aamer Mihaysi

May 15

Async Batching Is the Real Latency Win Nobody's Talking About

#llm #inference #async

3 min read

Jangwook Kim

May 11

ReFlect: Training-Free Error Recovery for Long-Horizon LLM Reasoning

#llmreasoning #agents #inference #arxiv2026

4 min read

Bruno Juca

May 10

Why Most Browser AI Demos Fail on Real Hardware

#ai #inference #hardware #benchmark

4 min read

David Aronchick

May 5

The Inference Inversion

#distributedcomputing #edgecomputing #nvidia #inference

7 min read

Steriani Karamanlis

May 12

First Confirmed Directional Move on the AI Inference Frontier Index in 2026

#ai #llm #inference #pricing

4 min read

Andrew Dugan for DigitalOcean

May 21

Tutorial: This AI Now Tells You if a Meeting Could Be an Email

#ai #tutorial #agentskills #inference

8 min read

James Skelton for DigitalOcean

May 19

Tutorial: Build a Cost-Aware AI Support Triage API

#ai #tutorial #api #inference

13 min read

Gabriel Anhaia

Apr 26

Muse Spark beats Llama 4 with 10x less compute. Here's how.

#ai #llm #architecture #inference

7 min read

Bruno Verachten

Apr 22

First Words: LLM Inference on RISC-V

#bananapi #benchmark #inference #llamacpp

9 min read

Berkan Sesen

Apr 13

Gaussian Process Regression: The Bayesian Approach to Curve Fitting

#bayesian #supervisedlearning #probabilistic #inference

13 min read

Alan West

Apr 7

Google Dropped TurboQuant Two Weeks Ago. The Community Already Made It Usable.

#turboquant #locallm #inference #opensource

6 min read

Berkan Sesen

Apr 26

Hierarchical Bayesian Regression with PyMC: When Groups Share Strength

#bayesian #probabilistic #inference #pymc

13 min read