Forem

# gpu

Posts

20 Years of GPUs in Numbers: How FLOPS and TDP Grew, and Who Led the NVIDIA vs AMD Duel (+ open dataset of 13,500 GPUs)
7 min read
PatentLLM: CUDA TileLang/Triton B200 5x Speedup, RTX 5090 Power, PTX Grammar
3 min read
How to Detect GPU Waste in a Kubernetes Cluster
5 min read
Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)
5 min read
RTX 5080 Undervolt Benchmarks, CGO-Free CUDA API Binding, & AMD GPU Compatibility Fix
3 min read
AMD GPU/AI Launches, Legacy Driver Update & CUDA Optimization Platform
3 min read
Running LTX-2.3 Alongside TTS on a Single 96GB GPU with a Cold-Start Architecture
5 min read
HiDream Skeleton Mode: Prompt Beats OpenPose Ref — 8 Patterns Benchmarked
11 min read
RTX 5090 Cooling, BeeLlama VRAM Opts, Resizable BAR Performance Gains
4 min read
Five Years Later, I Finally Have 96GB VRAM — What It Actually Unlocks for Agent Loops
8 min read
HiDream-O1-Image 3–8x Faster: Benchmarking Steps, CFG, and Resolution
5 min read
Turning a 1-Line Idea Into a 40-Second Short with a 10-Beat Local Video Pipeline
7 min read
Cutting LTX-2 22B Peak VRAM by 40% with fp8_cast — and Why optimum-quanto Was a Trap
7 min read
Profiling a CUDA Python Program with GPUFlight
10 min read
LLM Compilers, GGUF Quantization, & Radeon RX 9060 Benchmarks
3 min read