Forem

# cuda

Posts

The Microsecond Lie: Why your Go timers are lying about the GPU
3 min read
Profiling a CUDA Python Program with GPUFlight
10 min read
TensorRT `trt.Dims` SIGSEGV inside a GStreamer Python plugin — root cause and fix
4 min read
Calling CUDA from Go without cgo
2 min read
Why CUDA kernels silently corrupt memory and how to catch the bug
5 min read
CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation
4 min read
How I optimized a Solana vanity address grinder to 44M keys/sec on GPU
2 min read
From Black Magic to Science: The Evolution of the CUDA Optimization Skill
11 min read
Learning Resources Tech
1 min read
512MiB ≠ 512MB — the silent trtexec bug
2 min read
Memory Coalescing: Same computation, 6x Performance Difference
6 min read
Setting Up NVIDIA Drivers and CUDA for ML/DL on Ubuntu 22.04
3 min read
Achieving Neuro‑Sama‑Tier Speech‑to‑Text for Your Local AI Companion (Whisper + CUDA + LivinGrimoire)
5 min read
CUDA Graphs: The 8-Year Overnight Success and the Observability Gap
9 min read
124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level
5 min read