<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Foram Jaguwala</title>
    <description>The latest articles on Forem by Foram Jaguwala (@foram_jaguwala_46b596a8f6).</description>
    <link>https://forem.com/foram_jaguwala_46b596a8f6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1763947%2Ffe23ae62-51b0-4786-a9c1-482f886c78fe.jpg</url>
      <title>Forem: Foram Jaguwala</title>
      <link>https://forem.com/foram_jaguwala_46b596a8f6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/foram_jaguwala_46b596a8f6"/>
    <language>en</language>
    <item>
      <title>🚀 Fixing Ollama Not Using GPU with Docker Desktop (Step-by-Step + Troubleshooting)</title>
      <dc:creator>Foram Jaguwala</dc:creator>
      <pubDate>Sun, 29 Mar 2026 18:52:56 +0000</pubDate>
      <link>https://forem.com/foram_jaguwala_46b596a8f6/fixing-ollama-not-using-gpu-with-docker-desktop-step-by-step-troubleshooting-42b8</link>
      <guid>https://forem.com/foram_jaguwala_46b596a8f6/fixing-ollama-not-using-gpu-with-docker-desktop-step-by-step-troubleshooting-42b8</guid>
      <description>&lt;p&gt;Running LLMs locally with Ollama is exciting… until you realize everything is running on CPU 😅&lt;/p&gt;

&lt;p&gt;I recently ran into this exact issue — models were working, but GPU wasn’t being used at all.&lt;/p&gt;

&lt;p&gt;Here’s how I fixed it using &lt;strong&gt;Docker Desktop with GPU support&lt;/strong&gt;, along with the debugging steps that helped me understand the real problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔴 The Problem
&lt;/h2&gt;

&lt;p&gt;My initial setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama installed locally ✅&lt;/li&gt;
&lt;li&gt;Models running successfully ✅&lt;/li&gt;
&lt;li&gt;GPU usage ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Result:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slow responses&lt;/li&gt;
&lt;li&gt;High CPU usage&lt;/li&gt;
&lt;li&gt;Poor performance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Root Cause
&lt;/h2&gt;

&lt;p&gt;After debugging, I realized:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The issue wasn’t entirely Ollama — it was how my local environment handled GPU access.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even though the GPU was available, it wasn’t properly exposed to Ollama in my local setup.&lt;/p&gt;

&lt;p&gt;However, the same GPU worked perfectly inside Docker, which confirmed that the environment, not the hardware, was the real problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  🟢 The Solution: Docker Desktop + GPU
&lt;/h2&gt;

&lt;p&gt;Instead of continuing to debug locally, I moved Ollama into a Docker container with GPU enabled.&lt;/p&gt;

&lt;p&gt;This approach turned out to be much simpler and more reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Prerequisites
&lt;/h2&gt;

&lt;p&gt;Make sure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker Desktop installed&lt;/li&gt;
&lt;li&gt;NVIDIA GPU (RTX / GTX)&lt;/li&gt;
&lt;li&gt;Latest NVIDIA drivers&lt;/li&gt;
&lt;li&gt;WSL2 enabled (for Windows users; a quick check follows this list)&lt;/li&gt;
&lt;/ul&gt;
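
&lt;p&gt;If you're on Windows, here's a quick sanity check (a sketch, assuming WSL2 and the NVIDIA driver are already installed) before touching Docker at all:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Confirm WSL is installed and defaulting to version 2
wsl --status

# Confirm the NVIDIA driver is visible from inside WSL2
wsl nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If &lt;code&gt;wsl nvidia-smi&lt;/code&gt; prints your GPU details, the driver side is ready and anything left to fix is on the Docker side.&lt;/p&gt;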




&lt;h2&gt;
  
  
  ✅ Step 1: Verify GPU Access in Docker (Critical Step)
&lt;/h2&gt;

&lt;p&gt;Before running Ollama, verify that Docker can access your GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👉 If this command shows GPU details, your setup is correctly configured.&lt;/p&gt;




&lt;h2&gt;
  
  
  🐳 Step 2: Run Ollama with GPU
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
  ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
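
&lt;p&gt;Once the container is up, it's worth peeking at the startup logs. If the GPU was detected, Ollama typically mentions CUDA and the GPU name there (the exact wording varies by version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Check whether the Ollama container picked up the GPU at startup
docker logs ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;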






&lt;h2&gt;
  
  
  ⚡ Step 3: Run a Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Inside Container
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ollama ollama run llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
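
&lt;p&gt;The first run downloads the model into the &lt;code&gt;ollama&lt;/code&gt; volume. To see which models are already pulled inside the container:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List models already downloaded inside the container
docker exec -it ollama ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;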



&lt;h3&gt;
  
  
  Option 2: Using API (Recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="s1"&gt;'http://localhost:11434/api/generate'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: text/plain'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "llama3.2:1b",
  "prompt": "Hello"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
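
&lt;p&gt;One gotcha: &lt;code&gt;/api/generate&lt;/code&gt; expects the model to already be pulled. If it isn't, you can pull it through the API first (a quick sketch; &lt;code&gt;docker exec -it ollama ollama pull llama3.2:1b&lt;/code&gt; works too):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Pull the model through the Ollama API before generating
curl --location 'http://localhost:11434/api/pull' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "llama3.2:1b"
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;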






&lt;h2&gt;
  
  
  🔍 Step 4: Confirm GPU Usage
&lt;/h2&gt;

&lt;p&gt;Open another terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👉 You should see GPU memory usage increasing while the model is running.&lt;/p&gt;
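
&lt;p&gt;On recent Ollama versions you can also ask Ollama itself: &lt;code&gt;ollama ps&lt;/code&gt; shows a PROCESSOR column that should read something like "100% GPU" for a loaded model:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Show loaded models and whether they are running on GPU or CPU
docker exec -it ollama ollama ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;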




&lt;h2&gt;
  
  
  ⚡ Results
&lt;/h2&gt;

&lt;p&gt;After switching to Docker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 Faster inference&lt;/li&gt;
&lt;li&gt;🔥 GPU utilization working&lt;/li&gt;
&lt;li&gt;🧠 Smooth local LLM experience&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  🛠️ Troubleshooting Guide
&lt;/h1&gt;

&lt;p&gt;Here are some real issues I encountered:&lt;/p&gt;




&lt;h2&gt;
  
  
  ❌ GPU Not Working in Docker
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Checklist:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nvidia-smi&lt;/code&gt; works on host&lt;/li&gt;
&lt;li&gt;Docker Desktop is updated&lt;/li&gt;
&lt;li&gt;WSL2 is enabled&lt;/li&gt;
&lt;li&gt;Container is started with &lt;code&gt;--gpus all&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
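
&lt;p&gt;A quick way to walk through that checklist from a terminal (a sketch, assuming an NVIDIA GPU and Docker Desktop on WSL2):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. Driver works on the host
nvidia-smi

# 2. Docker is installed and current
docker --version

# 3. WSL2 status (Windows only)
wsl --status

# 4. The GPU is actually passed into a container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;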




&lt;h2&gt;
  
  
  ❌ GPU Works in Docker but Not in Ollama
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Fix:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Restart container&lt;/li&gt;
&lt;li&gt;Re-run with GPU flag&lt;/li&gt;
&lt;li&gt;Try a smaller model that fits in your GPU memory (e.g., mistral)&lt;/li&gt;
&lt;/ul&gt;
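
&lt;p&gt;In practice, "restart and re-run with the GPU flag" looks like this (recreating the container is harmless, since the models live in the &lt;code&gt;ollama&lt;/code&gt; volume):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Remove the old container (downloaded models are kept in the named volume)
docker rm -f ollama

# Recreate it with the GPU flag
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;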




&lt;h2&gt;
  
  
  ❌ &lt;code&gt;nvidia-smi&lt;/code&gt; Not Found
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cause:
&lt;/h3&gt;

&lt;p&gt;NVIDIA drivers are not installed on the host.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix:
&lt;/h3&gt;

&lt;p&gt;Install the latest NVIDIA drivers and reboot the system.&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GPU issues are often &lt;strong&gt;environment-related&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Always verify GPU using a CUDA container first&lt;/li&gt;
&lt;li&gt;Docker Desktop simplifies GPU access significantly&lt;/li&gt;
&lt;li&gt;Running LLMs with GPU drastically improves performance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📌 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If your Ollama setup is stuck on CPU:&lt;/p&gt;

&lt;p&gt;👉 Don’t spend too much time debugging locally&lt;br&gt;
👉 Try Docker with GPU support&lt;/p&gt;

&lt;p&gt;It’s simple, reliable, and works consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  🙌 Need Help?
&lt;/h2&gt;

&lt;p&gt;If you get stuck at any step, feel free to reach out — happy to help!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>ollama</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
