
Ricardo

Originally published at rmauro.dev

Running LLM llama.cpp Natively on Raspberry Pi

For developers and hackers who enjoy squeezing maximum potential out of compact machines, running a large language model natively on a Raspberry Pi with llama.cpp is a rewarding challenge. This guide walks you through compiling llama.cpp from source, downloading a model, and running inference - all on the Pi itself.

Prerequisites

Hardware

  • Raspberry Pi 4, 5, or newer
  • 64-bit Raspberry Pi OS
  • 4GB RAM minimum (8GB+ recommended)
  • Heatsink or fan recommended for cooling

Software

  • Git
  • CMake (v3.16+)
  • GCC or Clang
  • Python 3 (optional, for Python bindings)
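Before building, it's worth confirming the OS is actually 64-bit and checking how much memory you have to work with. A quick sanity check (a 64-bit Raspberry Pi OS reports aarch64):

# 👇 should print: aarch64
uname -m

# 👇 check available RAM
free -h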

Step-by-Step Guide

Install Required Tools

sudo apt update && sudo apt upgrade -y

# 👇 install dependencies and tools to build
sudo apt install -y git build-essential cmake python3-pip libcurl4-openssl-dev

Clone and Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git

cd llama.cpp

cmake -B build
cmake --build build --config Release -j$(nproc)

This step takes some time - we're compiling the llama.cpp binaries from source.
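Once the build finishes, the compiled binaries land in build/bin. A quick check that everything is in place (llama-cli prints its build info with --version):

ls build/bin/
./build/bin/llama-cli --version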

Download a Quantized Model

mkdir -p models && cd models

wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf

cd ..

We'll use TinyLlama 1.1B Chat from https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF for testing - at Q4_0 quantization it's small enough to fit comfortably in 4GB of RAM.
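The Q4_0 file weighs in at roughly 600-700MB, so a quick listing confirms the download completed:

ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf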

Run Inference

./build/bin/llama-cli \
  -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Hello, Raspberry Pi!"
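By default, llama-cli picks its own thread count and generates until the model decides to stop. On a Pi it helps to pin things down explicitly; a sketch using common flags (-t thread count, -n tokens to generate, -c context size), assuming a 4-core Pi:

./build/bin/llama-cli \
  -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Hello, Raspberry Pi!" \
  -t 4 -n 128 -c 2048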

Optional: Python Bindings

Note: The Python bindings have been moved to a separate repository.

# 👇 the repo vendors llama.cpp as a git submodule, so clone recursively
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python

# 👇 builds the native extension; pip resolves dependencies automatically
python3 -m pip install .
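A one-liner verifies the module built and imports cleanly:

python3 -c "import llama_cpp; print(llama_cpp.__version__)"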

Use in Python:

from llama_cpp import Llama

llm = Llama(model_path="./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf")

# 👇 calling the model returns a completion dict; the text lives in choices[0]
output = llm("Hello from Python!", max_tokens=64)
print(output["choices"][0]["text"])

Conclusion

Running llama.cpp natively on a Raspberry Pi is a geeky thrill. It teaches you about compiler optimizations, quantized models, and pushing hardware to the edge—literally. Bonus points if you run it headless over SSH.
