<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ferran Pons</title>
    <description>The latest articles on Forem by Ferran Pons (@ferranpons).</description>
    <link>https://forem.com/ferranpons</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F240839%2F683265e7-db5d-4e0b-b06c-0fac02efab3c.png</url>
      <title>Forem: Ferran Pons</title>
      <link>https://forem.com/ferranpons</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ferranpons"/>
    <language>en</language>
    <item>
      <title>How to Run LLMs Offline on Android Using Kotlin</title>
      <dc:creator>Ferran Pons</dc:creator>
      <pubDate>Wed, 28 Jan 2026 15:24:25 +0000</pubDate>
      <link>https://forem.com/ferranpons/how-to-run-llms-offline-on-android-using-kotlin-407g</link>
      <guid>https://forem.com/ferranpons/how-to-run-llms-offline-on-android-using-kotlin-407g</guid>
      <description>&lt;p&gt;Cloud-based LLMs are powerful, but they’re not always the right tool for mobile apps.&lt;/p&gt;

&lt;p&gt;They introduce:&lt;br&gt;
    • Network dependency&lt;br&gt;
    • Latency&lt;br&gt;
    • Usage-based costs&lt;br&gt;
    • Privacy concerns&lt;/p&gt;

&lt;p&gt;As Android developers, we already ship complex logic on-device.&lt;br&gt;
So the real question is:&lt;/p&gt;

&lt;p&gt;Can we run LLMs fully offline on Android, using Kotlin?&lt;/p&gt;

&lt;p&gt;Yes — and it’s surprisingly practical today.&lt;/p&gt;

&lt;p&gt;In this article, I’ll show how to run LLMs locally on Android using Kotlin, powered by llama.cpp and a Kotlin-first library called Llamatik.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why run LLMs offline on Android?
&lt;/h3&gt;

&lt;p&gt;Offline LLMs unlock use cases that cloud APIs struggle with:&lt;br&gt;
    • 📴 Offline-first apps&lt;br&gt;
    • 🔐 Privacy-preserving AI&lt;br&gt;
    • 📱 Predictable performance &amp;amp; cost&lt;br&gt;
    • ⚡ Tight UI integration&lt;/p&gt;

&lt;p&gt;Modern Android devices have:&lt;br&gt;
    • ARM CPUs with NEON&lt;br&gt;
    • Plenty of RAM (on mid/high-end devices)&lt;br&gt;
    • Fast local storage&lt;/p&gt;

&lt;p&gt;The challenge isn’t hardware — it’s tooling.&lt;/p&gt;
&lt;h3&gt;
  
  
  llama.cpp: the engine behind on-device LLMs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;llama.cpp&lt;/strong&gt; is a high-performance C++ runtime designed to run LLMs efficiently on CPUs.&lt;/p&gt;

&lt;p&gt;Why it’s ideal for Android:&lt;br&gt;
    • CPU-first (no GPU required)&lt;br&gt;
    • Supports quantized GGUF models&lt;br&gt;
    • Battle-tested across platforms&lt;/p&gt;

&lt;p&gt;The downside?&lt;br&gt;
It’s C++, and integrating it directly into Android apps is painful.&lt;/p&gt;

&lt;p&gt;That’s where &lt;strong&gt;Llamatik&lt;/strong&gt; comes in.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is Llamatik?
&lt;/h3&gt;

&lt;p&gt;Llamatik is a Kotlin-first library that wraps llama.cpp behind a clean Kotlin API.&lt;/p&gt;

&lt;p&gt;It’s designed for:&lt;br&gt;
    • Android&lt;br&gt;
    • Kotlin Multiplatform (iOS &amp;amp; Desktop)&lt;br&gt;
    • Fully offline inference&lt;/p&gt;

&lt;p&gt;Key features:&lt;br&gt;
    • No JNI in your app code&lt;br&gt;
    • GGUF model support&lt;br&gt;
    • Streaming &amp;amp; non-streaming generation&lt;br&gt;
    • Embeddings for offline RAG&lt;br&gt;
    • Kotlin Multiplatform–friendly API&lt;/p&gt;

&lt;p&gt;You write Kotlin — native complexity stays inside the library.&lt;/p&gt;
&lt;h3&gt;
  
  
  Add Llamatik to your Android project
&lt;/h3&gt;

&lt;p&gt;Llamatik is published on Maven Central.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dependencies {
    implementation("com.llamatik:library:0.12.0")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No custom Gradle plugins.&lt;br&gt;
No manual NDK setup.&lt;/p&gt;
&lt;h3&gt;
  
  
  Add a GGUF model
&lt;/h3&gt;

&lt;p&gt;Download a quantized GGUF model (Q4 or Q5 recommended) and place it in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;androidMain/assets/
└── phi-2.Q4_0.gguf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quantized models are essential for mobile performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load the model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
LlamaBridge.initGenerateModel(modelPath)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This copies the model from assets and loads it into native memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate text (fully offline)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val response = LlamaBridge.generate(
    "Explain Kotlin Multiplatform in one sentence."
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No network.&lt;br&gt;
No API keys.&lt;br&gt;
No cloud calls.&lt;/p&gt;

&lt;p&gt;Everything runs on-device.&lt;/p&gt;
&lt;h3&gt;
  
  
  Streaming generation (for chat UIs)
&lt;/h3&gt;

&lt;p&gt;Streaming is critical for good UX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LlamaBridge.generateStreamWithContext(
    system = "You are a concise assistant.",
    context = "",
    user = "List three benefits of offline LLMs.",
    onDelta = { token -&amp;gt;
        // Append token to your UI
    },
    onDone = { },
    onError = { error -&amp;gt; }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works naturally with:&lt;br&gt;
    • Jetpack Compose&lt;br&gt;
    • ViewModels&lt;br&gt;
    • StateFlow&lt;/p&gt;
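&lt;p&gt;As a sketch of that wiring: the snippet below accumulates streamed tokens into a StateFlow that a Compose UI can collect. It assumes the generateStreamWithContext API shown above; the holder class and property names are illustrative, not part of Llamatik.&lt;/p&gt;

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

// Illustrative state holder (not part of Llamatik): accumulates
// streamed tokens so the UI can observe `reply` as it grows.
class ChatViewModel {
    private val _reply = MutableStateFlow("")
    val reply: StateFlow<String> = _reply

    fun ask(prompt: String) {
        _reply.value = ""
        LlamaBridge.generateStreamWithContext(
            system = "You are a concise assistant.",
            context = "",
            user = prompt,
            onDelta = { token -> _reply.value += token }, // append each token as it arrives
            onDone = { /* e.g. clear an "is generating" flag */ },
            onError = { error -> _reply.value = "Error: $error" }
        )
    }
}
```

&lt;p&gt;In Compose, collecting reply with collectAsState() is enough to render tokens as they stream in.&lt;/p&gt;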
&lt;h3&gt;
  
  
  Embeddings &amp;amp; offline RAG
&lt;/h3&gt;

&lt;p&gt;Llamatik also supports embeddings, enabling offline search and RAG use cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LlamaBridge.initModel(modelPath)
val embedding = LlamaBridge.embed("On-device AI with Kotlin")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Store embeddings locally and build fully offline AI features.&lt;/p&gt;
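&lt;p&gt;The retrieval step of an offline RAG pipeline is then just vector math. Here is a minimal, self-contained sketch that ranks stored embeddings by cosine similarity; in a real app the vectors would come from embed(), and the helper names below are mine, not Llamatik's.&lt;/p&gt;

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal dimension.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Return the k stored documents whose embeddings are closest to the query.
fun topMatches(query: FloatArray, docs: Map<String, FloatArray>, k: Int = 3): List<String> =
    docs.entries
        .sortedByDescending { cosineSimilarity(query, it.value) }
        .take(k)
        .map { it.key }
```

&lt;p&gt;For small local corpora a brute-force scan like this is usually fast enough; a dedicated vector index only becomes necessary at larger scales.&lt;/p&gt;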

&lt;h3&gt;
  
  
  Performance expectations
&lt;/h3&gt;

&lt;p&gt;On-device LLMs have limits — let’s be honest:&lt;br&gt;
    • Use small, quantized models&lt;br&gt;
    • Expect slower responses than cloud GPUs&lt;br&gt;
    • Manage memory carefully&lt;br&gt;
    • Always call shutdown() when done&lt;/p&gt;

&lt;p&gt;That said, for:&lt;br&gt;
    • Assistive features&lt;br&gt;
    • Short prompts&lt;br&gt;
    • Domain-specific tasks&lt;/p&gt;

&lt;p&gt;The performance is absolutely usable on modern devices.&lt;/p&gt;
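&lt;p&gt;To make the "manage memory carefully" advice concrete, one simple pattern is to wrap the native lifecycle in try/finally so the model is always released. This is a sketch assuming the LlamaBridge calls shown earlier, plus the shutdown() call mentioned above.&lt;/p&gt;

```kotlin
// Sketch: ensure native model memory is released even if generation throws.
// Assumes the LlamaBridge API shown earlier; shutdown() frees the model.
fun runOnce(prompt: String): String {
    val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
    LlamaBridge.initGenerateModel(modelPath)
    return try {
        LlamaBridge.generate(prompt)
    } finally {
        LlamaBridge.shutdown() // release native memory when done
    }
}
```

&lt;p&gt;For repeated prompts you would keep the model loaded between calls and shut down only when the feature goes away, since loading is the expensive step.&lt;/p&gt;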

&lt;h3&gt;
  
  
  When does this approach make sense?
&lt;/h3&gt;

&lt;p&gt;Llamatik is a great fit when you need:&lt;br&gt;
    • Offline support&lt;br&gt;
    • Strong privacy guarantees&lt;br&gt;
    • Predictable costs&lt;br&gt;
    • Tight UI integration&lt;/p&gt;

&lt;p&gt;It’s not meant to replace large cloud models — it’s edge AI done right.&lt;/p&gt;


&lt;h3&gt;
  
  
  Try it yourself
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;• GitHub: &lt;a href="https://github.com/ferranpons/llamatik" rel="noopener noreferrer"&gt;https://github.com/ferranpons/llamatik&lt;/a&gt;&lt;br&gt;
• Website &amp;amp; demo app: &lt;a href="https://llamatik.com" rel="noopener noreferrer"&gt;https://llamatik.com&lt;/a&gt;&lt;br&gt;
• llama.cpp: &lt;a href="https://github.com/ggml-org/llama.cpp" rel="noopener noreferrer"&gt;https://github.com/ggml-org/llama.cpp&lt;/a&gt;&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Final thoughts
&lt;/h3&gt;

&lt;p&gt;Running LLMs offline on Android using Kotlin is no longer experimental.&lt;/p&gt;

&lt;p&gt;With the right abstractions, Kotlin developers can build private, offline, on-device AI — without touching C++.&lt;/p&gt;

&lt;p&gt;If you’re curious about pushing AI closer to the device, this is a great place to start.&lt;/p&gt;

</description>
      <category>android</category>
      <category>kotlin</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to run your Monogame app on a Raspberry Pi (or any Linux)</title>
      <dc:creator>Ferran Pons</dc:creator>
      <pubDate>Mon, 25 Jan 2021 17:44:17 +0000</pubDate>
      <link>https://forem.com/ferranpons/how-to-run-your-monogame-app-on-a-raspberry-pi-or-any-linux-3clj</link>
      <guid>https://forem.com/ferranpons/how-to-run-your-monogame-app-on-a-raspberry-pi-or-any-linux-3clj</guid>
      <description>&lt;p&gt;If you are here you probably have a &lt;strong&gt;Windows&lt;/strong&gt; game developed using&lt;br&gt;
&lt;strong&gt;Monogame&lt;/strong&gt; that you would like to port to a &lt;strong&gt;Raspberry Pi&lt;/strong&gt; device with&lt;br&gt;
Raspberry Pi OS (&lt;em&gt;Raspbian&lt;/em&gt;). Or even, to any Linux distribution. Well, you are&lt;br&gt;
in the right place. This mini-tutorial will cover all the steps to run your game&lt;br&gt;
on it.&lt;/p&gt;

&lt;h4&gt;
  
  
  Requirements
&lt;/h4&gt;

&lt;p&gt;Before getting started, make sure you meet these requirements; they maximize compatibility and keep you up to date.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Monogame 3.8&lt;/strong&gt; (it may run on older versions, but this is untested)&lt;/li&gt;
&lt;li&gt;Your game using &lt;strong&gt;.NET Core 3&lt;/strong&gt; or &lt;em&gt;newer&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Your game and assets built with target &lt;strong&gt;DesktopGL&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raspberry Pi 2&lt;/strong&gt; or &lt;em&gt;newer (dotnet can publish only on newer devices and not
on the original RPi)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to do it
&lt;/h4&gt;

&lt;p&gt;We are going to use our &lt;em&gt;open-source&lt;/em&gt; video game &lt;strong&gt;Zombusters&lt;/strong&gt; as an example of a real, working project.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Clone the Game Repository&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;git clone&lt;br&gt;
&lt;a href="https://github.com/retrowax/Zombusters.git" rel="noopener noreferrer"&gt;https://github.com/retrowax/Zombusters.git&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Download and install the .NET Core 3.1 SDK&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At the time of writing we are still using .NET Core 3.1, but the process is the same for the latest version, 5.0. You will find the SDK here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://dotnet.microsoft.com/download/dotnet-core/3.1" rel="noopener noreferrer"&gt;https://dotnet.microsoft.com/download/dotnet-core/3.1&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We need the &lt;strong&gt;Arm32&lt;/strong&gt; version because Raspberry Pi OS is still &lt;strong&gt;32-bit&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;wget&lt;br&gt;
&lt;a href="https://download.visualstudio.microsoft.com/download/pr/2178c8a1-ad48-4e51-9ddd-4e3ab64d1f0e/68746abefadf62be43ca525653c915a1/dotnet-sdk-3.1.405-linux-arm.tar.gz" rel="noopener noreferrer"&gt;https://download.visualstudio.microsoft.com/download/pr/2178c8a1-ad48-4e51-9ddd-4e3ab64d1f0e/68746abefadf62be43ca525653c915a1/dotnet-sdk-3.1.405-linux-arm.tar.gz&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now we need to uncompress the file and add it to our PATH:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;mkdir -p "$HOME/dotnet" &amp;amp;&amp;amp; tar zxf dotnet-sdk-3.1.405-linux-arm.tar.gz -C "$HOME/dotnet"&lt;/p&gt;

&lt;p&gt;export DOTNET_ROOT=$HOME/dotnet&lt;/p&gt;

&lt;p&gt;export PATH=$PATH:$HOME/dotnet&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you want &lt;strong&gt;.NET Core&lt;/strong&gt; to keep working after the system restarts, you need to do the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;sudo vi /etc/profile&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Add these lines at the bottom of the file and save it. Use your editor of choice; I used vi.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;export DOTNET_ROOT=$HOME/dotnet&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;export PATH=$PATH:$HOME/dotnet&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Build the Game Solution&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now it is time to build the solution. First, we need to restore the &lt;strong&gt;NuGet dependencies&lt;/strong&gt; referenced by the solution:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;dotnet restore ZombustersLinux.sln&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then, build the &lt;strong&gt;Debug&lt;/strong&gt; flavor:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;dotnet msbuild ZombustersLinux.sln&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this point you may need to make changes to your solution to adapt it, or to fix any issues that come up when migrating it to your Raspberry Pi.&lt;/p&gt;

&lt;p&gt;If the build runs without errors, it generates a &lt;strong&gt;DLL&lt;/strong&gt; for the &lt;strong&gt;debug&lt;/strong&gt; build that can be executed with this command (the path is wherever your solution files were generated):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;dotnet&lt;br&gt;
/home/pi/Documents/github/Zombusters/ZombustersWindows/bin/Debug/netcoreapp3.1/ZombustersLinux.dll&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;em&gt;if this is your first time migrating to a Linux environment, your Content Load paths may be wrong and can produce errors when building.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And that’s it! Your game is now up and running on your Raspberry Pi.&lt;/p&gt;

&lt;p&gt;The same process can be used on other Linux distributions; you only need to download the .NET Core SDK for the correct architecture.&lt;/p&gt;

&lt;p&gt;Finally, if you would like to try &lt;strong&gt;Zombusters&lt;/strong&gt; on your Raspberry Pi, you can&lt;br&gt;
download it for &lt;strong&gt;FREE&lt;/strong&gt; here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://retrowax.itch.io/zombusters-raspberry-pi-edition" rel="noopener noreferrer"&gt;https://retrowax.itch.io/zombusters-raspberry-pi-edition&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next posts, we will cover the ways to create a &lt;strong&gt;Release build&lt;/strong&gt; and the&lt;br&gt;
best options to &lt;strong&gt;distribute it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

</description>
      <category>monogame</category>
      <category>gamedev</category>
      <category>raspberrypi</category>
      <category>csharp</category>
    </item>
  </channel>
</rss>
