<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ferran Pons</title>
    <description>The latest articles on Forem by Ferran Pons (@ferranpons).</description>
    <link>https://forem.com/ferranpons</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F240839%2F683265e7-db5d-4e0b-b06c-0fac02efab3c.png</url>
      <title>Forem: Ferran Pons</title>
      <link>https://forem.com/ferranpons</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ferranpons"/>
    <language>en</language>
    <item>
      <title>How to Run LLMs Offline on Android Using Kotlin</title>
      <dc:creator>Ferran Pons</dc:creator>
      <pubDate>Wed, 28 Jan 2026 15:24:25 +0000</pubDate>
      <link>https://forem.com/ferranpons/how-to-run-llms-offline-on-android-using-kotlin-407g</link>
      <guid>https://forem.com/ferranpons/how-to-run-llms-offline-on-android-using-kotlin-407g</guid>
      <description>&lt;p&gt;Cloud-based LLMs are powerful, but they’re not always the right tool for mobile apps.&lt;/p&gt;

&lt;p&gt;They introduce:&lt;br&gt;
    • Network dependency&lt;br&gt;
    • Latency&lt;br&gt;
    • Usage-based costs&lt;br&gt;
    • Privacy concerns&lt;/p&gt;

&lt;p&gt;As Android developers, we already ship complex logic on-device.&lt;br&gt;
So the real question is:&lt;/p&gt;

&lt;p&gt;Can we run LLMs fully offline on Android, using Kotlin?&lt;/p&gt;

&lt;p&gt;Yes — and it’s surprisingly practical today.&lt;/p&gt;

&lt;p&gt;In this article, I’ll show how to run LLMs locally on Android using Kotlin, powered by llama.cpp and a Kotlin-first library called Llamatik.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why run LLMs offline on Android?
&lt;/h3&gt;

&lt;p&gt;Offline LLMs unlock use cases that cloud APIs struggle with:&lt;br&gt;
    • 📴 Offline-first apps&lt;br&gt;
    • 🔐 Privacy-preserving AI&lt;br&gt;
    • 📱 Predictable performance &amp;amp; cost&lt;br&gt;
    • ⚡ Tight UI integration&lt;/p&gt;

&lt;p&gt;Modern Android devices have:&lt;br&gt;
    • ARM CPUs with NEON&lt;br&gt;
    • Plenty of RAM (on mid/high-end devices)&lt;br&gt;
    • Fast local storage&lt;/p&gt;

&lt;p&gt;The challenge isn’t hardware — it’s tooling.&lt;/p&gt;
&lt;h3&gt;
  
  
  llama.cpp: the engine behind on-device LLMs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;llama.cpp&lt;/strong&gt; is a high-performance C++ runtime designed to run LLMs efficiently on CPUs.&lt;/p&gt;

&lt;p&gt;Why it’s ideal for Android:&lt;br&gt;
    • CPU-first (no GPU required)&lt;br&gt;
    • Supports quantized GGUF models&lt;br&gt;
    • Battle-tested across platforms&lt;/p&gt;

&lt;p&gt;The downside?&lt;br&gt;
It’s C++, and integrating it directly into Android apps is painful.&lt;/p&gt;

&lt;p&gt;That’s where &lt;strong&gt;Llamatik&lt;/strong&gt; comes in.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is Llamatik?
&lt;/h3&gt;

&lt;p&gt;Llamatik is a Kotlin-first library that wraps llama.cpp behind a clean Kotlin API.&lt;/p&gt;

&lt;p&gt;It’s designed for:&lt;br&gt;
    • Android&lt;br&gt;
    • Kotlin Multiplatform (iOS &amp;amp; Desktop)&lt;br&gt;
    • Fully offline inference&lt;/p&gt;

&lt;p&gt;Key features:&lt;br&gt;
    • No JNI in your app code&lt;br&gt;
    • GGUF model support&lt;br&gt;
    • Streaming &amp;amp; non-streaming generation&lt;br&gt;
    • Embeddings for offline RAG&lt;br&gt;
    • Kotlin Multiplatform–friendly API&lt;/p&gt;

&lt;p&gt;You write Kotlin — native complexity stays inside the library.&lt;/p&gt;
&lt;h3&gt;
  
  
  Add Llamatik to your Android project
&lt;/h3&gt;

&lt;p&gt;Llamatik is published on Maven Central.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dependencies {
    implementation("com.llamatik:library:0.12.0")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No custom Gradle plugins.&lt;br&gt;
No manual NDK setup.&lt;/p&gt;
&lt;h3&gt;
  
  
  Add a GGUF model
&lt;/h3&gt;

&lt;p&gt;Download a quantized GGUF model (Q4 or Q5 recommended) and place it in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;androidMain/assets/
└── phi-2.Q4_0.gguf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quantized models are essential for mobile performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load the model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
LlamaBridge.initGenerateModel(modelPath)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This copies the model from assets and loads it into native memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate text (fully offline)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val response = LlamaBridge.generate(
    "Explain Kotlin Multiplatform in one sentence."
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No network.&lt;br&gt;
No API keys.&lt;br&gt;
No cloud calls.&lt;/p&gt;

&lt;p&gt;Everything runs on-device.&lt;/p&gt;
&lt;h3&gt;
  
  
  Streaming generation (for chat UIs)
&lt;/h3&gt;

&lt;p&gt;Streaming is critical for good UX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LlamaBridge.generateStreamWithContext(
    system = "You are a concise assistant.",
    context = "",
    user = "List three benefits of offline LLMs.",
    onDelta = { token -&amp;gt;
        // Append token to your UI
    },
    onDone = { },
    onError = { error -&amp;gt; }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works naturally with:&lt;br&gt;
    • Jetpack Compose&lt;br&gt;
    • ViewModels&lt;br&gt;
    • StateFlow&lt;/p&gt;
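&lt;p&gt;As a sketch of that wiring: the snippet below accumulates streamed tokens into a StateFlow that a Compose UI can collect. It assumes the generateStreamWithContext API shown above; the holder class and property names are illustrative, not part of Llamatik.&lt;/p&gt;

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

// Illustrative state holder (not part of Llamatik): accumulates
// streamed tokens so the UI can observe `reply` as it grows.
class ChatViewModel {
    private val _reply = MutableStateFlow("")
    val reply: StateFlow<String> = _reply

    fun ask(prompt: String) {
        _reply.value = ""
        LlamaBridge.generateStreamWithContext(
            system = "You are a concise assistant.",
            context = "",
            user = prompt,
            onDelta = { token -> _reply.value += token }, // append each token as it arrives
            onDone = { /* e.g. clear an "is generating" flag */ },
            onError = { error -> _reply.value = "Error: $error" }
        )
    }
}
```

&lt;p&gt;In Compose, collecting reply with collectAsState() is enough to render tokens as they stream in.&lt;/p&gt;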
&lt;h3&gt;
  
  
  Embeddings &amp;amp; offline RAG
&lt;/h3&gt;

&lt;p&gt;Llamatik also supports embeddings, enabling offline search and RAG use cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LlamaBridge.initModel(modelPath)
val embedding = LlamaBridge.embed("On-device AI with Kotlin")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Store embeddings locally and build fully offline AI features.&lt;/p&gt;
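&lt;p&gt;The retrieval step of an offline RAG pipeline is then just vector math. Here is a minimal, self-contained sketch that ranks stored embeddings by cosine similarity; in a real app the vectors would come from embed(), and the helper names below are mine, not Llamatik's.&lt;/p&gt;

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal dimension.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Return the k stored documents whose embeddings are closest to the query.
fun topMatches(query: FloatArray, docs: Map<String, FloatArray>, k: Int = 3): List<String> =
    docs.entries
        .sortedByDescending { cosineSimilarity(query, it.value) }
        .take(k)
        .map { it.key }
```

&lt;p&gt;For small local corpora a brute-force scan like this is usually fast enough; a dedicated vector index only becomes necessary at larger scales.&lt;/p&gt;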

&lt;h3&gt;
  
  
  Performance expectations
&lt;/h3&gt;

&lt;p&gt;On-device LLMs have limits — let’s be honest:&lt;br&gt;
    • Use small, quantized models&lt;br&gt;
    • Expect slower responses than cloud GPUs&lt;br&gt;
    • Manage memory carefully&lt;br&gt;
    • Always call shutdown() when done&lt;/p&gt;

&lt;p&gt;That said, for:&lt;br&gt;
    • Assistive features&lt;br&gt;
    • Short prompts&lt;br&gt;
    • Domain-specific tasks&lt;/p&gt;

&lt;p&gt;The performance is absolutely usable on modern devices.&lt;/p&gt;
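&lt;p&gt;To make the "manage memory carefully" advice concrete, one simple pattern is to wrap the native lifecycle in try/finally so the model is always released. This is a sketch assuming the LlamaBridge calls shown earlier, plus the shutdown() call mentioned above.&lt;/p&gt;

```kotlin
// Sketch: ensure native model memory is released even if generation throws.
// Assumes the LlamaBridge API shown earlier; shutdown() frees the model.
fun runOnce(prompt: String): String {
    val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
    LlamaBridge.initGenerateModel(modelPath)
    return try {
        LlamaBridge.generate(prompt)
    } finally {
        LlamaBridge.shutdown() // release native memory when done
    }
}
```

&lt;p&gt;For repeated prompts you would keep the model loaded between calls and shut down only when the feature goes away, since loading is the expensive step.&lt;/p&gt;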

&lt;h3&gt;
  
  
  When does this approach make sense?
&lt;/h3&gt;

&lt;p&gt;Llamatik is a great fit when you need:&lt;br&gt;
    • Offline support&lt;br&gt;
    • Strong privacy guarantees&lt;br&gt;
    • Predictable costs&lt;br&gt;
    • Tight UI integration&lt;/p&gt;

&lt;p&gt;It’s not meant to replace large cloud models — it’s edge AI done right.&lt;/p&gt;


&lt;h3&gt;
  
  
  Try it yourself
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;• GitHub: &lt;a href="https://github.com/ferranpons/llamatik" rel="noopener noreferrer"&gt;https://github.com/ferranpons/llamatik&lt;/a&gt;&lt;br&gt;
• Website &amp;amp; demo app: &lt;a href="https://llamatik.com" rel="noopener noreferrer"&gt;https://llamatik.com&lt;/a&gt;&lt;br&gt;
• llama.cpp: &lt;a href="https://github.com/ggml-org/llama.cpp" rel="noopener noreferrer"&gt;https://github.com/ggml-org/llama.cpp&lt;/a&gt;&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Final thoughts
&lt;/h3&gt;

&lt;p&gt;Running LLMs offline on Android using Kotlin is no longer experimental.&lt;/p&gt;

&lt;p&gt;With the right abstractions, Kotlin developers can build private, offline, on-device AI — without touching C++.&lt;/p&gt;

&lt;p&gt;If you’re curious about pushing AI closer to the device, this is a great place to start.&lt;/p&gt;

</description>
      <category>android</category>
      <category>kotlin</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to run your Monogame app on a Raspberry Pi (or any Linux)</title>
      <dc:creator>Ferran Pons</dc:creator>
      <pubDate>Mon, 25 Jan 2021 17:44:17 +0000</pubDate>
      <link>https://forem.com/ferranpons/how-to-run-your-monogame-app-on-a-raspberry-pi-or-any-linux-3clj</link>
      <guid>https://forem.com/ferranpons/how-to-run-your-monogame-app-on-a-raspberry-pi-or-any-linux-3clj</guid>
      <description>&lt;p&gt;If you are here you probably have a &lt;strong&gt;Windows&lt;/strong&gt; game developed using&lt;br&gt;
&lt;strong&gt;Monogame&lt;/strong&gt; that you would like to port to a &lt;strong&gt;Raspberry Pi&lt;/strong&gt; device with&lt;br&gt;
Raspberry Pi OS (&lt;em&gt;Raspbian&lt;/em&gt;). Or even, to any Linux distribution. Well, you are&lt;br&gt;
in the right place. This mini-tutorial will cover all the steps to run your game&lt;br&gt;
on it.&lt;/p&gt;

&lt;h4&gt;
  
  
  Requirements
&lt;/h4&gt;

&lt;p&gt;Before getting started, make sure you meet these requirements; they maximize compatibility and keep you up to date.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Monogame 3.8&lt;/strong&gt; (it may run on older versions, but this is untested)&lt;/li&gt;
&lt;li&gt;Your game using &lt;strong&gt;.NET Core 3&lt;/strong&gt; or &lt;em&gt;newer&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Your game and assets built with target &lt;strong&gt;DesktopGL&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raspberry Pi 2&lt;/strong&gt; or &lt;em&gt;newer (dotnet can publish only on newer devices and not
on the original RPi)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to do it
&lt;/h4&gt;

&lt;p&gt;We are going to use our &lt;em&gt;open-source&lt;/em&gt; video game &lt;strong&gt;Zombusters&lt;/strong&gt; as an example of a real, working project.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Clone the Game Repository&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;git clone&lt;br&gt;
&lt;a href="https://github.com/retrowax/Zombusters.git" rel="noopener noreferrer"&gt;https://github.com/retrowax/Zombusters.git&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Download and install the .NET Core 3.1 SDK&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At the time of writing we are still using .NET Core 3.1, but the process is the same for the latest version, 5.0. You will find the SDK here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://dotnet.microsoft.com/download/dotnet-core/3.1" rel="noopener noreferrer"&gt;https://dotnet.microsoft.com/download/dotnet-core/3.1&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We need the &lt;strong&gt;Arm32&lt;/strong&gt; version because Raspberry Pi OS is still &lt;strong&gt;32-bit&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;wget&lt;br&gt;
&lt;a href="https://download.visualstudio.microsoft.com/download/pr/2178c8a1-ad48-4e51-9ddd-4e3ab64d1f0e/68746abefadf62be43ca525653c915a1/dotnet-sdk-3.1.405-linux-arm.tar.gz" rel="noopener noreferrer"&gt;https://download.visualstudio.microsoft.com/download/pr/2178c8a1-ad48-4e51-9ddd-4e3ab64d1f0e/68746abefadf62be43ca525653c915a1/dotnet-sdk-3.1.405-linux-arm.tar.gz&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now we need to uncompress the file and add it to our PATH:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;mkdir -p "$HOME/dotnet" &amp;amp;&amp;amp; tar zxf dotnet-sdk-3.1.405-linux-arm.tar.gz -C "$HOME/dotnet"&lt;/p&gt;

&lt;p&gt;export DOTNET_ROOT=$HOME/dotnet&lt;/p&gt;

&lt;p&gt;export PATH=$PATH:$HOME/dotnet&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you want &lt;strong&gt;.NET Core&lt;/strong&gt; to keep working after the system restarts, you need to do the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;sudo vi /etc/profile&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Add these lines at the bottom of the file and save it. Use your editor of choice; I used vi.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;export DOTNET_ROOT=$HOME/dotnet&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;export PATH=$PATH:$HOME/dotnet&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Build the Game Solution&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now it is time to build the solution. First, we need to restore the &lt;strong&gt;NuGet dependencies&lt;/strong&gt; referenced by the solution:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;dotnet restore ZombustersLinux.sln&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then, build the &lt;strong&gt;Debug&lt;/strong&gt; flavor:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;dotnet msbuild ZombustersLinux.sln&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this point you may need to make changes to your solution to adapt it, or to fix any issues that come up when migrating it to your Raspberry Pi.&lt;/p&gt;

&lt;p&gt;If the build runs without errors, it generates a &lt;strong&gt;DLL&lt;/strong&gt; for the &lt;strong&gt;debug&lt;/strong&gt; build that can be executed with this command (the path is wherever your solution files were generated):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;dotnet&lt;br&gt;
/home/pi/Documents/github/Zombusters/ZombustersWindows/bin/Debug/netcoreapp3.1/ZombustersLinux.dll&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;em&gt;if this is your first time migrating to a Linux environment, your Content Load paths may be wrong and can produce errors when building.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And that’s it! Your game is now up and running on your Raspberry Pi.&lt;/p&gt;

&lt;p&gt;The same process can be used on other Linux distributions; you only need to download the .NET Core SDK for the correct architecture.&lt;/p&gt;

&lt;p&gt;Finally, if you would like to try &lt;strong&gt;Zombusters&lt;/strong&gt; on your Raspberry Pi, you can&lt;br&gt;
download it for &lt;strong&gt;FREE&lt;/strong&gt; here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://retrowax.itch.io/zombusters-raspberry-pi-edition" rel="noopener noreferrer"&gt;https://retrowax.itch.io/zombusters-raspberry-pi-edition&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next posts, we will cover the ways to create a &lt;strong&gt;Release build&lt;/strong&gt; and the&lt;br&gt;
best options to &lt;strong&gt;distribute it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

</description>
      <category>monogame</category>
      <category>gamedev</category>
      <category>raspberrypi</category>
      <category>csharp</category>
    </item>
  </channel>
</rss>
