<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Carol Bolger</title>
    <description>The latest articles on Forem by Carol Bolger (@bolgercarol).</description>
    <link>https://forem.com/bolgercarol</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F108863%2F7777cd75-1957-4fee-8741-88cacc6ec39c.png</url>
      <title>Forem: Carol Bolger</title>
      <link>https://forem.com/bolgercarol</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bolgercarol"/>
    <language>en</language>
    <item>
      <title>On-device or cloud? Building hybrid AI inference into your Android app with Firebase AI Logic</title>
      <dc:creator>Carol Bolger</dc:creator>
      <pubDate>Wed, 29 Apr 2026 22:12:36 +0000</pubDate>
      <link>https://forem.com/bolgercarol/on-device-or-cloud-building-hybrid-ai-inference-into-your-android-app-with-firebase-ai-logic-3p1i</link>
      <guid>https://forem.com/bolgercarol/on-device-or-cloud-building-hybrid-ai-inference-into-your-android-app-with-firebase-ai-logic-3p1i</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Not every prompt needs the cloud
&lt;/h2&gt;

&lt;p&gt;Every time a user taps a button in your Android app and Gemini responds, something happens in the background that you might not think about: a round trip to Google's servers. Data leaves the device, gets processed in the cloud, and comes back. For most prompts, that's fine. But what about a health journaling app where the prompt contains symptoms? A notes app where the query is someone's private thought? Or a user on a shaky connection in the middle of nowhere?&lt;/p&gt;

&lt;p&gt;The assumption baked into most AI-powered apps is that inference lives in the cloud. That made sense when on-device models were too limited to be useful. That assumption is now worth revisiting.&lt;/p&gt;

&lt;p&gt;At Google Cloud Next '26, Firebase announced hybrid inference for Firebase AI Logic on Android — currently experimental, powered by Gemini Nano via ML Kit's Prompt API under the hood. The idea is straightforward: run inference locally on the device when it can handle it, and fall back to cloud-hosted Gemini when it can't. From your Kotlin code, the API looks nearly identical either way. You configure a preference, write a prompt, get a response. The SDK figures out the routing.&lt;/p&gt;

&lt;p&gt;This article walks through how hybrid inference works, why it matters for Android developers, and how to wire it into a real app today — including the honest caveats you won't find in the announcement post.&lt;/p&gt;




&lt;h2&gt;
  
  
  How hybrid inference works
&lt;/h2&gt;

&lt;p&gt;At its core, hybrid inference is a routing decision the SDK makes on your behalf. When your app sends a prompt, Firebase AI Logic checks whether the device supports Gemini Nano and has sufficient resources to run it. If yes, inference runs locally — never leaving the device. If not, the request is forwarded to cloud-hosted Gemini, exactly as it would in a standard setup.&lt;/p&gt;

&lt;p&gt;The routing behaviour is controlled by four &lt;code&gt;InferenceMode&lt;/code&gt; values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PREFER_ON_DEVICE&lt;/code&gt;&lt;/strong&gt; — try on-device first, fall back to cloud if unavailable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PREFER_IN_CLOUD&lt;/code&gt;&lt;/strong&gt; — try cloud first, fall back to on-device if offline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ONLY_ON_DEVICE&lt;/code&gt;&lt;/strong&gt; — on-device only; throws an exception if unavailable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ONLY_IN_CLOUD&lt;/code&gt;&lt;/strong&gt; — cloud only; throws an exception if offline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most apps, &lt;code&gt;PREFER_ON_DEVICE&lt;/code&gt; is the right starting point. It gives you the speed and privacy benefits of local inference where available, without ever leaving users stranded.&lt;/p&gt;
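&lt;p&gt;To make the choice concrete, here's a minimal sketch of a per-feature routing policy. The &lt;code&gt;InferenceMode&lt;/code&gt; enum below is a local stand-in so the snippet is self-contained; in a real app you'd use the experimental Firebase type directly, and the sensitivity classification is entirely our own assumption:&lt;/p&gt;

```kotlin
// Local stand-in for the SDK's InferenceMode, purely for illustration.
enum class InferenceMode { PREFER_ON_DEVICE, PREFER_IN_CLOUD, ONLY_ON_DEVICE, ONLY_IN_CLOUD }

// Hypothetical app-level classification of what a feature handles.
enum class FeatureSensitivity { PRIVATE, GENERAL }

// Policy sketch: private data must never leave the device, so fail rather
// than fall back; everything else prefers local speed with cloud fallback.
fun modeFor(sensitivity: FeatureSensitivity): InferenceMode = when (sensitivity) {
    FeatureSensitivity.PRIVATE -> InferenceMode.ONLY_ON_DEVICE
    FeatureSensitivity.GENERAL -> InferenceMode.PREFER_ON_DEVICE
}
```

&lt;p&gt;The point of centralizing this in one function is that the routing decision becomes an auditable product policy rather than something scattered across call sites.&lt;/p&gt;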

&lt;blockquote&gt;
&lt;p&gt;Think of it like a CDN for model inference. The SDK picks the most capable compute available at the moment the request is made — and your Kotlin code stays the same regardless of which path runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Under the hood, on-device inference uses ML Kit's Prompt API running Gemini Nano — a smaller, quantized model optimized for mobile hardware, not the full cloud Gemini model. That trade-off is worth being upfront about: on-device responses will be faster and free of API cost, but for complex reasoning tasks, cloud Gemini produces higher quality output. Hybrid inference gives the SDK the ability to make that call at runtime rather than forcing you to hardcode it at build time.&lt;/p&gt;

&lt;p&gt;Three reasons this matters for Android developers in particular:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed.&lt;/strong&gt; On-device inference removes the network round trip entirely. For short prompts on capable hardware, responses feel near-instant, which opens up UX patterns that would feel too slow if routed over the network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy.&lt;/strong&gt; When inference runs locally, the prompt never leaves the device. For apps handling sensitive input — health data, personal notes, financial details — that's a meaningful architectural guarantee, not just a talking point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; Every request handled on-device is a Gemini API call you're not making. At scale, or in apps with high inference volume, this adds up — especially for short, repetitive prompts.&lt;/p&gt;

&lt;p&gt;One important note: the on-device model isn't bundled with your APK — it downloads in the background after first launch via ML Kit. Until it's cached, all requests fall back to cloud. For apps where on-device is a hard requirement rather than a preference, you'll want to gate the feature on download completion.&lt;/p&gt;
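&lt;p&gt;That gating logic can be sketched as follows. Note that &lt;code&gt;isNanoModelReady&lt;/code&gt; is a hypothetical stand-in, not a real SDK call: check what download-status API your versions of ML Kit and &lt;code&gt;firebase-ai-ondevice&lt;/code&gt; actually expose and wire that in:&lt;/p&gt;

```kotlin
// What the app should do with an inference request right now.
enum class Route { ON_DEVICE, CLOUD, DISABLED }

// isNanoModelReady: hypothetical result of an SDK availability/download check.
// allowCloudFallback: whether this feature tolerates cloud inference at all.
fun routeFor(isNanoModelReady: Boolean, allowCloudFallback: Boolean): Route = when {
    isNanoModelReady -> Route.ON_DEVICE
    allowCloudFallback -> Route.CLOUD    // model still downloading: let cloud handle it
    else -> Route.DISABLED               // hard on-device requirement: hide the feature
}
```

&lt;p&gt;The &lt;code&gt;DISABLED&lt;/code&gt; branch is the important one: for a privacy-critical feature, showing nothing is better than silently routing sensitive data to the cloud while the model downloads.&lt;/p&gt;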




&lt;h2&gt;
  
  
  Building MemoMind: record, summarize, save
&lt;/h2&gt;

&lt;p&gt;We're going to build a focused Android app called MemoMind that records a voice memo, transcribes it using Android's built-in speech recognizer, runs a Gemini summary using hybrid inference, and saves everything to Firestore — including which backend (on-device or cloud) handled the request. By the end you'll have something genuinely usable, not just a proof of concept.&lt;/p&gt;

&lt;p&gt;The full flow: the user taps record, speaks, taps stop. The app transcribes the audio using &lt;code&gt;SpeechRecognizer&lt;/code&gt;, sends the transcript to Firebase AI Logic with hybrid inference configured, parses the structured JSON response into a summary and action items, then writes the result to Firestore.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;The full source for MemoMind is on GitHub at &lt;strong&gt;&lt;a href="https://github.com/RealWorldApplications/memo-mind-post" rel="noopener noreferrer"&gt;https://github.com/RealWorldApplications/memo-mind-post&lt;/a&gt;&lt;/strong&gt;. You can clone it and follow along, or build from scratch using the steps below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tested with:&lt;/strong&gt; &lt;code&gt;firebase-ai:17.11.0&lt;/code&gt;, &lt;code&gt;firebase-ai-ondevice:16.0.0-beta01&lt;/code&gt; on a Pixel 9 running Android 15. Import paths and model names for experimental APIs can shift between releases — treat the GitHub repo as the authoritative reference if anything here doesn't compile.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You'll need a Firebase project with the Gemini Developer API enabled under Build → AI Logic, and Firestore enabled. If you're starting from scratch, the &lt;a href="https://firebase.google.com/docs/android/setup" rel="noopener noreferrer"&gt;Firebase Android setup guide&lt;/a&gt; covers the project creation steps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; After cloning, add your own &lt;code&gt;google-services.json&lt;/code&gt; to the &lt;code&gt;app/&lt;/code&gt; directory — it's gitignored in the repo. Download it from your Firebase project console under Project settings → Your apps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1 — Add dependencies
&lt;/h3&gt;

&lt;p&gt;Hybrid inference requires two separate libraries — the standard Firebase AI library and the on-device extension. Note that &lt;strong&gt;&lt;code&gt;firebase-ai-ondevice&lt;/code&gt; is not yet in the Firebase Android BoM&lt;/strong&gt;, so you need to pin the version explicitly.&lt;/p&gt;

&lt;p&gt;You also need the Kotlin serialization plugin (used in &lt;code&gt;MemoService&lt;/code&gt; for JSON parsing) and &lt;code&gt;material-icons-extended&lt;/code&gt; for the &lt;code&gt;Mic&lt;/code&gt; and &lt;code&gt;Stop&lt;/code&gt; icons — both are easy to miss:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// build.gradle.kts (app level)&lt;/span&gt;
&lt;span class="nf"&gt;plugins&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ... your existing plugins ...&lt;/span&gt;
    &lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.jetbrains.kotlin.plugin.serialization"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="s"&gt;"2.0.21"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;dependencies&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Standard Firebase AI Logic&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.google.firebase:firebase-ai:17.11.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// On-device extension — NOT in the BoM yet, pin manually&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.google.firebase:firebase-ai-ondevice:16.0.0-beta01"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Firestore&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.google.firebase:firebase-firestore:25.1.1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Compose&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"androidx.compose:compose-bom:2024.09.00"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"androidx.compose.ui:ui"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"androidx.compose.material3:material3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"androidx.compose.material:material-icons-extended"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// Mic/Stop icons&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"androidx.activity:activity-compose:1.9.2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Coroutines&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"androidx.lifecycle:lifecycle-viewmodel-compose:2.8.6"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// JSON parsing — used in MemoService&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.jetbrains.kotlinx:kotlinx-serialization-json:1.7.3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also add microphone permission to &lt;code&gt;AndroidManifest.xml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;uses-permission&lt;/span&gt; &lt;span class="na"&gt;android:name=&lt;/span&gt;&lt;span class="s"&gt;"android.permission.RECORD_AUDIO"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2 — Define the data model
&lt;/h3&gt;

&lt;p&gt;Before any AI or UI code, define a clean data model. Gemini will return a structured JSON response that maps directly to this class, and Firestore will store it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;Memo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;actionItems&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;inferredBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// "on_device" or "cloud"&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;currentTimeMillis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;toMap&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"transcript"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"summary"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"actionItems"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;actionItems&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"inferredBy"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;inferredBy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"createdAt"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;createdAt&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;inferredBy&lt;/code&gt; field is what powers the backend indicator chip in the UI. We'll populate it based on which path the SDK actually took.&lt;/p&gt;
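&lt;p&gt;As a small illustration, the chip only needs a mapping from the stored value to a user-facing label. This helper is our own convenience, not part of any SDK:&lt;/p&gt;

```kotlin
// Map the persisted inferredBy value to the chip's display text.
// Anything unexpected is labeled "Cloud", the conservative default.
fun backendChipLabel(inferredBy: String): String = when (inferredBy) {
    "on_device" -> "On-device"
    else -> "Cloud"
}
```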

&lt;h3&gt;
  
  
  Step 3 — Configure Firebase AI Logic with hybrid inference
&lt;/h3&gt;

&lt;p&gt;This is the heart of the tutorial. The &lt;code&gt;@OptIn&lt;/code&gt; annotation is required because hybrid inference is currently experimental. Beyond that, notice how little the setup differs from standard Firebase AI — the only meaningful addition is &lt;code&gt;onDeviceConfig&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.firebase.Firebase&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.firebase.ai.ai&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.firebase.ai.InferenceMode&lt;/span&gt;        &lt;span class="c1"&gt;// not .ondevice — re-exported to root package&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.firebase.ai.OnDeviceConfig&lt;/span&gt;        &lt;span class="c1"&gt;// not .ondevice — re-exported to root package&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.firebase.ai.type.GenerativeBackend&lt;/span&gt; &lt;span class="c1"&gt;// lives in .type, not root&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.google.firebase.ai.type.PublicPreviewAPI&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.serialization.json.Json&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.serialization.json.jsonArray&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.serialization.json.jsonObject&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.serialization.json.jsonPrimitive&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="c1"&gt;// Experimental opt-in required for hybrid inference&lt;/span&gt;
    &lt;span class="nd"&gt;@OptIn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PublicPreviewAPI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Firebase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerativeBackend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;googleAI&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;modelName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gemini-3-flash-preview"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;onDeviceConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OnDeviceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InferenceMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PREFER_ON_DEVICE&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@OptIn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PublicPreviewAPI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
            You are a memo summarizer. Given the transcript below, respond ONLY
            with a valid JSON object — no markdown, no explanation. Use this shape:
            {
              "summary": "2-sentence summary of the memo",
              "actionItems": ["item one", "item two"],
              "inferredBy": "on_device"
            }

            Set "inferredBy" to "on_device" if you are running locally,
            or "cloud" if you are a cloud-hosted model.

            Transcript:
            $transcript
        """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;parseResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;parseResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;cleaned&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"```
&lt;/span&gt;&lt;span class="p"&gt;{%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="p"&gt;%}&lt;/span&gt;
&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="s"&gt;", "")
&lt;/span&gt;                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"
&lt;/span&gt;&lt;span class="p"&gt;{%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;%}&lt;/span&gt;
&lt;span class="err"&gt;```&lt;/span&gt;&lt;span class="s"&gt;", "")
&lt;/span&gt;                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;json&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseToJsonElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;jsonObject&lt;/span&gt;
            &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;jsonPrimitive&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;actionItems&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"actionItems"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;jsonArray&lt;/span&gt;
                    &lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonPrimitive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"inferredBy"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;jsonPrimitive&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"cloud"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actionItems&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cloud"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;actionItems&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;inferredBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;companion&lt;/span&gt; &lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SummaryResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;actionItems&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cloud"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A note on inferredBy:&lt;/strong&gt; The Firebase AI Logic Android SDK doesn't yet expose a property to read which backend handled a response after the fact. As a practical workaround, we ask the model to self-report in the JSON. Treat that value as a heuristic, not a guarantee: the model has no verified knowledge of its own execution context, and the field is ordinary untrusted output. Verify the behaviour with your own testing and adjust if needed.&lt;/p&gt;
&lt;/blockquote&gt;
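&lt;p&gt;Since the self-reported value isn't a guaranteed contract, it's worth normalizing whatever comes back before persisting it. A minimal sketch; the helper name and the fall-back-to-"cloud" policy are our own choices, not SDK API:&lt;/p&gt;

```kotlin
// Hypothetical helper: coerce whatever the model self-reported into one of
// the two values the UI understands. Anything unexpected counts as "cloud",
// the safer assumption for privacy messaging.
fun normalizeInferredBy(raw: String?): String =
    when (raw?.trim()?.lowercase()) {
        "on_device", "on-device", "ondevice" -> "on_device"
        else -> "cloud"
    }
```

&lt;p&gt;Treating anything unrecognized as "cloud" keeps the UI honest: the chip never claims on-device privacy unless the model explicitly said so.&lt;/p&gt;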

&lt;h3&gt;
  
  
  Step 4 — Transcribe with Android's SpeechRecognizer
&lt;/h3&gt;

&lt;p&gt;Android's built-in &lt;code&gt;SpeechRecognizer&lt;/code&gt; handles transcription entirely on-device using the platform's native speech engine. No third-party package needed, and audio never leaves the phone at this stage.&lt;/p&gt;

&lt;p&gt;The key pattern is wrapping &lt;code&gt;SpeechRecognizer&lt;/code&gt;'s listener callbacks in &lt;code&gt;suspendCancellableCoroutine&lt;/code&gt; so they integrate cleanly with the coroutine-based ViewModel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TranscriptionService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;suspendCancellableCoroutine&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;continuation&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;recognizer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SpeechRecognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSpeechRecognizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;intent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RecognizerIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ACTION_RECOGNIZE_SPEECH&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;putExtra&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RecognizerIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;EXTRA_LANGUAGE_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RecognizerIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LANGUAGE_MODEL_FREE_FORM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;putExtra&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RecognizerIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;EXTRA_MAX_RESULTS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;recognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRecognitionListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="err"&gt;: &lt;/span&gt;&lt;span class="nc"&gt;RecognitionListener&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Bundle&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;transcript&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
                    &lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;getStringArrayList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SpeechRecognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RESULTS_RECOGNITION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;firstOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
                &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;recognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;destroy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resumeWithException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Speech recognition error: $error"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="n"&gt;recognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;destroy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;// RecognitionListener requires several no-op overrides — see full source on GitHub&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="n"&gt;recognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startListening&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invokeOnCancellation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;recognizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;destroy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SpeechRecognizer&lt;/code&gt; must be created and used on the main thread. Because &lt;code&gt;viewModelScope&lt;/code&gt; runs on &lt;code&gt;Dispatchers.Main&lt;/code&gt;, calling &lt;code&gt;transcribe()&lt;/code&gt; from the ViewModel is safe without any explicit dispatcher switch.&lt;/p&gt;
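&lt;p&gt;The &lt;code&gt;onError&lt;/code&gt; callback above surfaces a raw integer. For a friendlier error state, map the common &lt;code&gt;SpeechRecognizer&lt;/code&gt; error codes to messages. A sketch with the constant values hardcoded so it runs outside Android; in the app, compare against the &lt;code&gt;SpeechRecognizer.ERROR_*&lt;/code&gt; constants directly:&lt;/p&gt;

```kotlin
// Maps SpeechRecognizer error codes to user-facing messages.
// The integer values mirror the documented SpeechRecognizer.ERROR_*
// constants (e.g. ERROR_NO_MATCH == 7); on Android, use the constants.
fun speechErrorMessage(code: Int): String = when (code) {
    1 -> "Network timeout, check your connection"      // ERROR_NETWORK_TIMEOUT
    2 -> "Network error, check your connection"        // ERROR_NETWORK
    3 -> "Audio recording error"                       // ERROR_AUDIO
    4 -> "Server error"                                // ERROR_SERVER
    5 -> "Client-side error"                           // ERROR_CLIENT
    6 -> "No speech detected, try again"               // ERROR_SPEECH_TIMEOUT
    7 -> "Couldn't understand that, try again"         // ERROR_NO_MATCH
    8 -> "Recognizer is busy, try again in a moment"   // ERROR_RECOGNIZER_BUSY
    9 -> "Microphone permission not granted"           // ERROR_INSUFFICIENT_PERMISSIONS
    else -> "Speech recognition error ($code)"
}
```

&lt;p&gt;Passing the mapped string into &lt;code&gt;resumeWithException&lt;/code&gt; means the ViewModel's &lt;code&gt;Error&lt;/code&gt; state already carries something worth showing.&lt;/p&gt;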

&lt;h3&gt;
  
  
  Step 5 — ViewModel: wire it all together
&lt;/h3&gt;

&lt;p&gt;The ViewModel exposes a &lt;code&gt;MemoUiState&lt;/code&gt; sealed class (&lt;code&gt;Idle&lt;/code&gt;, &lt;code&gt;Recording&lt;/code&gt;, &lt;code&gt;Processing&lt;/code&gt;, &lt;code&gt;Done&lt;/code&gt;, &lt;code&gt;Error&lt;/code&gt;) as a &lt;code&gt;StateFlow&lt;/code&gt; and orchestrates the three-stage pipeline in &lt;code&gt;startRecording()&lt;/code&gt;. The full class including the &lt;code&gt;ViewModelProvider.Factory&lt;/code&gt; is in &lt;a href="//[YOUR_GITHUB_LINK]/blob/main/app/src/main/java/com/example/memomind/MemoViewModel.kt"&gt;&lt;code&gt;MemoViewModel.kt&lt;/code&gt;&lt;/a&gt; — the pipeline itself is the part worth reading here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;startRecording&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;_uiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoUiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Recording&lt;/span&gt;

    &lt;span class="n"&gt;viewModelScope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;transcript&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transcriptionService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;// on-device speech engine&lt;/span&gt;
            &lt;span class="n"&gt;_uiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoUiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Processing&lt;/span&gt;

            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// hybrid inference&lt;/span&gt;

            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memo&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Memo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;firestore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memos"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;document&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;actionItems&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actionItems&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inferredBy&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;firestore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memos"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toMap&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;await&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;_uiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoUiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;_uiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoUiState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"Unknown error"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three lines do the meaningful work: &lt;code&gt;transcribe()&lt;/code&gt;, &lt;code&gt;summarize()&lt;/code&gt;, and &lt;code&gt;set()&lt;/code&gt;. Everything else is state management.&lt;/p&gt;
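&lt;p&gt;For reference, the state machine behind this can be sketched as a plain sealed class. The &lt;code&gt;Memo&lt;/code&gt; shape is simplified to the fields used above, and &lt;code&gt;statusLabel&lt;/code&gt; is an illustrative helper, not part of the app's code:&lt;/p&gt;

```kotlin
// Simplified Memo: the same fields the pipeline populates above.
data class Memo(
    val id: String,
    val transcript: String,
    val summary: String,
    val actionItems: List<String>,
    val inferredBy: String
)

// One state per pipeline stage; an exhaustive `when` in the UI
// guarantees every stage renders something.
sealed class MemoUiState {
    object Idle : MemoUiState()
    object Recording : MemoUiState()
    object Processing : MemoUiState()
    data class Done(val memo: Memo) : MemoUiState()
    data class Error(val message: String) : MemoUiState()
}

// Illustrative helper showing the exhaustive mapping the UI performs.
fun statusLabel(state: MemoUiState): String = when (state) {
    MemoUiState.Idle -> "Tap to record"
    MemoUiState.Recording -> "Listening..."
    MemoUiState.Processing -> "Summarizing..."
    is MemoUiState.Done -> "Saved (${state.memo.inferredBy})"
    is MemoUiState.Error -> "Error: ${state.message}"
}
```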

&lt;h3&gt;
  
  
  Step 6 — Compose UI with the backend indicator chip
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MemoScreen&lt;/code&gt; collects &lt;code&gt;uiState&lt;/code&gt; and renders a &lt;code&gt;FilledIconButton&lt;/code&gt; that toggles between mic and stop, a status label, a &lt;code&gt;CircularProgressIndicator&lt;/code&gt; during summarization, and the &lt;code&gt;SummaryCard&lt;/code&gt; once done. The full screen composable is in &lt;a href="//[YOUR_GITHUB_LINK]/blob/main/app/src/main/java/com/example/memomind/ui/MemoScreen.kt"&gt;&lt;code&gt;MemoScreen.kt&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The part worth looking at closely is &lt;code&gt;SummaryCard&lt;/code&gt; — specifically the backend indicator chip, which is the whole point of exposing &lt;code&gt;inferredBy&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Composable&lt;/span&gt;
&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;SummaryCard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Memo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;onReset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Unit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillMaxWidth&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoundedCornerShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

            &lt;span class="c1"&gt;// Summary header + backend chip&lt;/span&gt;
            &lt;span class="nc"&gt;Row&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillMaxWidth&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;verticalAlignment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Alignment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CenterVertically&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;titleMedium&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="c1"&gt;// Backend indicator chip&lt;/span&gt;
                &lt;span class="nc"&gt;Surface&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoundedCornerShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"on_device"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="nc"&gt;Color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xFFE1F5EE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt;
                        &lt;span class="nc"&gt;Color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xFFE6F1FB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"on_device"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="s"&gt;"On-device"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s"&gt;"Cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;horizontal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vertical&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labelSmall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;color&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inferredBy&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"on_device"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                            &lt;span class="nc"&gt;Color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xFF085041&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="k"&gt;else&lt;/span&gt;
                            &lt;span class="nc"&gt;Color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xFF0C447C&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="nc"&gt;Spacer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;height&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bodyMedium&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;lineHeight&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bodyMedium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineHeight&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actionItems&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Spacer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;height&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Action items"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;titleMedium&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nc"&gt;Spacer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;height&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actionItems&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
                    &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"• $item"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MaterialTheme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;typography&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bodyMedium&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vertical&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="nc"&gt;Spacer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;height&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nc"&gt;OutlinedButton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;onClick&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;onReset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;modifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Modifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillMaxWidth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Record another"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7 — Run it and watch the chip
&lt;/h3&gt;

&lt;p&gt;Build and run on a physical Android device that supports Gemini Nano — a recent Pixel flagship (Pixel 8 Pro or later) is a safe bet. Record a short memo, something like: &lt;em&gt;"remind me to email Sarah about the Q2 report and book a dentist appointment."&lt;/em&gt; Stop and watch the card appear.&lt;/p&gt;

&lt;p&gt;On the first run you'll likely see "Cloud" — the on-device Gemini Nano model downloads in the background after first use. Record a second memo and you should see the chip flip to "On-device", with a noticeably faster response time.&lt;/p&gt;

&lt;p&gt;That chip is the screenshot worth capturing for your article header. A real summary card showing "On-device" with action items is worth more than any diagram.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this actually means for Android developers
&lt;/h2&gt;

&lt;p&gt;After building MemoMind, a few things stand out — both as genuine reasons to be excited about hybrid inference and as honest caveats worth knowing before you architect something around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What works well
&lt;/h3&gt;

&lt;p&gt;The API design is the real win here. Firebase's decision to wrap hybrid routing in &lt;code&gt;OnDeviceConfig&lt;/code&gt; rather than forcing you to maintain two separate model instances means you're not writing conditional execution paths throughout your codebase. The SDK absorbs the routing complexity. For a feature that's still experimental, the ergonomics are surprisingly clean.&lt;/p&gt;

&lt;p&gt;The privacy story is also more meaningful than it might first appear. When inference runs locally, it's not just that data doesn't get logged somewhere — it's that the architecture of your app changes. You can make stronger guarantees to users, design features that handle genuinely sensitive input, and avoid the legal grey areas that come with sending personal data to a third-party API. For anyone building in health, fitness, or productivity, that's a real design unlock.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The backend indicator chip in MemoMind isn't just a demo trick. In a production app, surfacing this to users — even subtly — builds a kind of trust that's hard to communicate through a privacy policy alone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Current limitations worth knowing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;It's experimental.&lt;/strong&gt; The &lt;code&gt;@OptIn(PublicPreviewAPI::class)&lt;/code&gt; annotation isn't just boilerplate — it's a signal that the API surface can change in backwards-incompatible ways without deprecation notice. Don't build a production release around this without a contingency plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini Nano capability gap.&lt;/strong&gt; The on-device model is smaller and quantized. For MemoMind's use case — short transcripts, structured JSON output — it performs well. For complex reasoning, longer context, or nuanced instruction-following, you'll notice the quality gap compared to cloud Gemini. Know your prompt's complexity profile before relying on on-device quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model download on first run.&lt;/strong&gt; Gemini Nano downloads in the background after first launch via ML Kit. Until it's cached, every request goes to cloud. For apps where on-device is a hard privacy requirement rather than a preference, you'll need to listen to the model download state and gate the feature accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Device support is not universal.&lt;/strong&gt; On-device inference requires a device that supports Gemini Nano. Pixel 6 and newer, and a growing range of Samsung devices, qualify — but this is not all Android devices. Your &lt;code&gt;PREFER_ON_DEVICE&lt;/code&gt; config will silently fall back to cloud on unsupported hardware, which is fine for most cases but worth tracking in your analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-reporting inferredBy is a workaround.&lt;/strong&gt; Asking the model to report its own execution context in the JSON works in practice but isn't a guaranteed contract. The official SDK doesn't yet expose a post-response property for which backend ran. Watch the &lt;code&gt;firebase-ai-ondevice&lt;/code&gt; changelog for when this is added properly.&lt;/p&gt;
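&lt;p&gt;Until the SDK exposes the backend directly, it pays to treat that self-reported field defensively. A minimal sketch of the fallback logic, in Python for brevity (the &lt;code&gt;inferredBy&lt;/code&gt; field name follows the article's prompt convention; defaulting to "cloud" when the field is missing or the JSON is malformed is my assumption):&lt;/p&gt;

```python
import json

def backend_used(response_text):
    """Read the self-reported backend from the model's JSON output.

    "inferredBy" is a prompt-level convention, not an SDK guarantee,
    so default to "cloud" whenever the field is absent, unexpected,
    or the response is not valid JSON at all.
    """
    try:
        data = json.loads(response_text)
    except ValueError:
        return "cloud"
    value = data.get("inferredBy")
    return value if value in ("on-device", "cloud") else "cloud"
```

&lt;p&gt;The same defensive shape translates directly to Kotlin on Android; the point is that a missing or garbled field should degrade to the conservative answer, never crash the card UI.&lt;/p&gt;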

&lt;h3&gt;
  
  
  The bigger picture
&lt;/h3&gt;

&lt;p&gt;Hybrid inference is one piece of a broader shift in what mobile AI can look like. The ability to run meaningful inference locally — even in a limited form — changes the kinds of apps you can design. Features that felt too latency-sensitive for a cloud round trip, or too privacy-sensitive to send off-device, become viable. The on-device model will improve over time. Device support will grow. The API will stabilize.&lt;/p&gt;

&lt;p&gt;The developers who understand this stack now, rough edges and all, will have a meaningful head start when it does.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to go from here
&lt;/h2&gt;

&lt;p&gt;MemoMind as we've built it is a solid foundation. The transcribe → summarize → save pipeline is generic enough to adapt to a wide range of use cases — meeting notes, workout logs, daily journals, field reports. The structured JSON prompt and the Firestore schema transfer cleanly to any domain.&lt;/p&gt;

&lt;p&gt;A few natural next steps if you want to keep building:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a memo list screen.&lt;/strong&gt; A simple Firestore real-time listener showing past memos with their on-device/cloud chip makes the app feel complete — and gives you a live view of how often on-device inference wins in practice across your test devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate on model download completion.&lt;/strong&gt; ML Kit exposes a download state API for Gemini Nano. Listening to it and showing a subtle "AI ready" indicator once it's cached is a small touch that makes a real UX difference for privacy-sensitive apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add the model device support check.&lt;/strong&gt; Before showing the on-device privacy promise to users, check at runtime whether their device actually supports Gemini Nano inference and surface that information appropriately.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;iOS support for hybrid inference is not yet available — watch the Firebase changelog. The Android experimental API suggests the Dart/Flutter SDK will follow once the underlying infrastructure is proven on native platforms first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The full source for MemoMind is on GitHub at &lt;strong&gt;&lt;a href="https://github.com/RealWorldApplications/memo-mind-post" rel="noopener noreferrer"&gt;https://github.com/RealWorldApplications/memo-mind-post&lt;/a&gt;&lt;/strong&gt;. If you run into anything unexpected with the hybrid routing on your device, or see the chip behave differently than expected, I'd like to hear about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://firebase.google.com/docs/ai-logic/hybrid/android/get-started?api=dev" rel="noopener noreferrer"&gt;Firebase AI Logic hybrid inference for Android (official docs)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://firebase.blog/posts/2026/04/cloud-next-2026-ai-logic" rel="noopener noreferrer"&gt;What's new from Firebase at Cloud Next '26&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://android-developers.googleblog.com/2026/04/Hybrid-inference-and-new-AI-models-are-coming-to-Android.html" rel="noopener noreferrer"&gt;Android Developers Blog: Experimental hybrid inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/android/ai-samples/tree/main/samples/gemini-hybrid" rel="noopener noreferrer"&gt;Official hybrid inference sample (android/ai-samples)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mvnrepository.com/artifact/com.google.firebase/firebase-ai-ondevice" rel="noopener noreferrer"&gt;firebase-ai-ondevice on Maven&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
      <category>android</category>
    </item>
    <item>
      <title>Weekend Challenge: Earth Day Edition</title>
      <dc:creator>Carol Bolger</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:59:00 +0000</pubDate>
      <link>https://forem.com/bolgercarol/weekend-challenge-earth-day-edition-2ogp</link>
      <guid>https://forem.com/bolgercarol/weekend-challenge-earth-day-edition-2ogp</guid>
      <description>&lt;p&gt;*This is a submission for Weekend Challenge: Earth Day Edition&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agentic Acres&lt;/strong&gt; is an intelligent, multimodal permaculture design assistant built to accelerate regenerative agriculture and make ecological landscaping accessible to everyone. &lt;/p&gt;

&lt;p&gt;My primary goal was to take the overwhelming complexity of designing sustainable ecosystems (knowing which plants support each other, fix nitrogen, or accumulate nutrients) and boil it down to a single photograph. Users simply snap a picture of their yard or planting site (or upload one from their gallery) and provide their location. &lt;/p&gt;

&lt;p&gt;The application instantly assesses the site and engineers a complete, climate-specific &lt;strong&gt;Plant Guild&lt;/strong&gt;. Instead of endless research, users are presented with a gorgeous, dynamic dashboard that outlines their primary canopy tree, nitrogen fixers, dynamic accumulators, and ground covers. All are precisely tailored to their local environment and the exact physical constraints shown in the photo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Try the web app:&lt;br&gt;
&lt;a href="https://agentic-acres.web.app" rel="noopener noreferrer"&gt;https://agentic-acres.web.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See a demo video on YouTube:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/yMg1gkwFy6o"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/RealWorldApplications/agentic-acres" rel="noopener noreferrer"&gt;https://github.com/RealWorldApplications/agentic-acres&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;I built Agentic Acres as a fully responsive Web Application using &lt;strong&gt;Flutter&lt;/strong&gt;, and deployed it using &lt;strong&gt;Firebase Hosting&lt;/strong&gt;.  I built the entire app in Google Antigravity and used Gemini 3.1 as my AI assistant.&lt;/p&gt;

&lt;p&gt;The core intelligence of the application is driven by &lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt; using the &lt;code&gt;google_generative_ai&lt;/code&gt; SDK. I used Gemini's multimodal vision capabilities to achieve something unique: the app passes the user-uploaded image alongside their geographic coordinates directly into the model. I implemented strict prompt engineering so that Gemini would output a highly constrained JSON schema rather than conversational text. This allows the frontend to confidently parse the AI's ecological assessment and dynamically construct the localized "Bento Box" UI.&lt;/p&gt;
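&lt;p&gt;To make the constrained-output idea concrete, here is a sketch of the parsing side in Python (the field names below are hypothetical stand-ins for the guild components; the app's real schema lives in its prompt, not in any SDK):&lt;/p&gt;

```python
import json

# Hypothetical field names for illustration only; they mirror the guild
# components described in the article, not the app's actual schema.
REQUIRED_KEYS = {"canopy_tree", "nitrogen_fixers",
                 "dynamic_accumulators", "ground_covers"}

def parse_guild(raw):
    """Parse the model's constrained JSON output, failing loudly if it
    drifted back into conversational text or dropped a field."""
    data = json.loads(raw)  # raises ValueError on non-JSON chatter
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError("incomplete guild, missing: %s" % sorted(missing))
    return data
```

&lt;p&gt;Validating the full key set on every response is what lets the frontend build the dashboard "confidently": any schema drift surfaces as a single catchable error instead of a half-rendered UI.&lt;/p&gt;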

&lt;p&gt;To make the app feel alive and premium, I incorporated several other key technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Geolocation &amp;amp; Geocoding&lt;/strong&gt;: I integrated the &lt;code&gt;geolocator&lt;/code&gt; package to allow users to pull their exact GPS coordinates with a single click.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dynamic Media Pipelines&lt;/strong&gt;: Because Gemini returns common plant names rather than hardcoded URLs, I integrated the &lt;strong&gt;Wikipedia (Wikimedia) Action API&lt;/strong&gt;. As the dashboard renders, the app asynchronously executes fuzzy searches against Wikipedia's backend, pulling the main 400px thumbnail of the matched plant and fading it into the glassmorphic UI cards in real time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Modern Glassmorphism Design&lt;/strong&gt;: I styled the frontend using Flutter widget compositions to build a dark-mode, frosted-glass interface that feels modern, premium, and instantly trustworthy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prize Categories
&lt;/h2&gt;

&lt;p&gt;Best Use of Google Gemini&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>weekendchallenge</category>
      <category>flutter</category>
      <category>antigravity</category>
    </item>
    <item>
      <title>Gemma 4: A Practical Guide to Running Frontier AI on Your Own Hardware</title>
      <dc:creator>Carol Bolger</dc:creator>
      <pubDate>Tue, 07 Apr 2026 14:44:37 +0000</pubDate>
      <link>https://forem.com/bolgercarol/gemma-4-a-practical-guide-to-running-frontier-ai-on-your-own-hardware-5h9l</link>
      <guid>https://forem.com/bolgercarol/gemma-4-a-practical-guide-to-running-frontier-ai-on-your-own-hardware-5h9l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbejrip2rozhto0uac1aj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbejrip2rozhto0uac1aj.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;"This article was originally published on my Substack."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There’s a quiet assumption baked into the way most of us use AI today: you type a prompt, it leaves your machine, travels to a data center somewhere, gets processed on hardware you don’t own, and the answer comes back. For most of the last three years, “using AI” has meant “renting AI.” Your data leaves. You hope for the best.&lt;/p&gt;

&lt;p&gt;Gemma 4 is Google DeepMind’s clearest challenge to that model yet. Recently released under an Apache 2.0 license, it’s a family of four open-weight models. They range from a 2-billion-parameter edge model that fits on a phone to a 31-billion-parameter dense model that runs on a single consumer GPU. These aren’t research toys. The 31B variant currently ranks as the #3 open model in the world on the Arena AI text leaderboard, outcompeting models twenty times its size. The 26B model sits at #6.&lt;/p&gt;

&lt;p&gt;Built on the same research and technology behind Gemini 3, these models handle multi-step reasoning, native function calling, code generation, and multimodal input across text, images, video, and audio. They support over 140 languages out of the box. And they do all of it on hardware you already own or could afford to.&lt;/p&gt;

&lt;p&gt;Let’s break down what that actually means in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Under the Hood
&lt;/h2&gt;

&lt;p&gt;Gemma 4 ships in four sizes, each designed for a different deployment scenario. Understanding the differences matters because picking the right model is the single most important decision you’ll make.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;31B Dense&lt;/strong&gt; model is the quality leader. Every one of its 31 billion parameters activates on every inference pass, which means maximum reasoning depth at the cost of higher compute. If you’re fine-tuning for a specialized task, such as legal analysis, medical summarization, or domain-specific code generation, this is your foundation. It fits on a single 80GB NVIDIA H100 in full bfloat16 precision, or on consumer GPUs in quantized form.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;26B Mixture-of-Experts (MoE)&lt;/strong&gt; model takes a different approach. It contains 26 billion total parameters but only activates roughly 3.8 billion of them during any given inference pass. Think of it as a team of specialists: instead of running every expert on every query, the model routes each token to the most relevant subset. This model delivers significantly accelerated token generation, matching the performance of much smaller architectures while preserving the core reasoning strengths of the dense version. It is the ideal choice when prioritizing low latency over absolute benchmark perfection.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;E4B and E2B&lt;/strong&gt; edge models are purpose-built for phones, IoT devices, and anything where RAM and battery life are constraints. These are multimodal out of the box. They handle text, images, video, and native audio input. They’re designed to run completely offline with near-zero latency on devices like the Raspberry Pi, NVIDIA Jetson Orin Nano, and Android phones. For Android developers specifically, these models are compatible with the AICore Developer Preview for forward compatibility with Gemini Nano 4.&lt;/p&gt;

&lt;p&gt;All four models support context windows of 128K tokens (edge) to 256K tokens (26B and 31B). To put that in practical terms: 256K tokens is roughly 500 pages of text. That’s an entire codebase, a full legal contract, or a quarter’s worth of financial filings processed in a single prompt with no chunking required.&lt;/p&gt;
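&lt;p&gt;The 500-page figure is easy to sanity-check with rough rules of thumb (about 0.75 English words per token and about 400 words per page; both are approximations, not properties of the model):&lt;/p&gt;

```python
# Back-of-the-envelope: how many pages fit in a 256K-token context?
TOKENS = 256_000
WORDS_PER_TOKEN = 0.75   # common rule of thumb for English prose
WORDS_PER_PAGE = 400     # typical single-spaced page

words = TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(round(pages))  # 480 pages, i.e. "about 500 pages"
```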

&lt;h2&gt;
  
  
  Why “Local” Is a Business Decision, Not Just a Technical One
&lt;/h2&gt;

&lt;p&gt;If you’re a business owner reading this and wondering why you should care about where a model runs, the answer comes down to three things: data control, cost structure, and global reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data stays on your hardware.&lt;/strong&gt; Every time your team sends customer data, internal documents, or proprietary information to a cloud AI provider, you’re trusting a third party with that data. For industries governed by HIPAA, GDPR, SOC 2, or similar regulations, that trust comes with compliance overhead and significant risk. Running Gemma 4 locally means sensitive information never leaves your premises. There’s no API call to audit, no third-party data processing agreement to negotiate, no residual data sitting on someone else’s servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cost model flips.&lt;/strong&gt; Cloud AI pricing is usage-based: you pay per token, per request, per minute. A customer service bot handling thousands of queries a day, or a document analysis pipeline processing hundreds of contracts a week, can get expensive fast. A local deployment has a fixed hardware cost and near-zero marginal cost per inference: once the GPU is paid for, every additional query is essentially free. For high-volume, predictable workloads, the math favors local deployment within months.&lt;/p&gt;
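&lt;p&gt;The break-even math is worth sketching explicitly. Every number below is a hypothetical placeholder; substitute your own cloud pricing and hardware quote:&lt;/p&gt;

```python
# Hypothetical break-even: fixed GPU cost vs. pay-per-query cloud pricing.
gpu_cost = 2000.0              # one-time hardware spend (placeholder)
cloud_cost_per_query = 0.002   # placeholder per-query cloud price
queries_per_month = 500_000    # e.g. a busy support bot

monthly_cloud_bill = cloud_cost_per_query * queries_per_month
breakeven_months = gpu_cost / monthly_cloud_bill
print(breakeven_months)  # 2.0 months under these assumptions
```

&lt;p&gt;The shape of the result matters more than the placeholder numbers: at high, predictable volume the fixed cost amortizes quickly, while at low or spiky volume the cloud's pay-as-you-go model may still win.&lt;/p&gt;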

&lt;p&gt;&lt;strong&gt;140+ languages, no translation API required.&lt;/strong&gt; Gemma 4 is natively trained on over 140 languages. If you’re serving customers in São Paulo, Jakarta, and Berlin, you don’t need to add a translation layer or maintain separate model deployments. A single model handles multilingual input and output natively, which dramatically simplifies localization for global products.&lt;/p&gt;

&lt;h2&gt;
  
  
  For Developers: Agents, Not Chatbots
&lt;/h2&gt;

&lt;p&gt;The most significant shift in Gemma 4 isn’t a benchmark number; it’s the native support for agentic workflows. This isn’t a model that just answers questions. It’s a model designed to use tools, call functions, produce structured JSON output, and follow multi-step plans.&lt;/p&gt;

&lt;p&gt;In practical terms, that means you can build an agent that reads a GitHub issue, checks out the relevant branch, identifies the bug in context (thanks to the 256K window), writes a fix, runs the test suite, and opens a pull request. This is all orchestrated by the model’s own reasoning, with each step involving a structured function call to an external tool. Google has built this capability in natively, not as a wrapper or a prompt hack.&lt;/p&gt;

&lt;p&gt;For local development specifically, the 31B model is positioned as an offline coding assistant. Quantized versions run on consumer GPUs such as an RTX 4090 or RTX 5090, turning your workstation into a self-contained AI development environment with no internet dependency. Google and NVIDIA have collaborated on optimizations so that these models leverage Tensor Cores for accelerated inference out of the box, with day-one support from tools like Ollama, llama.cpp, vLLM, and LM Studio.&lt;/p&gt;

&lt;p&gt;The hardware partnership story extends beyond NVIDIA. Google has worked with Qualcomm Technologies and MediaTek on mobile optimization, and with Arm on efficient edge deployment. The goal is to be able to run Gemma 4 anywhere you have compute.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: The Simplest Path
&lt;/h2&gt;

&lt;p&gt;There are several ways to get Gemma 4 running. Here’s the fastest.&lt;/p&gt;

&lt;p&gt;If you want to test before committing, open &lt;strong&gt;Google AI Studio&lt;/strong&gt;. The 31B and 26B MoE models are available there for immediate experimentation: nothing to download, nothing to set up, no GPU required. For the edge models, Google’s AI Edge Gallery app on Android lets you test E4B and E2B directly on your phone.&lt;/p&gt;

&lt;p&gt;If you want to run locally, the most straightforward path is Ollama. Install it, then pull the model:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-bash"&gt;ollama pull gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That’s it. You’re running a frontier-class model locally. If you want more control, such as quantization options, specific model variants, and GPU configuration, then download GGUF weights from Hugging Face and run them through llama.cpp.&lt;/p&gt;
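&lt;p&gt;Beyond the CLI, Ollama also serves a local HTTP API on port 11434. A sketch of the request body for its &lt;code&gt;/api/generate&lt;/code&gt; endpoint, reusing the same &lt;code&gt;gemma4&lt;/code&gt; tag pulled above (the endpoint and field names come from Ollama's documented API; the model tag is the article's):&lt;/p&gt;

```python
import json

def ollama_generate_payload(prompt, model="gemma4"):
    """Build the JSON body for a POST to http://localhost:11434/api/generate.

    stream=False asks Ollama to return one complete JSON response
    instead of a stream of partial chunks.
    """
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

body = ollama_generate_payload("Explain mixture-of-experts in one sentence.")
```

&lt;p&gt;POSTing that body to a running Ollama instance returns a JSON object whose &lt;code&gt;response&lt;/code&gt; field holds the generated text, which makes wiring the local model into an existing service a few lines of glue.&lt;/p&gt;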

&lt;p&gt;&lt;strong&gt;If you’re building for production&lt;/strong&gt;, the model weights are available on Hugging Face, Kaggle, and Ollama, with day-one integration support from Hugging Face Transformers, vLLM, SGLang, NVIDIA NIM, and a long list of other frameworks. For cloud-scale deployment, Vertex AI, Cloud Run, and GKE are all supported paths on Google Cloud.&lt;/p&gt;


&lt;p&gt;The Apache 2.0 license means there are no usage restrictions, no reporting requirements, and no commercial limitations. You can fine-tune, redistribute, and deploy without asking permission.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Use Cases Worth Building
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The offline retail assistant&lt;/strong&gt;. Picture a phone app that uses the E4B model to “see” products through the camera, answer customer questions about specifications, check local inventory, and suggest alternatives, all without an internet connection. In a warehouse, a retail floor, or a remote pop-up shop, this works where cloud-dependent solutions don’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The enterprise document agent&lt;/strong&gt;. A 256K context window means you can feed an entire quarterly report, or several, into a single prompt. Pair that with native function calling, and you have an agent that reads the filing, extracts key metrics, compares them against last quarter’s numbers (pulled via a structured API call), flags anomalies, and drafts a summary. The entire pipeline runs on-premises, with no customer or financial data leaving your network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The autonomous code reviewer&lt;/strong&gt;. Point the 31B model at a pull request. It reads the diff in context of the full repository (256K tokens covers a lot of code), identifies potential bugs, checks for style violations, suggests performance improvements, and posts its review all as a local CI step that adds seconds, not minutes, to your pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Gemma 4 Still Falls Short
&lt;/h2&gt;

&lt;p&gt;No model is perfect at everything, and intellectual honesty about limitations builds more trust than uncritical hype.&lt;/p&gt;

&lt;p&gt;Gemma 4's largest model is 31 billion parameters. That's excellent for local hardware, but for truly heavy lifting, such as highly complex mathematics, advanced scientific reasoning, or very nuanced long-form writing, you will still want a frontier cloud model. The faster MoE version is a trade-off: you gain speed but give up a small amount of quality. And while the mobile models are impressive, a 2B model on your phone won't replace a massive server for critical tasks.&lt;/p&gt;

&lt;p&gt;Local deployment also shifts operational responsibility to you. There’s no managed service handling uptime, scaling, or security patches. If you’re running Gemma 4 in production, you own the infrastructure. That’s the flip side of data sovereignty: your team needs the capacity to manage it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The Gemma 4 release represents something worth paying attention to beyond the model itself. Google is releasing frontier-competitive models under Apache 2.0 at the same moment some other AI labs are pulling back from open releases. That’s a strategic bet on ecosystem growth over model lock-in, and it matters for anyone building products on top of open AI infrastructure.&lt;/p&gt;

&lt;p&gt;We’re watching a shift from “AI as a service you rent” to “AI as infrastructure you own.” While Gemma 4 won't entirely eliminate your reliance on cloud services, it represents a significant step toward that goal.&lt;/p&gt;

&lt;p&gt;A model that ranks among the top open models in the world, runs on a consumer GPU, handles multimodal input in 140 languages, and ships under a permissive open-source license is a genuinely new thing in this space.&lt;/p&gt;

&lt;p&gt;The question worth sitting with: if you can run this level of intelligence on your own hardware, with your data never leaving your control, what does that make possible that wasn’t before?&lt;/p&gt;

&lt;p&gt;Gemma 4 models are available on &lt;a href="https://huggingface.co/collections/google/gemma-4" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;, &lt;a href="https://www.kaggle.com/models/google/gemma-4" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;, and &lt;a href="https://ollama.com/library/gemma4" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Try them in &lt;a href="https://aistudio.google.com/prompts/new_chat?model=gemma-4-31b-it" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt; or explore the edge models in the &lt;a href="https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery" rel="noopener noreferrer"&gt;AI Edge Gallery&lt;/a&gt; on Android.&lt;br&gt;
Thanks for reading!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>softwaredevelopment</category>
      <category>productivity</category>
    </item>
    <item>
      <title>5 Ways to Optimize Your AI Workflows ⚡️</title>
      <dc:creator>Carol Bolger</dc:creator>
      <pubDate>Mon, 06 Apr 2026 14:00:14 +0000</pubDate>
      <link>https://forem.com/bolgercarol/5-ways-to-optimize-your-ai-workflows-3ll3</link>
      <guid>https://forem.com/bolgercarol/5-ways-to-optimize-your-ai-workflows-3ll3</guid>
      <description>&lt;div&gt;
    &lt;iframe src="https://www.youtube.com/embed/7QUmE7YgnLA"&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;


</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>buildinpublic</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How do you actually manage your content's SEO performance?</title>
      <dc:creator>Carol Bolger</dc:creator>
      <pubDate>Sat, 04 Apr 2026 22:30:14 +0000</pubDate>
      <link>https://forem.com/bolgercarol/how-do-you-actually-manage-your-contents-seo-performance-4jn0</link>
      <guid>https://forem.com/bolgercarol/how-do-you-actually-manage-your-contents-seo-performance-4jn0</guid>
      <description>&lt;p&gt;I write about tech and I've been frustrated with piecing together Google Search Console, spreadsheets, and random tools to figure out what's working. Curious how others handle this — do you have a system that actually works?&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>discuss</category>
      <category>marketing</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
