<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rakesh Mondal</title>
    <description>The latest articles on Forem by Rakesh Mondal (@rakesh_cse_2004).</description>
    <link>https://forem.com/rakesh_cse_2004</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920369%2Fcca4620d-4c8e-4a09-8753-4e755ab54b3e.png</url>
      <title>Forem: Rakesh Mondal</title>
      <link>https://forem.com/rakesh_cse_2004</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rakesh_cse_2004"/>
    <language>en</language>
    <item>
      <title>The Night Gemma 4 Changed How I Write Code</title>
      <dc:creator>Rakesh Mondal</dc:creator>
      <pubDate>Sat, 09 May 2026 11:45:09 +0000</pubDate>
      <link>https://forem.com/rakesh_cse_2004/the-night-gemma-4-changed-how-i-write-code-143o</link>
      <guid>https://forem.com/rakesh_cse_2004/the-night-gemma-4-changed-how-i-write-code-143o</guid>
      <description>&lt;h2&gt;
  
  
  The night I stopped trusting the cloud
&lt;/h2&gt;

&lt;p&gt;It was past midnight. I had a bug I could not explain, an API bill I&lt;br&gt;
could not justify, and a growing discomfort I could not name.&lt;/p&gt;

&lt;p&gt;Every prompt I sent to a cloud AI contained something real — a piece&lt;br&gt;
of my actual project, my actual logic, my actual mistakes. And every&lt;br&gt;
time I hit send, those thoughts traveled somewhere I did not control.&lt;/p&gt;

&lt;p&gt;That night I pulled &lt;strong&gt;Gemma 4&lt;/strong&gt; locally for the first time.&lt;/p&gt;

&lt;p&gt;I pointed it at a 200-line Python module and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What is wrong with how I have structured this?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It did not compliment me. It called my error handling &lt;em&gt;optimistic to&lt;br&gt;
the point of being dangerous&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I checked. It was right. And nothing I typed ever left my machine.&lt;/p&gt;

&lt;p&gt;That is when I understood what Gemma 4 actually is — not just a&lt;br&gt;
smaller version of a cloud model, but a fundamentally different&lt;br&gt;
relationship between a developer and their AI.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Gemma 4 actually is (the version nobody explains clearly)
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is Google's open-weight model family released in 2025. The&lt;br&gt;
weights are yours. You download them, run them, fine-tune them, and&lt;br&gt;
ship them inside your own products without a single token touching a&lt;br&gt;
third-party server.&lt;/p&gt;

&lt;p&gt;But this generation is different from previous open models in three&lt;br&gt;
important ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native multimodal input.&lt;/strong&gt; Gemma 4 models can process text and&lt;br&gt;
images together in the same prompt — out of the box, no plugin&lt;br&gt;
required. Feed it a screenshot of a UI bug and ask what is wrong.&lt;br&gt;
Hand it a diagram and ask it to generate the corresponding code.&lt;br&gt;
This changes what "local AI" means for real-world developer tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;128K context window.&lt;/strong&gt; This is not a marketing number. A 128K&lt;br&gt;
context window means you can feed Gemma 4 your entire codebase — not&lt;br&gt;
just one file, not just one function. You can ask questions that span&lt;br&gt;
modules, trace logic across hundreds of lines, and get answers that&lt;br&gt;
understand the whole picture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A range that runs everywhere.&lt;/strong&gt; From a Raspberry Pi to a large-scale&lt;br&gt;
server deployment, there is a Gemma 4 model for the hardware you&lt;br&gt;
actually have.&lt;/p&gt;


&lt;h2&gt;
  
  
  The three variants — and which one is yours
&lt;/h2&gt;

&lt;p&gt;This is the question every article avoids answering directly. I will&lt;br&gt;
not do that.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Runs on&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-4-it-2b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2 billion&lt;/td&gt;
&lt;td&gt;Raspberry Pi, phones, edge devices&lt;/td&gt;
&lt;td&gt;Embedded apps, offline tools, ultra-low latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-4-it-9b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;9 billion&lt;/td&gt;
&lt;td&gt;Laptop with 16GB RAM, mid-range GPU&lt;/td&gt;
&lt;td&gt;Most developer tasks — &lt;strong&gt;start here&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-4-it-27b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;27 billion&lt;/td&gt;
&lt;td&gt;Workstation GPU (RTX 4090, A100)&lt;/td&gt;
&lt;td&gt;Complex reasoning, long context tasks, production use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;it&lt;/code&gt; suffix means instruction-tuned — already aligned for&lt;br&gt;
conversation and instruction following. The &lt;code&gt;pt&lt;/code&gt; suffix means&lt;br&gt;
pre-trained base, used for fine-tuning on your own domain.&lt;/p&gt;

&lt;p&gt;My honest recommendation: &lt;strong&gt;run 9B first&lt;/strong&gt;. It is fast enough for&lt;br&gt;
real-time use, smart enough to reason properly, and fits on hardware&lt;br&gt;
most developers already own. If it surprises you, you are done. If it&lt;br&gt;
disappoints you for your specific use case, scale up to 27B.&lt;/p&gt;


&lt;h2&gt;
  
  
  Setting up locally — what actually works
&lt;/h2&gt;

&lt;p&gt;I am not giving you a Colab notebook. I am telling you what I ran on&lt;br&gt;
my own machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Python 3.10+, 16GB RAM minimum, ~20GB disk space&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama — the cleanest local inference runtime available&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull Gemma 4 9B (adjust for your variant)&lt;/span&gt;
ollama pull gemma4:9b

&lt;span class="c"&gt;# Run it immediately&lt;/span&gt;
ollama run gemma4:9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three commands. No API key. No billing alert at the end of the month.&lt;/p&gt;

&lt;p&gt;To call it from Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:9b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Review this function and be honest about what is wrong with it.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multimodal input — passing an image alongside text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;screenshot.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemma4:9b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;This is a screenshot of a UI bug. What is causing it and how do I fix it?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is real multimodal inference running entirely on your machine.&lt;br&gt;
No cloud. No cost per token. No data leaving your environment.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I actually tested — not benchmarks, real tasks
&lt;/h2&gt;

&lt;p&gt;Benchmarks tell you how a model scores on exams. Here is how it&lt;br&gt;
performs on the things developers actually do.&lt;/p&gt;
&lt;h3&gt;
  
  
  Codebase-level reasoning with 128K context
&lt;/h3&gt;

&lt;p&gt;I fed Gemma 4 27B an entire Node.js project — 43 files, roughly&lt;br&gt;
18,000 lines — and asked:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Which module has the most hidden coupling to other modules and&lt;br&gt;
why is that a problem?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It identified a utility file that was quietly imported by 11 other&lt;br&gt;
modules, explained why this created a hidden dependency graph that&lt;br&gt;
would make future refactoring painful, and suggested a specific&lt;br&gt;
restructuring approach.&lt;/p&gt;

&lt;p&gt;No cloud model I prompted with a single file ever gave me that answer,&lt;br&gt;
because no cloud prompt ever had the full picture.&lt;/p&gt;
&lt;h3&gt;
  
  
  Multimodal bug diagnosis
&lt;/h3&gt;

&lt;p&gt;I took a screenshot of a broken CSS layout — a nav bar collapsing&lt;br&gt;
on mobile — and passed it directly to Gemma 4 9B with the question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What is wrong with this layout and what CSS would fix it?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It identified the missing &lt;code&gt;flex-wrap&lt;/code&gt; and the hardcoded pixel width&lt;br&gt;
on the nav items without seeing a single line of my code. It saw the&lt;br&gt;
rendered output and reasoned backwards to the cause.&lt;/p&gt;

&lt;p&gt;That is a workflow shift. Describing a visual bug in words is&lt;br&gt;
imprecise. Showing it directly is not.&lt;/p&gt;
&lt;h3&gt;
  
  
  Fine-tuning for domain knowledge
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pt&lt;/code&gt; (pre-trained) variants are designed for specialisation. Using&lt;br&gt;
QLoRA — Quantized Low-Rank Adaptation — you can teach Gemma 4 your&lt;br&gt;
codebase's patterns, your documentation's tone, your domain's&lt;br&gt;
vocabulary, without retraining the entire model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;

&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemma-4-9b-pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lora_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# trainable params: ~42M — roughly 0.5% of total model weights
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You are not retraining from scratch. You are teaching it a dialect.&lt;br&gt;
That is efficient, practical, and something any developer with a decent&lt;br&gt;
GPU can do today.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where it genuinely struggles — the honest section
&lt;/h2&gt;

&lt;p&gt;I would be wasting your time if I only praised it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 27B model demands real hardware.&lt;/strong&gt; On a machine without a capable&lt;br&gt;
dedicated GPU, inference is slow enough to break the conversational&lt;br&gt;
flow that makes local AI useful. If you only have CPU, start with 2B.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context quality degrades at the edges.&lt;/strong&gt; The 128K window is real, but&lt;br&gt;
coherence towards the far end of a very long context is not equal to&lt;br&gt;
coherence in the middle. For massive codebases, chunking strategically&lt;br&gt;
still produces better results than naive full-context dumps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-step mathematical reasoning has limits.&lt;/strong&gt; Complex proof chains&lt;br&gt;
or deeply nested logical puzzles — the 9B model makes confident errors.&lt;br&gt;
The 27B is significantly better but there is still a gap versus&lt;br&gt;
frontier closed models on the hardest reasoning tasks.&lt;/p&gt;

&lt;p&gt;Knowing these limits is not a criticism. It is how you use the right&lt;br&gt;
tool correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing the spec sheet will not tell you
&lt;/h2&gt;

&lt;p&gt;I want to say something that gets lost in every model comparison.&lt;/p&gt;

&lt;p&gt;When I ran Gemma 4 locally, I shared my actual database schema with&lt;br&gt;
it. My actual API architecture. Conversations about real design&lt;br&gt;
decisions in a real product with real users.&lt;/p&gt;

&lt;p&gt;With cloud AI, every one of those prompts travels somewhere. Gets&lt;br&gt;
logged somewhere. Possibly influences something somewhere downstream.&lt;/p&gt;

&lt;p&gt;With Gemma 4, that conversation stays on the machine it runs on.&lt;/p&gt;

&lt;p&gt;For independent developers. For students building things that matter&lt;br&gt;
to them. For engineers at companies with data governance requirements.&lt;br&gt;
For anyone working on something they are not ready to share with the&lt;br&gt;
world yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Owning your inference is not a secondary feature. For a large class&lt;br&gt;
of real use cases, it is the only feature that matters.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The decision that took me ten minutes to make
&lt;/h2&gt;

&lt;p&gt;Do you need edge / mobile deployment?&lt;br&gt;
└─ Yes → gemma-4-it-2b&lt;br&gt;
Do you have a standard developer laptop (16GB RAM)?&lt;br&gt;
└─ Yes → gemma-4-it-9b  ← most people should start here&lt;br&gt;
Do you have a workstation GPU?&lt;br&gt;
└─ Yes → gemma-4-it-27b&lt;br&gt;
Do you need the model to know your specific domain deeply?&lt;br&gt;
└─ Yes → gemma-4-pt-[size] + QLoRA fine-tuning&lt;/p&gt;

&lt;p&gt;Do not spend three days on this decision. Pull 9B tonight. You will&lt;br&gt;
know within one conversation whether it fits your use case.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it means that open models are this capable now
&lt;/h2&gt;

&lt;p&gt;I have been writing software long enough to remember when "run AI&lt;br&gt;
locally" was a novelty with no practical use.&lt;/p&gt;

&lt;p&gt;Gemma 4 is not that.&lt;/p&gt;

&lt;p&gt;It is a model that a solo developer — no enterprise contract, no&lt;br&gt;
research budget, no special access — can run, query with images,&lt;br&gt;
reason over an entire codebase, fine-tune on private data, and deploy&lt;br&gt;
inside a product. Completely independently. Right now.&lt;/p&gt;

&lt;p&gt;The frontier closed models will always be ahead on the hardest&lt;br&gt;
benchmarks. That argument is not interesting anymore. The interesting&lt;br&gt;
question is what happens when capable, private, fast AI inference&lt;br&gt;
becomes something any developer can own.&lt;/p&gt;

&lt;p&gt;The answer is being written by people doing exactly what you are doing&lt;br&gt;
right now — reading about it, pulling a model, building something.&lt;/p&gt;

&lt;p&gt;Gemma 4 is not trying to beat the biggest closed model. It is trying&lt;br&gt;
to be the model that a million developers actually use, understand,&lt;br&gt;
and make their own.&lt;/p&gt;

&lt;p&gt;Looking at where it is today, I think it is already winning that race.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start tonight
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:9b
ollama run gemma4:9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
