<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Onah Sunday.</title>
    <description>The latest articles on Forem by Onah Sunday. (@sundayonah).</description>
    <link>https://forem.com/sundayonah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F910464%2Fdbdb225b-898d-41f3-9fa1-14f355e80ee2.jpeg</url>
      <title>Forem: Onah Sunday.</title>
      <link>https://forem.com/sundayonah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sundayonah"/>
    <language>en</language>
    <item>
      <title>Google Antigravity 2.0 Is the I/O 2026 Announcement You Should Actually Care About</title>
      <dc:creator>Onah Sunday.</dc:creator>
      <pubDate>Fri, 22 May 2026 14:16:13 +0000</pubDate>
      <link>https://forem.com/sundayonah/google-antigravity-20-is-the-io-2026-announcement-you-should-actually-care-about-3hc1</link>
      <guid>https://forem.com/sundayonah/google-antigravity-20-is-the-io-2026-announcement-you-should-actually-care-about-3hc1</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Everyone is talking about Gemini 3.5 Flash being four times faster than frontier models. Fine. That's real and important.&lt;/p&gt;

&lt;p&gt;But if you're a developer who actually ships things — not just watches keynotes — the most structurally significant thing Google announced at I/O 2026 wasn't a model. It wasn't a pricing tier. It wasn't even the intelligent eyewear.&lt;/p&gt;

&lt;p&gt;It was &lt;strong&gt;Antigravity 2.0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And most people are going to miss why it matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  First, a quick rewind
&lt;/h2&gt;

&lt;p&gt;Google launched the original Antigravity in November 2025 alongside Gemini 3. At the time, it was positioned as an AI-native IDE and a Cursor competitor: a heavily modified fork of VS Code with an agent sidebar, tab completions, and inline commands. It paired an editor view for hands-on coding with a Manager surface for spawning and observing multiple agents working asynchronously.&lt;/p&gt;

&lt;p&gt;It was interesting. It wasn't a paradigm shift.&lt;/p&gt;

&lt;p&gt;Version 2.0 is different. At I/O 2026, Google moved its developer tooling away from IDE-centric assistance and toward multi-agent workflow management as the primary abstraction.&lt;/p&gt;

&lt;p&gt;The IDE is now the least interesting part of Antigravity. What they actually shipped is a platform.&lt;/p&gt;




&lt;h2&gt;
  
  
  What they actually shipped
&lt;/h2&gt;

&lt;p&gt;Here's everything that landed at I/O, in one place:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Antigravity 2.0 desktop app&lt;/strong&gt; — A new standalone desktop application that enables a full "agent-optimized" user experience. It introduces dynamic subagents for parallelized workflows, scheduled tasks for background automation, and new integrations with Google AI Studio, Android, and Firebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Antigravity CLI&lt;/strong&gt; — Built in Go, snappier and more responsive than Gemini CLI. It fully replaces Gemini CLI, preserving the most critical features: Agent Skills, Hooks, Subagents, and Extensions — now rebranded as Antigravity plugins. Crucially, it shares the same agent harness as the desktop app, meaning all future improvements apply across both surfaces automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Antigravity SDK&lt;/strong&gt; — Provides programmatic access to the same agent harness that powers Google's own products. Developers can define custom agent behaviors and host them on their own infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed Agents in the Gemini API&lt;/strong&gt; — With a single API call, developers can spin up an agent that reasons, uses tools, and executes code in an isolated Linux environment. Three capabilities define this: the agent harness itself, persistent isolated environments where each interaction creates a resumable environment with files and state intact, and custom agent definitions using Markdown skill files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise layer&lt;/strong&gt; — Google Cloud customers can now use Antigravity 2.0 and Antigravity CLI with their Gemini Enterprise Agent Platform project — all agent inference runs via Agent Platform models within a secure cloud boundary, inheriting Google Cloud's data privacy protections.&lt;/p&gt;

&lt;p&gt;That's five distinct product surfaces, all sharing the same underlying agent harness. That's not a product update. That's a platform launch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part that actually changes how you build
&lt;/h2&gt;

&lt;p&gt;The feature I keep coming back to is &lt;strong&gt;Managed Agents&lt;/strong&gt;. Here's why.&lt;/p&gt;

&lt;p&gt;Right now, if you want to add an AI agent to your product — something that reasons, uses tools, and executes code — you have to stand up all the infrastructure yourself. Memory management, tool invocation, isolated execution environments, state persistence across turns. It's a lot of plumbing before you write a line of actual product code.&lt;/p&gt;

&lt;p&gt;With Managed Agents, you can spin up an agent that reasons, uses tools, and executes code in an isolated Linux environment with a single API call.&lt;/p&gt;

&lt;p&gt;And critically: each interaction creates an environment that can be resumed in follow-up calls with all files and state intact, enabling seamless multi-turn sessions without reinitializing context.&lt;/p&gt;

&lt;p&gt;That's persistent, stateful, isolated agent execution — as a managed service. The infrastructure problem is Google's. You just call the API.&lt;/p&gt;

&lt;p&gt;If that sounds familiar, it's because it's the same bet Anthropic is making with Claude's computer use and tool use APIs, and the same direction Hermes Agent (Nous Research's open-source agent) took with its persistent gateway model. The difference is Google can embed this directly into the development tools you already use every day.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gemini CLI deprecation is the real signal
&lt;/h2&gt;

&lt;p&gt;Gemini CLI will no longer accept new installations for GitHub organizations after June 18, 2026.&lt;/p&gt;

&lt;p&gt;This is worth pausing on. Gemini CLI had a massive developer user base. Sunsetting it — even with a migration path — is not a casual decision. It signals that Google is serious about Antigravity as a platform, not a side experiment.&lt;/p&gt;

&lt;p&gt;Google is now unifying its developer-facing coding strategy into a single harness across multiple surfaces — editor, terminal, SDK, managed cloud execution. One agent runtime. Multiple entry points. Skills you build once are portable across all of them.&lt;/p&gt;

&lt;p&gt;That portability is the core of the bet. Write a Markdown skill file once. Use it in the desktop app, the CLI, the SDK, and Managed Agents in the API. That's how you build a moat around a developer platform — not by having the best model, but by making your ecosystem the easiest place to build agents that work everywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where it falls short (for now)
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the gaps, because the keynote glossed over them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise governance is still coming.&lt;/strong&gt; Full integration with A2A and with Agent Platform governance and security is listed as coming soon, which means enterprise teams at large companies probably can't use this in production yet. Competitors like JetBrains AI Enterprise already offer SOC 2 Type II certification and on-premises deployment options. That gap affects whether security, compliance, and IT teams can approve Antigravity at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The CLI migration isn't frictionless.&lt;/strong&gt; There won't be 1:1 feature parity right out of the gate. If your team has built workflows and scripts on top of Gemini CLI, you'll need to audit what breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency at scale is unproven.&lt;/strong&gt; The demos showed fast individual agent runs. Multi-agent parallelization at production scale, with real data and real error rates, is a different story. We don't have benchmarks for that yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to try this week
&lt;/h2&gt;

&lt;p&gt;If you want to actually kick the tires, here's where to start:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Download Antigravity 2.0 desktop app&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Antigravity 2.0 is a GUI app — download it from the official page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://antigravity.google/download
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Available free for macOS, Windows, and Linux. Sign in with your Google account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Install the Antigravity CLI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The CLI is separate from the desktop app and works in your terminal. These are the correct install commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux / WSL&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://antigravity.google/cli/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows PowerShell&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://antigravity.google/cli/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note for Windows users:&lt;/strong&gt; I tried the WSL route first and got a 404 because I used the wrong URL. The script lives at &lt;code&gt;/cli/install.sh&lt;/code&gt;, not &lt;code&gt;/install.sh&lt;/code&gt;, so copy the full URL above.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Migrate from Gemini CLI once installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;antigravity migrate &lt;span class="nt"&gt;--from&lt;/span&gt; gemini-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Write a custom skill file&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Skills are just Markdown — same concept as Hermes Agent's &lt;code&gt;SKILL.md&lt;/code&gt; and Claude Code's custom instructions. The syntax is intentionally familiar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Skill: pr-reviewer&lt;/span&gt;

&lt;span class="gu"&gt;## Purpose&lt;/span&gt;
Review pull requests for security issues, performance regressions,
and style violations.

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Flag any hardcoded credentials immediately
&lt;span class="p"&gt;-&lt;/span&gt; Check for N+1 query patterns in database calls
&lt;span class="p"&gt;-&lt;/span&gt; Verify error handling on all async operations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop it in &lt;code&gt;~/.gemini/antigravity/skills/&lt;/code&gt; and it's available across all Antigravity surfaces — desktop, CLI, and Managed Agents in the API.&lt;/p&gt;
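
&lt;p&gt;Because a skill is plain Markdown, reading one takes only a few lines of code, which is part of why the format ports so easily between surfaces. The sketch below is illustrative only: &lt;code&gt;parseSkill&lt;/code&gt; is a hypothetical helper, not Antigravity's loader, and it assumes the exact &lt;code&gt;# Skill:&lt;/code&gt; / &lt;code&gt;## Purpose&lt;/code&gt; / &lt;code&gt;## Rules&lt;/code&gt; layout from the example above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper, NOT Antigravity's skill loader.
// Parses the "# Skill: / ## Purpose / ## Rules" layout shown above.
function parseSkill(markdown) {
  const skill = { name: "", purpose: "", rules: [] };
  let section = "";
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("# Skill:")) {
      skill.name = line.slice("# Skill:".length).trim();
    } else if (line.startsWith("## ")) {
      // Remember which section the following lines belong to.
      section = line.slice(3).trim().toLowerCase();
    } else if (section === "purpose" &amp;amp;&amp;amp; line !== "") {
      // Join a multi-line purpose into one sentence.
      skill.purpose = skill.purpose === "" ? line : skill.purpose + " " + line;
    } else if (section === "rules" &amp;amp;&amp;amp; line.startsWith("- ")) {
      skill.rules.push(line.slice(2));
    }
  }
  return skill;
}

const example = [
  "# Skill: pr-reviewer",
  "",
  "## Purpose",
  "Review pull requests for security issues, performance regressions,",
  "and style violations.",
  "",
  "## Rules",
  "- Flag any hardcoded credentials immediately",
  "- Check for N+1 query patterns in database calls",
  "- Verify error handling on all async operations",
].join("\n");

console.log(parseSkill(example).rules.length); // 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in the parser is tied to a specific editor or runtime, which is the practical meaning of "write once, use everywhere" for skill files.&lt;/p&gt;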

&lt;p&gt;&lt;strong&gt;4. Spin up a Managed Agent via the Gemini API&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://generativelanguage.googleapis.com/v1beta/agents&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="c1"&gt;// The Gemini API takes API keys in the x-goog-api-key header, not as a Bearer token&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-goog-api-key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GEMINI_API_KEY&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini-3.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;system_instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a code review agent.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code_execution&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file_system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;persistent_environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
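&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Agent request failed with HTTP &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;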
&lt;span class="c1"&gt;// agent.session_id can be resumed across API calls with full file state intact&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;Here's the framing I keep returning to: the AI coding tools that win in 2026 won't be the ones with the best autocomplete. They'll be the ones that make agent orchestration the natural next step after writing a function.&lt;/p&gt;

&lt;p&gt;Cursor made AI assistance feel native to the editor. Antigravity 2.0 is trying to make agent orchestration feel native to the entire development workflow — editor, terminal, CI/CD, cloud.&lt;/p&gt;

&lt;p&gt;Whether Google pulls it off depends on execution. The platform is coherent. The skill portability story is real. The Managed Agents API removes genuine infrastructure friction. But the governance gaps mean enterprises are watching, not deploying. And the Gemini CLI migration will cause friction before it resolves.&lt;/p&gt;

&lt;p&gt;The announcement I'd watch most closely isn't the model benchmark or the pricing tier. It's whether Antigravity skill files become as ubiquitous as &lt;code&gt;.gitignore&lt;/code&gt; or &lt;code&gt;Dockerfile&lt;/code&gt; — shared, versioned, and composable. If that happens, Google wins the agent tooling layer. If it doesn't, Antigravity is a very good IDE.&lt;/p&gt;

&lt;p&gt;The keynote made the former seem possible. Only the next six months will tell us which it actually is.&lt;/p&gt;




</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>DevBrief — AI Standup Writer Powered by Hermes Agent (Vercel + Render)</title>
      <dc:creator>Onah Sunday.</dc:creator>
      <pubDate>Wed, 20 May 2026 04:27:47 +0000</pubDate>
      <link>https://forem.com/sundayonah/devbrief-ai-standup-writer-powered-by-hermes-agent-vercel-render-2lgh</link>
      <guid>https://forem.com/sundayonah/devbrief-ai-standup-writer-powered-by-hermes-agent-vercel-render-2lgh</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://devbrief-tau.vercel.app" rel="noopener noreferrer"&gt;https://devbrief-tau.vercel.app&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/sundayonah/devbrief" rel="noopener noreferrer"&gt;github.com/sundayonah/devbrief&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Hermes health:&lt;/strong&gt; &lt;a href="https://devbrief-hermes.onrender.com/health" rel="noopener noreferrer"&gt;devbrief-hermes.onrender.com/health&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Setup guide:&lt;/strong&gt; &lt;a href="https://dev.to/sundayonah/how-i-connected-hermes-agent-to-my-nextjs-app-and-why-its-not-just-another-chatbot-wrapper-3p59"&gt;How I connected Hermes to Next.js&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DevBrief&lt;/strong&gt; turns GitHub activity into human-readable &lt;strong&gt;standups&lt;/strong&gt;, &lt;strong&gt;PR changelogs&lt;/strong&gt;, or &lt;strong&gt;work logs&lt;/strong&gt;. Any visitor can sign in with &lt;strong&gt;GitHub OAuth&lt;/strong&gt;, pick a repo, set a time range and branch, filter PRs and authors, choose a tone (casual / formal / concise), and hit &lt;strong&gt;Generate&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Next.js app does &lt;strong&gt;not&lt;/strong&gt; call OpenRouter directly. It fetches commits, PRs, and issues from GitHub, then calls &lt;strong&gt;Hermes Agent’s OpenAI-compatible API&lt;/strong&gt; (&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;) on a long-running gateway. Hermes runs the agent loop (skills, tools, server-side model config) and returns the final brief.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Users → DevBrief (Vercel)
            ↓  POST /api/summary → lib/hermes.ts
      Hermes gateway (Docker on Render)
            ↓  model: openrouter/owl-alpha
      OpenRouter (API key only on Hermes — not on Vercel)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
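
&lt;p&gt;Because Hermes speaks the OpenAI chat-completions format, the Vercel side needs only a plain HTTP call. Here is a minimal sketch of what a client in the spirit of &lt;code&gt;lib/hermes.ts&lt;/code&gt; could look like; it is not the repo's code. &lt;code&gt;buildHermesRequest&lt;/code&gt;, &lt;code&gt;readBriefText&lt;/code&gt;, the &lt;code&gt;generateBrief&lt;/code&gt; signature, and the placeholder &lt;code&gt;model&lt;/code&gt; value are hypothetical. The Bearer header assumes Hermes checks &lt;code&gt;API_SERVER_KEY&lt;/code&gt; the way OpenAI-compatible servers usually do, and the parsing assumes the standard &lt;code&gt;choices[0].message.content&lt;/code&gt; response shape.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch of a Hermes client (hypothetical names, not the repo's lib/hermes.ts).
// HERMES_ENDPOINT: gateway base URL; HERMES_API_KEY: same value as API_SERVER_KEY on Render.
function buildHermesRequest(endpoint, apiKey, systemPrompt, userPrompt) {
  return {
    url: endpoint.replace(/\/+$/, "") + "/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumption: Hermes accepts the key as a standard Bearer token.
        Authorization: "Bearer " + apiKey,
      },
      body: JSON.stringify({
        // Placeholder: Hermes picks the real model from its own config.yaml.
        model: "hermes-agent",
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userPrompt },
        ],
      }),
    },
  };
}

// Pull the text out of an OpenAI-style chat completion response.
function readBriefText(json) {
  const choice = json.choices ? json.choices[0] : undefined;
  if (!choice || !choice.message || typeof choice.message.content !== "string") {
    throw new Error("Unexpected Hermes response shape");
  }
  return choice.message.content;
}

// The only function that touches the network (Node 18+ global fetch).
async function generateBrief(systemPrompt, userPrompt) {
  const req = buildHermesRequest(
    process.env.HERMES_ENDPOINT,
    process.env.HERMES_API_KEY,
    systemPrompt,
    userPrompt
  );
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error("Hermes returned HTTP " + res.status);
  return readBriefText(await res.json());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping request building and response parsing as pure functions means both can be tested without a running gateway; only &lt;code&gt;generateBrief&lt;/code&gt; needs the network.&lt;/p&gt;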






&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzk71pdpsgd83xu6zmctz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzk71pdpsgd83xu6zmctz.png" alt="DevBrief deployed UI — sign in, repo picker, filters, and output modes" width="648" height="1285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;strong&gt;&lt;a href="https://devbrief-tau.vercel.app" rel="noopener noreferrer"&gt;devbrief-tau.vercel.app&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect GitHub&lt;/strong&gt; → select a repo → choose output mode and tone → &lt;strong&gt;Generate&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Confirm Hermes is up: &lt;a href="https://devbrief-hermes.onrender.com/health" rel="noopener noreferrer"&gt;https://devbrief-hermes.onrender.com/health&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/sundayonah/devbrief" rel="noopener noreferrer"&gt;github.com/sundayonah/devbrief&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Piece&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hermes client&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;lib/hermes.ts&lt;/code&gt; → &lt;code&gt;HERMES_ENDPOINT&lt;/code&gt; + &lt;code&gt;POST /v1/chat/completions&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summary API&lt;/td&gt;
&lt;td&gt;&lt;code&gt;app/api/summary/route.ts&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standup skill&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hermes-skills/standup-writer.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Docker image&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker/hermes/Dockerfile&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production model config&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;docker/hermes/config.yaml&lt;/code&gt; (&lt;code&gt;openrouter/owl-alpha&lt;/code&gt;, &lt;code&gt;max_tokens: 2048&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy guide&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docs/DEPLOYMENT.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  My Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 14 (App Router), TypeScript, Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth &amp;amp; GitHub&lt;/td&gt;
&lt;td&gt;NextAuth.js, Octokit, GitHub OAuth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI agent&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; — &lt;code&gt;hermes gateway run&lt;/code&gt; + API server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;openrouter/owl-alpha&lt;/code&gt; via OpenRouter (on Hermes host only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;App hosting&lt;/td&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent hosting&lt;/td&gt;
&lt;td&gt;Docker on Render&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How I Used Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Hermes is not a chatbot wrapper here — the gateway is the brain for every generation.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hermes capability&lt;/th&gt;
&lt;th&gt;How DevBrief uses it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API server&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;API_SERVER_ENABLED=true&lt;/code&gt;; Next.js calls &lt;code&gt;/v1/chat/completions&lt;/code&gt; server-side (no browser CORS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hermes gateway run&lt;/code&gt; in Docker on Render — not inside Vercel serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Skills&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;standup-writer.md&lt;/code&gt; copied to &lt;code&gt;/root/.hermes/skills/&lt;/code&gt; in the image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Server model config&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;docker/hermes/config.yaml&lt;/code&gt; sets &lt;code&gt;model.default: openrouter/owl-alpha&lt;/code&gt; (the &lt;code&gt;model&lt;/code&gt; field in the request does not decide which model runs; this file does)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenRouter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; on Render only — not in Vercel env&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;API_SERVER_KEY&lt;/code&gt; on Render ↔ &lt;code&gt;HERMES_API_KEY&lt;/code&gt; on Vercel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cron / messaging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hermes schedule&lt;/code&gt; documented as a next step in the UI; Slack/Telegram delivery disabled in current deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Request flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /api/summary
  → GitHub API (user OAuth token)
  → generateBrief() in lib/hermes.ts
  → Hermes POST /v1/chat/completions
  → standup / PR changelog / work log
  → UI (copy, edit, history)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
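
&lt;p&gt;Between the GitHub fetch and the Hermes call, the route mostly assembles a prompt. The sketch below shows one way to fold commits, output mode, and tone into a single user message. Everything in it is an assumption for illustration: &lt;code&gt;buildBriefPrompt&lt;/code&gt;, the mode keys, the instruction wording, and the &lt;code&gt;{ sha, message, author }&lt;/code&gt; commit shape. In the repo, the standup instructions live in &lt;code&gt;hermes-skills/standup-writer.md&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical prompt builder; mode keys and commit shape are assumptions.
const MODE_INSTRUCTIONS = {
  standup: "Write a daily standup: what was done, what is next, any blockers.",
  changelog: "Write a PR changelog grouped into Features, Fixes, and Chores.",
  worklog: "Write a chronological work log with one line per change.",
};

const TONE_INSTRUCTIONS = {
  casual: "Use a relaxed, friendly tone.",
  formal: "Use a professional, formal tone.",
  concise: "Be as brief as possible; use bullet points only.",
};

function buildBriefPrompt(repo, commits, mode, tone) {
  // Object.hasOwn (Node 16.9+) avoids matching inherited keys like "toString".
  if (!Object.hasOwn(MODE_INSTRUCTIONS, mode)) throw new Error("Unknown mode: " + mode);
  if (!Object.hasOwn(TONE_INSTRUCTIONS, tone)) throw new Error("Unknown tone: " + tone);
  const lines = commits.map(function (c) {
    // Keep only the subject line of each commit message to save tokens.
    return "- " + c.sha.slice(0, 7) + " " + c.message.split("\n")[0] + " (" + c.author + ")";
  });
  return [
    MODE_INSTRUCTIONS[mode],
    TONE_INSTRUCTIONS[tone],
    "Repository: " + repo,
    "Commits:\n" + (lines.length === 0 ? "- (none in this range)" : lines.join("\n")),
  ].join("\n\n");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Validating mode and tone before calling Hermes means a bad form value fails fast in the route instead of costing a model call.&lt;/p&gt;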



&lt;p&gt;The long setup story (WSL PATH, duplicate &lt;code&gt;.env&lt;/code&gt; keys, &lt;code&gt;127.0.0.1&lt;/code&gt; vs &lt;code&gt;localhost&lt;/code&gt;, OpenRouter 402, baking &lt;code&gt;config.yaml&lt;/code&gt; for Render) is in the &lt;a href="https://dev.to/sundayonah/how-i-connected-hermes-agent-to-my-nextjs-app-and-why-its-not-just-another-chatbot-wrapper-3p59"&gt;tutorial post&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start the gateway, hit &lt;code&gt;/health&lt;/code&gt;, then &lt;code&gt;/v1/chat/completions&lt;/code&gt;&lt;/strong&gt; before wiring the app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hermes reads &lt;code&gt;~/.hermes/config.yaml&lt;/code&gt; for the real model&lt;/strong&gt;: env vars and the JSON &lt;code&gt;model&lt;/code&gt; field alone were not enough on Render; it only worked once I shipped &lt;code&gt;docker/hermes/config.yaml&lt;/code&gt; in the image.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Split hosting:&lt;/strong&gt; serverless Next.js + long-running Hermes elsewhere is the right pattern for this challenge.&lt;/li&gt;
&lt;/ul&gt;
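
&lt;p&gt;The first tip can be scripted as a pre-flight check. This is a sketch for Node 18+ (global &lt;code&gt;fetch&lt;/code&gt;) that assumes &lt;code&gt;/health&lt;/code&gt; returns HTTP 200 when the gateway is up and that the chat endpoint accepts &lt;code&gt;API_SERVER_KEY&lt;/code&gt; as a Bearer token. &lt;code&gt;checkHermes&lt;/code&gt;, its optional &lt;code&gt;fetchImpl&lt;/code&gt; parameter (there so it can be tested against a stub), and the placeholder &lt;code&gt;model&lt;/code&gt; value are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical pre-flight check: health first, then one tiny completion.
async function checkHermes(baseUrl, apiKey, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  const base = baseUrl.replace(/\/+$/, "");

  const health = await doFetch(base + "/health");
  if (!health.ok) {
    return { healthy: false, chat: false, detail: "health HTTP " + health.status };
  }

  const chat = await doFetch(base + "/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: "Bearer " + apiKey },
    body: JSON.stringify({
      model: "hermes-agent", // placeholder; Hermes uses its config.yaml model
      messages: [{ role: "user", content: "Reply with OK." }],
    }),
  });
  return { healthy: true, chat: chat.ok, detail: "chat HTTP " + chat.status };
}

// Usage (inside an async function or an ES module):
// console.log(await checkHermes("http://127.0.0.1:8642", process.env.API_SERVER_KEY));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pointing the check at &lt;code&gt;127.0.0.1&lt;/code&gt; rather than &lt;code&gt;localhost&lt;/code&gt; avoids the localhost resolution issue from the setup story.&lt;/p&gt;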

&lt;p&gt;Thanks for reading — try the &lt;a href="https://devbrief-tau.vercel.app" rel="noopener noreferrer"&gt;live demo&lt;/a&gt; and leave a comment if you hit snags with Hermes on Render or Vercel.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>How I Connected Hermes Agent to My Next.js App (And Why It's Not Just Another Chatbot Wrapper)</title>
      <dc:creator>Onah Sunday.</dc:creator>
      <pubDate>Tue, 19 May 2026 04:34:44 +0000</pubDate>
      <link>https://forem.com/sundayonah/how-i-connected-hermes-agent-to-my-nextjs-app-and-why-its-not-just-another-chatbot-wrapper-3p59</link>
      <guid>https://forem.com/sundayonah/how-i-connected-hermes-agent-to-my-nextjs-app-and-why-its-not-just-another-chatbot-wrapper-3p59</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://devbrief-tau.vercel.app" rel="noopener noreferrer"&gt;https://devbrief-tau.vercel.app&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/sundayonah/devbrief" rel="noopener noreferrer"&gt;github.com/sundayonah/devbrief&lt;/a&gt; &lt;/p&gt;



&lt;p&gt;I was skeptical.&lt;/p&gt;

&lt;p&gt;Every week there's a new "autonomous AI agent" that turns out to be a thin wrapper around a chat API with a fancy UI on top. So when I heard about Hermes Agent — Nous Research's open-source agent that "grows with you" — I filed it under &lt;em&gt;probably hype&lt;/em&gt; and moved on.&lt;/p&gt;

&lt;p&gt;Then I actually used it. And I ended up rebuilding an entire side project around it.&lt;/p&gt;

&lt;p&gt;This is a practical guide to setting up Hermes Agent locally and connecting it to a real Next.js app. I'll walk through exactly what I did to power &lt;strong&gt;DevBrief&lt;/strong&gt; — a tool that reads your GitHub activity and writes standups, PR changelogs, or work logs — using Hermes as the brain. I'll also include the bugs I hit (wrong &lt;code&gt;hermes&lt;/code&gt; binary, empty API keys, IPv6 localhost, OpenRouter credit reservation) so you don't have to rediscover them.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Makes Hermes Different
&lt;/h2&gt;

&lt;p&gt;Before the setup steps, let me explain why this matters — because the architecture is genuinely different from what you're probably used to.&lt;/p&gt;

&lt;p&gt;Most "AI-powered" apps work like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your app → LLM API → response → done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request is stateless. The model has no memory of the last call. You're just sending text and getting text back.&lt;/p&gt;

&lt;p&gt;Hermes works like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your app → Hermes Agent → skills + memory + tools → response
                ↓
         can learn from the interaction over time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes is a &lt;strong&gt;persistent agent&lt;/strong&gt; that runs on your machine (or server). It has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — Markdown files that teach it how to handle specific tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — Cross-session context (Hermes can use this to calibrate over time).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — Terminal, files, web search, and more on the agent side (not in your Next.js app).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An OpenAI-compatible API&lt;/strong&gt; — So connecting from a backend is straightforward.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron scheduling&lt;/strong&gt; — Natural language scheduling for recurring jobs (optional; I wired this as a next step).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For DevBrief, the important part is: my Next.js app does &lt;strong&gt;not&lt;/strong&gt; call OpenRouter directly. It calls &lt;strong&gt;Hermes&lt;/strong&gt;, which already has the model, tools, and skills configured. That's the difference between a wrapper and an agent-backed product.&lt;/p&gt;




&lt;h2&gt;
  
  
  What DevBrief Actually Does
&lt;/h2&gt;

&lt;p&gt;DevBrief isn't only a standup generator. After you connect GitHub (OAuth), you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick a &lt;strong&gt;repo&lt;/strong&gt;, &lt;strong&gt;time range&lt;/strong&gt;, &lt;strong&gt;branch&lt;/strong&gt;, and &lt;strong&gt;PR filters&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;output mode&lt;/strong&gt;: standup, &lt;strong&gt;PR changelog&lt;/strong&gt;, or &lt;strong&gt;work log&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Pick a &lt;strong&gt;tone&lt;/strong&gt;: casual, formal, or concise&lt;/li&gt;
&lt;/ul&gt;
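&lt;p&gt;Those options map naturally onto a typed request body for &lt;code&gt;POST /api/summary&lt;/code&gt;. The sketch below is hypothetical: the field names (&lt;code&gt;repo&lt;/code&gt;, &lt;code&gt;since&lt;/code&gt;, &lt;code&gt;branch&lt;/code&gt;, &lt;code&gt;prState&lt;/code&gt;) and the defaults are assumptions, not DevBrief's actual schema. It shows validating the body against the three output modes and tones before any GitHub or Hermes call is made:&lt;/p&gt;

```typescript
// Hypothetical request body for POST /api/summary (not DevBrief's real schema).
type OutputMode = "standup" | "pr_changelog" | "work_log";
type Tone = "casual" | "formal" | "concise";

interface SummaryRequest {
  repo: string;                       // "owner/name"
  since: string;                      // start of the time range, as a date string
  branch?: string;                    // optional branch filter
  prState: "open" | "closed" | "all"; // PR filter
  outputMode: OutputMode;
  tone: Tone;
}

const OUTPUT_MODES = ["standup", "pr_changelog", "work_log"];
const TONES = ["casual", "formal", "concise"];

// Returns a validated request or throws (map the error to HTTP 400 in the route).
function parseSummaryRequest(body: unknown): SummaryRequest {
  if (typeof body !== "object" || body === null) throw new Error("body must be a JSON object");
  const b = body as { [key: string]: unknown };

  const repo = b.repo;
  if (typeof repo !== "string" || !/^[\w.-]+\/[\w.-]+$/.test(repo)) {
    throw new Error("repo must look like owner/name");
  }
  const since = b.since;
  if (typeof since !== "string" || Number.isNaN(Date.parse(since))) {
    throw new Error("since must be a parseable date");
  }

  // Defaults are assumptions for this sketch.
  const rawMode = b.outputMode;
  const rawTone = b.tone;
  const outputMode = typeof rawMode === "string" ? rawMode : "standup";
  const tone = typeof rawTone === "string" ? rawTone : "concise";
  if (!OUTPUT_MODES.includes(outputMode)) throw new Error(`unknown outputMode: ${outputMode}`);
  if (!TONES.includes(tone)) throw new Error(`unknown tone: ${tone}`);

  const branch = b.branch;
  const prState = b.prState;
  return {
    repo,
    since,
    branch: typeof branch === "string" ? branch : undefined,
    prState: prState === "open" || prState === "closed" ? prState : "all",
    outputMode: outputMode as OutputMode,
    tone: tone as Tone,
  };
}
```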

&lt;p&gt;The Next.js route &lt;code&gt;POST /api/summary&lt;/code&gt; fetches GitHub activity, then calls &lt;code&gt;lib/hermes.ts&lt;/code&gt; → Hermes on port &lt;strong&gt;8642&lt;/strong&gt;. If Hermes is down, you still get a &lt;strong&gt;fallback&lt;/strong&gt; template so the UI isn't broken.&lt;/p&gt;
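&lt;p&gt;The fallback boils down to "try the agent, degrade gracefully". A minimal sketch, assuming the Hermes call and the template generator are passed in as functions (in DevBrief these roles belong to the Hermes client and &lt;code&gt;generateFallbackBrief()&lt;/code&gt;; the wrapper itself is hypothetical). Returning a &lt;code&gt;source&lt;/code&gt; flag lets the UI say which path produced the text:&lt;/p&gt;

```typescript
// Hypothetical result type so the UI can tell which path produced the text.
interface BriefResult {
  text: string;
  source: "hermes" | "fallback";
}

// Try the agent first. Any failure (gateway down, timeout, HTTP error, empty
// reply) is logged and answered with the template brief instead of a 500.
async function briefWithFallback(generate: () => unknown, fallback: () => string) {
  try {
    const text = await generate();
    if (typeof text !== "string" || text.trim() === "") {
      throw new Error("Hermes returned no text");
    }
    const result: BriefResult = { text, source: "hermes" };
    return result;
  } catch (err) {
    console.warn("Hermes call failed, using fallback generator:", err);
    const result: BriefResult = { text: fallback(), source: "fallback" };
    return result;
  }
}
```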

&lt;p&gt;Try it: &lt;strong&gt;&lt;a href="https://devbrief-tau.vercel.app" rel="noopener noreferrer"&gt;devbrief-tau.vercel.app&lt;/a&gt;&lt;/strong&gt; — any GitHub user can sign in with OAuth and use their own repos.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1 — Install Hermes (and use the right binary)
&lt;/h2&gt;

&lt;p&gt;One command. Works on Linux, macOS, and WSL2 on Windows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The installer handles Python 3.11, Node.js, and dependencies. On WSL, the Nous Hermes CLI usually lands at &lt;strong&gt;&lt;code&gt;~/.local/bin/hermes&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.local/bin/hermes &lt;span class="nt"&gt;--version&lt;/span&gt;
which hermes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Windows gotcha:&lt;/strong&gt; If you've also installed the Rust-based Hermes IBC relayer via Cargo, &lt;code&gt;which hermes&lt;/code&gt; might point at &lt;strong&gt;&lt;code&gt;~/.cargo/bin/hermes&lt;/code&gt;&lt;/strong&gt;. That binary is the &lt;strong&gt;IBC relayer&lt;/strong&gt;, not Nous Hermes. Use &lt;code&gt;~/.local/bin/hermes&lt;/code&gt; explicitly, or put &lt;code&gt;~/.local/bin&lt;/code&gt; &lt;strong&gt;before&lt;/strong&gt; &lt;code&gt;~/.cargo/bin&lt;/code&gt; in your PATH.&lt;/p&gt;
&lt;/blockquote&gt;
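&lt;p&gt;What &lt;code&gt;which&lt;/code&gt; reports is, roughly, the first &lt;code&gt;PATH&lt;/code&gt; directory (scanning left to right) that contains the binary. A minimal sketch of that lookup using POSIX &lt;code&gt;:&lt;/code&gt; separators (Windows uses &lt;code&gt;;&lt;/code&gt; plus &lt;code&gt;PATHEXT&lt;/code&gt;, which this ignores); &lt;code&gt;resolveOnPath&lt;/code&gt; is hypothetical and takes an injected &lt;code&gt;exists&lt;/code&gt; check (pass &lt;code&gt;fs.existsSync&lt;/code&gt; for real use) so the example runs without touching the filesystem:&lt;/p&gt;

```typescript
import { posix } from "node:path";

// Return the first PATH entry containing `name`, roughly what `which` reports.
// `exists` is injected; pass fs.existsSync for a real lookup.
function resolveOnPath(name: string, pathVar: string, exists: (p: string) => boolean, sep = ":") {
  for (const dir of pathVar.split(sep)) {
    if (dir === "") continue; // skip empty entries like "a::b"
    const candidate = posix.join(dir, name);
    if (exists(candidate)) return candidate;
  }
  return null;
}

// Simulated machine with both binaries installed.
const home = "/home/dev";
const installed = new Set([`${home}/.cargo/bin/hermes`, `${home}/.local/bin/hermes`]);
const has = (p: string) => installed.has(p);

// Cargo first: the IBC relayer shadows Nous Hermes.
console.log(resolveOnPath("hermes", `${home}/.cargo/bin:${home}/.local/bin:/usr/bin`, has));
// prints /home/dev/.cargo/bin/hermes

// Local bin first: the Nous Hermes CLI wins.
console.log(resolveOnPath("hermes", `${home}/.local/bin:${home}/.cargo/bin:/usr/bin`, has));
// prints /home/dev/.local/bin/hermes
```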

&lt;p&gt;Test the agent CLI before touching your app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.local/bin/hermes chat &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"Say hello in one word"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2 — Pick your model provider
&lt;/h2&gt;

&lt;p&gt;Hermes is model-agnostic: OpenRouter, Anthropic, local Ollama, and more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.local/bin/hermes model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used &lt;strong&gt;OpenRouter&lt;/strong&gt;. Set your key in &lt;strong&gt;&lt;code&gt;~/.hermes/.env&lt;/code&gt;&lt;/strong&gt; (Hermes reads this file; DevBrief does not):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-your-key-here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick a model in the wizard. For development I used a &lt;strong&gt;free/cheap&lt;/strong&gt; route (&lt;code&gt;openrouter/owl-alpha&lt;/code&gt;) after hitting credit errors (HTTP 402) with Sonnet (see Troubleshooting). For production quality, something like &lt;code&gt;anthropic/claude-sonnet-4.6&lt;/code&gt; on OpenRouter works, but watch the &lt;strong&gt;max_tokens&lt;/strong&gt; reservation (Step 2b).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate key trap:&lt;/strong&gt; Run &lt;code&gt;grep -n OPENROUTER_API_KEY ~/.hermes/.env&lt;/code&gt;. There must be &lt;strong&gt;exactly one&lt;/strong&gt; line, and it must have a value. If a second, empty &lt;code&gt;OPENROUTER_API_KEY=&lt;/code&gt; appears further down the file, &lt;code&gt;python-dotenv&lt;/code&gt; keeps the &lt;strong&gt;last&lt;/strong&gt; value (empty), so every Hermes call fails with HTTP 400 even though a direct &lt;code&gt;curl&lt;/code&gt; to OpenRouter, using the key exported in your shell, still works.&lt;/p&gt;
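&lt;p&gt;The "last value wins" rule is easy to check for before it bites. A minimal sketch of a duplicate-key scanner, assuming plain &lt;code&gt;KEY=value&lt;/code&gt; lines (it skips &lt;code&gt;#&lt;/code&gt; comments and strips an optional &lt;code&gt;export&lt;/code&gt; prefix and surrounding quotes, but does not handle multi-line values); &lt;code&gt;findDuplicateKeys&lt;/code&gt; is hypothetical, not part of Hermes or python-dotenv:&lt;/p&gt;

```typescript
// Record every occurrence of each key; dotenv-style loaders keep the last one.
function findDuplicateKeys(envText: string) {
  const values: { [key: string]: string[] } = {};
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim().replace(/^export\s+/, "");
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    (values[key] ??= []).push(value);
  }
  // Only keys seen more than once, with the value that will actually win.
  return Object.entries(values)
    .filter(([, all]) => all.length > 1)
    .map(([key, all]) => ({ key, count: all.length, effective: all[all.length - 1] }));
}

const sample = [
  "OPENROUTER_API_KEY=sk-or-v1-your-key-here",
  "API_SERVER_ENABLED=true",
  "OPENROUTER_API_KEY=",
].join("\n");

// Reports OPENROUTER_API_KEY twice, with an empty effective value.
console.log(findDuplicateKeys(sample));
```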




&lt;h3&gt;
  
  
  Step 2b — Optional: cap &lt;code&gt;max_tokens&lt;/code&gt; for OpenRouter
&lt;/h3&gt;

&lt;p&gt;Hermes may request the model's full output budget (e.g. &lt;strong&gt;64000&lt;/strong&gt; tokens). OpenRouter &lt;strong&gt;pre-reserves&lt;/strong&gt; credits for that ceiling. With a small balance you can get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP 402: You requested up to 64000 tokens, but can only afford 2661.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix in &lt;code&gt;~/.hermes/config.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2048&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or add credits at &lt;a href="https://openrouter.ai/settings/credits" rel="noopener noreferrer"&gt;openrouter.ai/settings/credits&lt;/a&gt;.&lt;/p&gt;
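&lt;p&gt;Roughly speaking, the 402 is arithmetic: the reservation scales with &lt;code&gt;max_tokens&lt;/code&gt; times the model's output price, and the request is refused when that exceeds your balance. A back-of-the-envelope sketch; the balance and the per-million-token price are made-up placeholders, not real OpenRouter prices, and an actual reservation may also include input tokens:&lt;/p&gt;

```typescript
// Rough model of a credit pre-reservation: the request is allowed only if the
// balance covers max_tokens at the model's output price. Prices are placeholders.
function affordableOutputTokens(balanceUsd: number, usdPerMillionOutputTokens: number) {
  return Math.floor((balanceUsd / usdPerMillionOutputTokens) * 1_000_000);
}

function canReserve(maxTokens: number, balanceUsd: number, usdPerMillionOutputTokens: number) {
  return affordableOutputTokens(balanceUsd, usdPerMillionOutputTokens) >= maxTokens;
}

const balance = 0.04; // USD left on the account (example value)
const price = 15;     // USD per 1M output tokens (hypothetical placeholder)

console.log(affordableOutputTokens(balance, price)); // 2666
console.log(canReserve(64_000, balance, price));     // false: the 402 case
console.log(canReserve(2_048, balance, price));      // true: fits after capping max_tokens
```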




&lt;h2&gt;
  
  
  Step 3 — Enable the API server and start the gateway
&lt;/h2&gt;

&lt;p&gt;This is the step most tutorials skip. DevBrief talks to Hermes over the &lt;strong&gt;OpenAI-compatible API server&lt;/strong&gt;, which runs inside the &lt;strong&gt;gateway&lt;/strong&gt;. Some docs still mention &lt;code&gt;hermes serve&lt;/code&gt; for this; that's outdated.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;~/.hermes/.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;API_SERVER_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Optional for local dev (gateway may accept all requests without a key)&lt;/span&gt;
&lt;span class="c"&gt;# API_SERVER_KEY=change-me-local-dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the gateway (keep this terminal open):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.local/bin/hermes gateway run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You might see warnings about allowlists or missing API keys — that's normal for local dev. Confirm the API is up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://127.0.0.1:8642/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hermes-agent"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
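&lt;p&gt;If the app or a startup script should confirm the gateway is up before sending real work, the same &lt;code&gt;/health&lt;/code&gt; check takes a few lines of &lt;code&gt;fetch&lt;/code&gt;. A minimal sketch, assuming Node 18+ (global &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;AbortSignal.timeout&lt;/code&gt;) and the response shape shown above; &lt;code&gt;isHermesHealthy&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```typescript
// True only when GET /health answers 2xx with {"status": "ok"}.
// Network errors and timeouts count as "not healthy" instead of throwing.
async function isHermesHealthy(baseUrl = "http://127.0.0.1:8642", timeoutMs = 3000) {
  try {
    const res = await fetch(`${baseUrl.replace(/\/+$/, "")}/health`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return false;
    const body = (await res.json()) as { status?: string };
    return body.status === "ok";
  } catch {
    return false;
  }
}
```

&lt;p&gt;For example, a post-deploy script could call &lt;code&gt;isHermesHealthy("https://YOUR-SERVICE.onrender.com")&lt;/code&gt;. Treating network errors as &lt;code&gt;false&lt;/code&gt; keeps the caller down to a single branch.&lt;/p&gt;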



&lt;p&gt;Test chat completions (omit &lt;code&gt;Authorization&lt;/code&gt; if you didn't set &lt;code&gt;API_SERVER_KEY&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://127.0.0.1:8642/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "hermes-agent",
    "messages": [{"role": "user", "content": "Hello! Are you running?"}],
    "max_tokens": 20
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;127.0.0.1&lt;/code&gt;, not &lt;code&gt;localhost&lt;/code&gt;, on Windows.&lt;/strong&gt; Node often resolves &lt;code&gt;localhost&lt;/code&gt; to IPv6 &lt;code&gt;::1&lt;/code&gt;. I got &lt;code&gt;ECONNREFUSED ::1:8642&lt;/code&gt; until I set &lt;code&gt;HERMES_ENDPOINT=http://127.0.0.1:8642&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WSL + Windows:&lt;/strong&gt; Run &lt;strong&gt;&lt;code&gt;hermes gateway run&lt;/code&gt; in WSL&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;pnpm dev&lt;/code&gt; on Windows&lt;/strong&gt;. WSL2 forwards &lt;code&gt;127.0.0.1:8642&lt;/code&gt; to the gateway when it's running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CORS:&lt;/strong&gt; DevBrief calls Hermes from the &lt;strong&gt;Next.js server&lt;/strong&gt; (&lt;code&gt;/api/summary&lt;/code&gt;), not from the browser. You do &lt;strong&gt;not&lt;/strong&gt; need &lt;code&gt;API_SERVER_CORS_ORIGINS&lt;/code&gt; unless your frontend calls Hermes directly.&lt;/p&gt;
&lt;/blockquote&gt;
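&lt;p&gt;DevBrief's client does a plain string &lt;code&gt;.replace("localhost", "127.0.0.1")&lt;/code&gt;. A slightly more defensive variant parses the endpoint with the WHATWG &lt;code&gt;URL&lt;/code&gt; class built into Node and swaps only the hostname, also covering an explicit &lt;code&gt;[::1]&lt;/code&gt;. A sketch; &lt;code&gt;normalizeHermesEndpoint&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```typescript
// Force IPv4 loopback for local endpoints so Node never tries ::1, and drop
// trailing slashes so "/v1/chat/completions" can be appended safely.
function normalizeHermesEndpoint(raw: string) {
  const url = new URL(raw);
  if (url.hostname === "localhost" || url.hostname === "[::1]") {
    url.hostname = "127.0.0.1";
  }
  return url.toString().replace(/\/+$/, "");
}

console.log(normalizeHermesEndpoint("http://localhost:8642"));              // http://127.0.0.1:8642
console.log(normalizeHermesEndpoint("https://your-service.onrender.com/")); // https://your-service.onrender.com
```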




&lt;h2&gt;
  
  
  Step 4 — Write a skill
&lt;/h2&gt;

&lt;p&gt;Instead of stuffing a giant system prompt into every API call, you can add a &lt;strong&gt;skill file&lt;/strong&gt; — Markdown that teaches Hermes how to handle standups.&lt;/p&gt;

&lt;p&gt;DevBrief ships &lt;code&gt;hermes-skills/standup-writer.md&lt;/code&gt; (abbreviated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Skill: standup-writer&lt;/span&gt;

&lt;span class="gu"&gt;## Purpose&lt;/span&gt;
Given raw GitHub activity (commits, PRs, issues), generate a clean
daily standup summary in three sections: Yesterday, Today, Blockers.

&lt;span class="gu"&gt;## Style Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Keep each bullet under 12 words
&lt;span class="p"&gt;-&lt;/span&gt; Use plain English, no jargon
&lt;span class="p"&gt;-&lt;/span&gt; Infer "Today" from open PRs and unresolved issues

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;
&lt;span class="gs"&gt;**Yesterday**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; ...

&lt;span class="gs"&gt;**Today**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; ...

&lt;span class="gs"&gt;**Blockers**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; None / [describe blocker]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;hermes-skills/standup-writer.md ~/.hermes/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the app, standup mode sends a &lt;strong&gt;system hint&lt;/strong&gt; alongside the user prompt; the chat completions request has no separate &lt;code&gt;skill&lt;/code&gt; field. PR changelog and work log modes rely on tailored user prompts built in &lt;code&gt;buildPrompt()&lt;/code&gt;. The skill file still helps Hermes in standup mode.&lt;/p&gt;
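&lt;p&gt;One way to express "system hint for standup, tailored user prompt for the other modes" in code. This is a hypothetical sketch, not DevBrief's actual &lt;code&gt;buildPrompt()&lt;/code&gt;: the hint and instruction wording are illustrative, and whether naming the skill in the system message helps Hermes select it is an assumption worth testing against your own gateway:&lt;/p&gt;

```typescript
type OutputMode = "standup" | "pr_changelog" | "work_log";

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

const BASE_SYSTEM =
  "You are DevBrief. Turn structured GitHub activity into human-readable summaries.";

// Hypothetical per-mode instructions placed at the top of the user prompt.
const MODE_INSTRUCTIONS: { [K in OutputMode]: string } = {
  standup: "Write a daily standup with Yesterday, Today, and Blockers sections.",
  pr_changelog: "Write a PR changelog with a short Summary and a Changes list.",
  work_log: "Write a chronological work log, one line per change.",
};

function buildMessages(mode: OutputMode, activityText: string): ChatMessage[] {
  // Only standup mode gets the extra hint pointing at the installed skill.
  const system =
    mode === "standup" ? `${BASE_SYSTEM} Follow the standup-writer skill format.` : BASE_SYSTEM;
  return [
    { role: "system", content: system },
    { role: "user", content: `${MODE_INSTRUCTIONS[mode]}\n\nGitHub activity:\n${activityText}` },
  ];
}
```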




&lt;h2&gt;
  
  
  Step 5 — Connect it to your Next.js app
&lt;/h2&gt;

&lt;p&gt;Hermes exposes &lt;strong&gt;&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;&lt;/strong&gt;. DevBrief's &lt;code&gt;lib/hermes.ts&lt;/code&gt; calls it from the server with a system + user message, optional bearer auth, and a long timeout (generations can take &lt;strong&gt;30–60 seconds&lt;/strong&gt; because the agent runs tools).&lt;/p&gt;

&lt;p&gt;Simplified version of the real integration:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;lib/hermes.ts&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateBrief&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GitHubActivity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;tone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;casual&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;formal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;concise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;outputMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;standup&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pr_changelog&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;work_log&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HERMES_ENDPOINT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://127.0.0.1:8642&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HERMES_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Authorization&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HERMES_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;base&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/v1/chat/completions`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HERMES_MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hermes-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are DevBrief. Turn structured GitHub activity into human-readable summaries.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AbortSignal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Hermes &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
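&lt;p&gt;The snippet above relies on a &lt;code&gt;GitHubActivity&lt;/code&gt; type and a &lt;code&gt;buildPrompt()&lt;/code&gt; helper that aren't shown. Purely for illustration, here is one hypothetical shape for both (DevBrief's real types may differ). The idea is to flatten the activity into short labelled lines so the model gets structure without raw GitHub API JSON:&lt;/p&gt;

```typescript
// Hypothetical, simplified activity shape (not DevBrief's real type).
interface GitHubActivity {
  repo: string;
  commits: { sha: string; message: string }[];
  pullRequests: { number: number; title: string; state: "open" | "closed" | "merged" }[];
  issues: { number: number; title: string; state: "open" | "closed" }[];
}

interface PromptInput {
  activity: GitHubActivity;
  tone: "casual" | "formal" | "concise";
  outputMode: "standup" | "pr_changelog" | "work_log";
}

function buildPrompt({ activity, tone, outputMode }: PromptInput) {
  const lines: string[] = [
    `Repository: ${activity.repo}`,
    `Output mode: ${outputMode}. Tone: ${tone}.`,
    "",
    "Commits:",
    // Short SHA plus the first line of each message; commit bodies add noise.
    ...activity.commits.map((c) => `- ${c.sha.slice(0, 7)} ${c.message.split("\n")[0]}`),
    "",
    "Pull requests:",
    ...activity.pullRequests.map((pr) => `- PR #${pr.number} [${pr.state}]: ${pr.title}`),
    "",
    "Issues:",
    ...activity.issues.map((i) => `- Issue #${i.number} [${i.state}]: ${i.title}`),
  ];
  return lines.join("\n");
}
```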



&lt;p&gt;&lt;strong&gt;&lt;code&gt;.env.local&lt;/code&gt;&lt;/strong&gt; (Next.js — not the same file as &lt;code&gt;~/.hermes/.env&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;HERMES_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://127.0.0.1:8642

&lt;span class="c"&gt;# Only if you set API_SERVER_KEY in ~/.hermes/.env&lt;/span&gt;
&lt;span class="c"&gt;# HERMES_API_KEY=change-me-local-dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; in &lt;code&gt;.env.local&lt;/code&gt; does &lt;strong&gt;nothing&lt;/strong&gt; for DevBrief; only Hermes uses it via &lt;code&gt;~/.hermes/.env&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Run the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click &lt;strong&gt;Generate&lt;/strong&gt;. The terminal should log &lt;code&gt;POST /api/summary 200&lt;/code&gt; with &lt;strong&gt;no&lt;/strong&gt; &lt;code&gt;Hermes Agent not reachable, using fallback generator&lt;/code&gt; warning. The UI should show prose (e.g. a PR changelog with Summary / Changes sections), not just raw &lt;code&gt;- PR #17 [closed]: title&lt;/code&gt; bullets.&lt;/p&gt;
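&lt;p&gt;To automate that check (for example in a smoke test), a crude heuristic is enough, because fallback output is nothing but template bullets. A hypothetical sketch that assumes fallback lines look like the &lt;code&gt;- PR #17 [closed]: title&lt;/code&gt; example; adjust the pattern if your fallback template differs:&lt;/p&gt;

```typescript
// Heuristic: fallback output consists only of raw "- PR #N [state]: title" bullets.
const FALLBACK_LINE = /^- PR #\d+ \[(open|closed|merged)\]: .+$/;

function looksLikeFallback(output: string) {
  const lines = output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  // Treat an empty reply as a failure too.
  if (lines.length === 0) return true;
  return lines.every((line) => FALLBACK_LINE.test(line));
}
```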




&lt;h2&gt;
  
  
  Step 6 — Automated scheduling (roadmap)
&lt;/h2&gt;

&lt;p&gt;Hermes can schedule work in plain English and deliver via Telegram/Slack through the gateway. I documented this as a &lt;strong&gt;next step&lt;/strong&gt; in DevBrief's README:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes schedule &lt;span class="s2"&gt;"Every weekday at 8:30am, POST to http://localhost:3000/api/summary ..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
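&lt;p&gt;For reference, "every weekday at 8:30am" is the cron expression &lt;code&gt;30 8 * * 1-5&lt;/code&gt;. If you ever drive the schedule yourself instead of through Hermes, computing the next run is straightforward. A hypothetical sketch in local time that ignores daylight-saving edge cases:&lt;/p&gt;

```typescript
// Next "weekday at HH:MM" (Mon to Fri, local time) strictly after `from`.
// With the defaults this matches the cron expression "30 8 * * 1-5".
function nextWeekdayRun(from: Date, hour = 8, minute = 30) {
  const candidate = new Date(from.getTime());
  candidate.setHours(hour, minute, 0, 0);
  // If today's slot has already passed (or is right now), start from tomorrow.
  if (from.getTime() >= candidate.getTime()) candidate.setDate(candidate.getDate() + 1);
  // Skip Sunday (0) and Saturday (6).
  while (candidate.getDay() === 0 || candidate.getDay() === 6) {
    candidate.setDate(candidate.getDate() + 1);
  }
  return candidate;
}
```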



&lt;p&gt;To wire Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes gateway setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Troubleshooting (what actually bit me)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hermes: command not found&lt;/code&gt; or wrong behavior&lt;/td&gt;
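&lt;td&gt;&lt;code&gt;~/.local/bin&lt;/code&gt; not on PATH, or the IBC relayer's &lt;code&gt;hermes&lt;/code&gt; shadows it&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;~/.local/bin/hermes&lt;/code&gt;, or put &lt;code&gt;~/.local/bin&lt;/code&gt; first on PATH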
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;curl&lt;/code&gt; to OpenRouter works, &lt;code&gt;hermes chat&lt;/code&gt; HTTP &lt;strong&gt;400&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Duplicate empty &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; at bottom of &lt;code&gt;~/.hermes/.env&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Keep one key line only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hermes chat&lt;/code&gt; HTTP &lt;strong&gt;402&lt;/strong&gt; on Sonnet&lt;/td&gt;
&lt;td&gt;OpenRouter reserves credits for huge &lt;code&gt;max_tokens&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;model.max_tokens: 2048&lt;/code&gt; or add credits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevBrief &lt;code&gt;ECONNREFUSED ::1:8642&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;IPv6 localhost / gateway not running&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;127.0.0.1&lt;/code&gt;, run &lt;code&gt;hermes gateway run&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bullet-list output only&lt;/td&gt;
&lt;td&gt;Fallback path — Hermes unreachable&lt;/td&gt;
&lt;td&gt;Fix gateway + endpoint; check server logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request takes ~40s&lt;/td&gt;
&lt;td&gt;Normal: the full agent loop runs, including tools&lt;/td&gt;
&lt;td&gt;Expected, not an error; show a loading state&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
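&lt;p&gt;The first rows of this table can also be encoded as a lookup, so the fallback log says &lt;em&gt;why&lt;/em&gt; Hermes failed. A hypothetical sketch: it assumes you have the HTTP status of a failed response, or the Node error code of a failed &lt;code&gt;fetch&lt;/code&gt; (Node reports a refused connection as &lt;code&gt;ECONNREFUSED&lt;/code&gt; on the error's &lt;code&gt;cause&lt;/code&gt;), and it assumes upstream 400/402 statuses reach the caller unchanged, which may not hold for every Hermes version:&lt;/p&gt;

```typescript
interface HermesFailure {
  status?: number;    // HTTP status, if a response arrived
  errorCode?: string; // e.g. err.cause.code from a failed fetch in Node
}

// Map a failure to the matching troubleshooting hint (assumes statuses pass through).
function diagnoseHermesFailure({ status, errorCode }: HermesFailure) {
  if (errorCode === "ECONNREFUSED") {
    return "Gateway not reachable: run `hermes gateway run` and use 127.0.0.1, not localhost.";
  }
  if (status === 400) {
    return "Check ~/.hermes/.env for a duplicate, empty OPENROUTER_API_KEY line.";
  }
  if (status === 402) {
    return "OpenRouter credit reservation failed: lower model.max_tokens (e.g. 2048) or add credits.";
  }
  return "Unknown failure: check the Hermes gateway logs.";
}
```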




&lt;h2&gt;
  
  
  What I learned building DevBrief
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The skill + prompt split is useful.&lt;/strong&gt; The standup format lives in &lt;code&gt;standup-writer.md&lt;/code&gt; and in the prompt builders, so the app code stays focused on GitHub data and UX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fallback matters.&lt;/strong&gt; I could still test the UI while Hermes wasn't running: &lt;code&gt;generateFallbackBrief()&lt;/code&gt; produces a minimal template until the gateway is up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The OpenAI-compatible API is a real advantage.&lt;/strong&gt; One &lt;code&gt;fetch&lt;/code&gt; to &lt;code&gt;/v1/chat/completions&lt;/code&gt; — no custom Hermes SDK in Next.js.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory and cron are powerful — and optional.&lt;/strong&gt; Hermes supports memory and scheduling; I focused the submission on the &lt;strong&gt;working path&lt;/strong&gt;: GitHub → Next.js API → Hermes gateway → formatted brief.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Honest latency.&lt;/strong&gt; Agent-backed generation is slower than a single LLM call. Worth it for quality; show a loading state in the UI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running it in production
&lt;/h2&gt;

&lt;p&gt;DevBrief and Hermes deploy as &lt;strong&gt;two services&lt;/strong&gt;. Hermes is a long-lived gateway; it cannot run inside Vercel’s short-lived serverless functions. On serverless-only Next.js hosts, Hermes must run &lt;strong&gt;somewhere else&lt;/strong&gt; — we use &lt;strong&gt;Render (Docker)&lt;/strong&gt; for Hermes and &lt;strong&gt;Vercel&lt;/strong&gt; for the Next.js app.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Users → DevBrief (Vercel)
            ↓  POST /v1/chat/completions
      Hermes (Render Docker)
            ↓
      OpenRouter (API key only on Hermes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Hermes on Render
&lt;/h3&gt;

&lt;p&gt;The repo ships &lt;code&gt;docker/hermes/Dockerfile&lt;/code&gt; (Ubuntu + official Hermes installer + bundled &lt;code&gt;standup-writer&lt;/code&gt; skill).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Push the repo to GitHub (include &lt;code&gt;docker/&lt;/code&gt;, &lt;code&gt;hermes-skills/&lt;/code&gt;, &lt;code&gt;render.yaml&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;In &lt;a href="https://render.com" rel="noopener noreferrer"&gt;Render&lt;/a&gt;: &lt;strong&gt;New → Web Service → Docker&lt;/strong&gt;, connect the repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dockerfile path:&lt;/strong&gt; &lt;code&gt;docker/hermes/Dockerfile&lt;/code&gt; · &lt;strong&gt;Context:&lt;/strong&gt; repository root.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health check path:&lt;/strong&gt; &lt;code&gt;/health&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment variables&lt;/strong&gt; on the Render service:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Your OpenRouter key (Hermes only — not on Vercel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;API_SERVER_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Strong secret; same value as DevBrief &lt;code&gt;HERMES_API_KEY&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HERMES_MODEL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;e.g. &lt;code&gt;openrouter/owl-alpha&lt;/code&gt; (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;API_SERVER_ENABLED&lt;/code&gt; and &lt;code&gt;API_SERVER_HOST&lt;/code&gt; are already set in the Dockerfile. The first deploy can take &lt;strong&gt;15–20+ minutes&lt;/strong&gt; while &lt;code&gt;install.sh&lt;/code&gt; runs inside the image.&lt;/p&gt;

&lt;p&gt;Verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://YOUR-SERVICE.onrender.com/health
&lt;span class="c"&gt;# {"status":"ok","platform":"hermes-agent"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OpenRouter tip:&lt;/strong&gt; If you see HTTP 402, lower &lt;code&gt;model.max_tokens&lt;/code&gt; in Hermes config (e.g. &lt;code&gt;2048&lt;/code&gt;) — Hermes may request a huge default budget and OpenRouter pre-reserves credits.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. DevBrief on Vercel
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Import the repo in &lt;a href="https://vercel.com" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; (Next.js).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment variables&lt;/strong&gt; (Production):&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;GITHUB_ID&lt;/code&gt; / &lt;code&gt;GITHUB_SECRET&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;GitHub OAuth app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NEXTAUTH_SECRET&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;openssl rand -base64 32&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NEXTAUTH_URL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://your-app.vercel.app&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HERMES_ENDPOINT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;https://YOUR-SERVICE.onrender.com&lt;/code&gt; (HTTPS, &lt;strong&gt;no&lt;/strong&gt; &lt;code&gt;:8642&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HERMES_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Same as Render &lt;code&gt;API_SERVER_KEY&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol start="3"&gt;
&lt;li&gt;In your GitHub OAuth app, set &lt;strong&gt;Authorization callback URL&lt;/strong&gt; to:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;https://your-app.vercel.app/api/auth/callback/github&lt;/code&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;/api/summary&lt;/code&gt; can run &lt;strong&gt;30–60+ seconds&lt;/strong&gt; while Hermes generates. This repo sets &lt;code&gt;maxDuration = 60&lt;/code&gt;, so your Vercel plan must allow functions to run for at least 60 seconds (typically Pro); anything slower is still cut off at that limit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;History&lt;/strong&gt; on Vercel is stored under &lt;code&gt;/tmp&lt;/code&gt; (writable but ephemeral across cold starts). For durable history, add a database later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Do &lt;strong&gt;not&lt;/strong&gt; put &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; on Vercel for the normal flow — only the Hermes container needs it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Smoke test
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;curl https://YOUR-SERVICE.onrender.com/health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Open the Vercel app → &lt;strong&gt;Connect GitHub&lt;/strong&gt; → pick a repo → &lt;strong&gt;Generate&lt;/strong&gt;
&lt;/li&gt;
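&lt;li&gt;Check that the output is real prose from Hermes. If you get the fallback bullet template instead, the Hermes call most likely failed, so recheck &lt;code&gt;HERMES_ENDPOINT&lt;/code&gt;, &lt;code&gt;HERMES_API_KEY&lt;/code&gt;, and the Render logs.&lt;/li&gt;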
&lt;/ol&gt;
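
&lt;p&gt;The first two checks can also be scripted. This sketch uses only the Python standard library to call &lt;code&gt;/health&lt;/code&gt; and then &lt;code&gt;/v1/chat/completions&lt;/code&gt;. The model name &lt;code&gt;"hermes-agent"&lt;/code&gt; and the Bearer-token header are assumptions for illustration, so confirm both against the Hermes API server guide before relying on it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os
import urllib.request


def build_chat_request(endpoint, api_key, prompt, model="hermes-agent"):
    """Build an OpenAI-style chat completion request for the Hermes gateway.

    The "hermes-agent" model name and Bearer auth are assumptions; check the
    Hermes API server docs for the values your gateway expects.
    """
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def smoke_test(endpoint, api_key, timeout=90):
    """Hit /health, then send one short prompt and return the reply text."""
    with urllib.request.urlopen(endpoint.rstrip("/") + "/health", timeout=timeout) as resp:
        print("health:", resp.status)
    req = build_chat_request(endpoint, api_key, "Reply with one short sentence.")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(smoke_test(os.environ["HERMES_ENDPOINT"], os.environ["HERMES_API_KEY"]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping request construction separate from sending makes it easy to inspect the exact URL and headers when the gateway returns 401 or 404.&lt;/p&gt;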

&lt;p&gt;Repo: &lt;a href="https://github.com/sundayonah/devbrief" rel="noopener noreferrer"&gt;github.com/sundayonah/devbrief&lt;/a&gt; · Full checklist: &lt;code&gt;docs/DEPLOYMENT.md&lt;/code&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Optional: local Docker before Render
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-...
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;API_SERVER_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-dev-secret
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.hermes.yml up &lt;span class="nt"&gt;--build&lt;/span&gt;
curl http://127.0.0.1:8642/health

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
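
&lt;p&gt;The first &lt;code&gt;docker compose ... --build&lt;/code&gt; can take a while, so a single &lt;code&gt;curl&lt;/code&gt; may run before the gateway is listening. A small polling helper (a sketch, not part of the repo) retries &lt;code&gt;/health&lt;/code&gt; until it answers 200 or gives up. The &lt;code&gt;probe&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; parameters exist only so the loop can be tested without a running container.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if GET url answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False


def wait_for_health(url, attempts=60, interval_s=2.0, probe=http_ok, sleep=time.sleep):
    """Poll url up to `attempts` times, `interval_s` seconds apart (about 2 minutes by default)."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt != attempts - 1:
            sleep(interval_s)
    return False


if __name__ == "__main__":
    ready = wait_for_health("http://127.0.0.1:8642/health")
    print("gateway ready" if ready else "gave up waiting for the gateway")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;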






&lt;h2&gt;
  
  
  The full picture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub API
    ↓
Next.js API route (/api/summary)
    ↓
lib/hermes.ts → POST /v1/chat/completions @ HERMES_ENDPOINT (127.0.0.1:8642 locally)
    ↓
Hermes gateway (model via OpenRouter, tools, skills)
    ↓
Standup / PR changelog / work log
    ↓
Next.js UI → copy, history

(Optional later)
Hermes cron → POST /api/summary → Telegram via gateway
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
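
&lt;p&gt;The middle of that diagram can be sketched in a few lines. This is a hypothetical Python illustration of the flow, not the repo's &lt;code&gt;lib/hermes.ts&lt;/code&gt;: the prompt wording, mode names, and fallback format are all assumptions. It shows why a failed Hermes call produces the bullet template instead of prose.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;INSTRUCTIONS = {
    "standup": "Write a short daily standup update from these commits.",
    "changelog": "Write a PR changelog grouped into features and fixes.",
    "worklog": "Write a dated work log entry from these commits.",
}


def build_prompt(commits, mode="standup"):
    """Turn commit messages into a prompt for one of the three output modes."""
    lines = "\n".join(f"- {c}" for c in commits)
    return f"{INSTRUCTIONS[mode]}\n\nCommits:\n{lines}"


def fallback_summary(commits):
    """Plain bullet template used when the model call fails."""
    return "\n".join(f"- {c}" for c in commits)


def summarize(commits, call_model, mode="standup"):
    """Ask the model for prose; fall back to bullets if the call raises.

    call_model is any function that takes a prompt string and returns text,
    e.g. a wrapper around POST /v1/chat/completions.
    """
    try:
        return call_model(build_prompt(commits, mode)), "hermes"
    except Exception:  # network error, timeout, malformed response, ...
        return fallback_summary(commits), "fallback"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing &lt;code&gt;call_model&lt;/code&gt; in as a function keeps the fallback path testable without a running gateway, and the returned source label makes it obvious in logs which path produced the text.&lt;/p&gt;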






&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Hermes Agent docs: &lt;a href="https://hermes-agent.nousresearch.com/docs" rel="noopener noreferrer"&gt;hermes-agent.nousresearch.com/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;API server guide: &lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/api-server" rel="noopener noreferrer"&gt;hermes-agent.nousresearch.com/docs/user-guide/features/api-server&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hermes GitHub: &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;github.com/NousResearch/hermes-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Challenge: &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you're building with Hermes: &lt;strong&gt;start the gateway, hit &lt;code&gt;/health&lt;/code&gt;, then hit &lt;code&gt;/v1/chat/completions&lt;/code&gt; with curl&lt;/strong&gt; before writing application code. On Windows, use &lt;strong&gt;&lt;code&gt;127.0.0.1&lt;/code&gt;&lt;/strong&gt; rather than &lt;code&gt;localhost&lt;/code&gt; (which can resolve to the IPv6 address &lt;code&gt;::1&lt;/code&gt;), and keep &lt;strong&gt;&lt;code&gt;hermes gateway run&lt;/code&gt;&lt;/strong&gt; running in its own terminal while you develop.&lt;/p&gt;

&lt;p&gt;Drop a comment if you hit any snags — happy to help.&lt;/p&gt;




</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>Gemma 4: The Comprehensive Developer's Guide to Google's Most Capable Open Model Family</title>
      <dc:creator>Onah Sunday.</dc:creator>
      <pubDate>Thu, 07 May 2026 23:27:19 +0000</pubDate>
      <link>https://forem.com/sundayonah/gemma-4-the-comprehensive-developers-guide-to-googles-most-capable-open-model-family-57gm</link>
      <guid>https://forem.com/sundayonah/gemma-4-the-comprehensive-developers-guide-to-googles-most-capable-open-model-family-57gm</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Local AI has been having a serious moment — and Gemma 4 might be the release that makes it impossible to ignore. Google's latest open model family doesn't just inch forward; it makes a genuine leap: native multimodal input, a 256K context window, reasoning modes, and models that range from running on a Raspberry Pi to powering enterprise deployments.&lt;/p&gt;

&lt;p&gt;But "most capable open model" means nothing if you don't know which model to pick, how to access it, or what it actually unlocks for your project. This guide covers all of that.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Gemma 4?
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is Google's fourth generation of open-weight language models, built on the same research that powers the Gemini family. "Open-weight" means you can download the model weights and run them yourself — on your laptop, a Raspberry Pi, a cloud GPU, or a phone.&lt;/p&gt;

&lt;p&gt;What makes Gemma 4 different from its predecessors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native multimodal support&lt;/strong&gt; — images, video, and audio input baked into the architecture (not bolted on)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;128K–256K context window&lt;/strong&gt; — enough to process entire codebases or long documents in one shot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced reasoning&lt;/strong&gt; — purpose-built for multi-step planning and deep logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 license&lt;/strong&gt;: commercially permissive, so you can build and sell products with it as long as you follow the license's notice and attribution terms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function calling + structured JSON output&lt;/strong&gt; — production-ready for agentic workflows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Three Model Variants (And How to Choose)
&lt;/h2&gt;

&lt;p&gt;This is where most guides fall short. Gemma 4 isn't one model — it's a family of three distinct architectures, each designed for a different context. Picking the right one matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Edge Models: E2B and E4B (2B and 4B effective parameters)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Mobile apps, IoT, browser-side inference, edge devices, Raspberry Pi, offline use&lt;/p&gt;

&lt;p&gt;These are built for environments where compute is constrained. The E2B model is small enough to run on high-end smartphones and even a Raspberry Pi 5. Both models support images and audio natively — which is remarkable at this size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use them:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need the model to run locally with no cloud dependency&lt;/li&gt;
&lt;li&gt;You're building something for mobile or embedded hardware&lt;/li&gt;
&lt;li&gt;Latency is critical and you can't afford a round-trip to a server&lt;/li&gt;
&lt;li&gt;You want a free, offline AI with no credit card required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Smaller capacity means less complex reasoning and less knowledge breadth. These are not the models for tasks that require deep multi-step analysis.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Gemma 4 31B Dense
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; High-quality text and multimodal tasks, local inference on a powerful workstation, fine-tuning experiments&lt;/p&gt;

&lt;p&gt;This is the workhorse. At the time of writing, the 31B Dense model ranks &lt;strong&gt;#3 among open models on the Arena AI text leaderboard&lt;/strong&gt;, ahead of several models many times its size. It's the model to use when you need serious capability but still want local control.&lt;/p&gt;

&lt;p&gt;On hardware: with 4-bit quantization (the technique QLoRA builds on), the 31B model fits in roughly 18–20 GB of VRAM. That is within reach of a 24 GB consumer GPU like an RTX 4090, or of serverless cloud GPUs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex reasoning, detailed document analysis, code generation&lt;/li&gt;
&lt;li&gt;Fine-tuning on a custom dataset (it's what the Google AI team used for their pet breed classifier)&lt;/li&gt;
&lt;li&gt;Tasks where you need the best output quality and have the GPU headroom&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Gemma 4 26B Mixture of Experts (MoE)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; High-throughput production workloads, efficiency-focused deployments, advanced reasoning&lt;/p&gt;

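&lt;p&gt;This is the architecturally clever one. In a Mixture of Experts (MoE) model, the feed-forward blocks are split into several "expert" sub-networks, and a router sends each token to only a few of them. Gemma 4's MoE has 26 billion parameters in total but activates only about &lt;strong&gt;3.8 billion of them&lt;/strong&gt; per token. You get close to 31B-level quality at a fraction of the per-token compute, although all 26 billion parameters still have to be held in memory.&lt;/p&gt;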

&lt;p&gt;It ranks &lt;strong&gt;#6 on the Arena AI leaderboard&lt;/strong&gt; among open models — outperforming models 20x its size.&lt;/p&gt;
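
&lt;p&gt;A toy version of the routing step makes the "26B total, 3.8B active" idea concrete. This is a generic top-k MoE sketch in plain Python, not Gemma 4's actual router (its expert count, top-k value, and gating details are not covered here). It also uses the common rule of thumb of roughly 2 FLOPs per active parameter per token to compare per-token compute.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, router_weights, experts, top_k=2):
    """Toy Mixture-of-Experts layer for a single token vector x.

    router_weights: one weight vector per expert (dot product gives its score).
    experts: callables mapping a vector to a vector of the same length.
    Only the top_k highest-scoring experts run; the others are skipped entirely.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, chosen


def forward_gflops_per_token(active_params):
    """Rule of thumb: about 2 FLOPs per active parameter per generated token."""
    return 2 * active_params / 1e9


if __name__ == "__main__":
    # Four toy experts that scale the input by 1x, 2x, 3x and 4x.
    experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    out, chosen = moe_layer([0.5, 2.0], router, experts, top_k=2)
    print("experts used:", chosen, "output:", out)
    print(f"31B dense: ~{forward_gflops_per_token(31e9):.1f} GFLOPs/token")
    print(f"3.8B active: ~{forward_gflops_per_token(3.8e9):.1f} GFLOPs/token")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;By this rough measure the MoE does about 8x less arithmetic per token than the 31B Dense model. Every expert's weights still have to be loaded, though, which is why memory use tracks the 26B total rather than the 3.8B active parameters.&lt;/p&gt;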

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-throughput serving where you need fast response times at scale&lt;/li&gt;
&lt;li&gt;You're running many parallel requests and cost/efficiency matters&lt;/li&gt;
&lt;li&gt;You need strong reasoning without paying for the full 31B compute on every token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; MoE models are slightly more complex to deploy and fine-tune than dense models, and not all inference runtimes support them equally well yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params (effective or active)&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Multimodal&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;2B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Image, audio&lt;/td&gt;
&lt;td&gt;Edge, mobile, offline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Image, audio&lt;/td&gt;
&lt;td&gt;Edge with more capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;Quality-first tasks, fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26B MoE&lt;/td&gt;
&lt;td&gt;3.8B active&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;High-throughput production&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
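
&lt;p&gt;The table boils down to a few questions. As a rough sketch, the helper below encodes them; the decision rules and the &lt;code&gt;low_memory&lt;/code&gt; flag are simplifications for illustration, not official guidance from Google.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass(frozen=True)
class GemmaVariant:
    name: str
    active_params_b: float  # billions of effective/active parameters
    context_k: int          # context window in K tokens (1K = 1024)
    modalities: tuple


VARIANTS = {
    "E2B": GemmaVariant("E2B", 2.0, 128, ("text", "image", "audio")),
    "E4B": GemmaVariant("E4B", 4.0, 128, ("text", "image", "audio")),
    "31B Dense": GemmaVariant("31B Dense", 31.0, 256, ("text", "image")),
    "26B MoE": GemmaVariant("26B MoE", 3.8, 256, ("text", "image")),
}


def pick_variant(on_device=False, low_memory=False, needs_audio=False,
                 high_throughput=False, context_tokens=0):
    """Rule-of-thumb picker mirroring the comparison table; not an official guide."""
    if on_device or needs_audio:
        # Only the edge models list audio input in the table.
        choice = "E2B" if low_memory else "E4B"
    elif high_throughput:
        choice = "26B MoE"
    else:
        choice = "31B Dense"
    variant = VARIANTS[choice]
    if context_tokens > variant.context_k * 1024:
        raise ValueError(f"{choice} supports about {variant.context_k}K tokens")
    return variant


if __name__ == "__main__":
    print(pick_variant(on_device=True).name)        # E4B
    print(pick_variant(high_throughput=True).name)  # 26B MoE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;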




&lt;h2&gt;
  
  
  How to Access Gemma 4 (Free Options First)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Google AI Studio (Free, Easiest)
&lt;/h3&gt;

&lt;p&gt;The fastest way to start is via the &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;Gemini API on Google AI Studio&lt;/a&gt;. No credit card required for the free tier. You get API access to Gemma 4 models immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain how Mixture of Experts works in plain English.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: OpenRouter (Free Tier — No Credit Card)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://openrouter.ai/google/gemma-4-31b-it:free" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; offers the 31B model on a free tier. Useful if you want OpenAI-compatible API calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENROUTER_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemma-4-31b-it:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the advantages of open-weight models?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3: Run Locally via Ollama (No Cloud at All)
&lt;/h3&gt;

&lt;p&gt;For true local inference with zero data leaving your machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama: https://ollama.com&lt;/span&gt;
ollama pull gemma4:4b
ollama run gemma4:4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use it programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the key differences between MoE and dense models.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 4: Hugging Face / Kaggle
&lt;/h3&gt;

&lt;p&gt;Download model weights directly from &lt;a href="https://huggingface.co/google" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; or &lt;a href="https://www.kaggle.com/models/google/gemma-4" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;. Depending on the repository, you may need to sign in and accept the model's license terms before downloading (a quick process). This route is the most useful one for fine-tuning workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multimodal in Practice
&lt;/h2&gt;

&lt;p&gt;One of Gemma 4's biggest leaps is genuine multimodal support. Here's how to use it with an image via the Gemini API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PIL.Image&lt;/span&gt;

&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_image.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe what you see in this image and identify any text present.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Put the image &lt;strong&gt;before&lt;/strong&gt; the text prompt, as in the example above. Google's Gemma guidance recommends this ordering for multimodal prompts, and it can noticeably affect output quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 128K–256K Context Window: What It Actually Unlocks
&lt;/h2&gt;

&lt;p&gt;Many earlier open models capped out at 8K or 32K tokens. Gemma 4's context window changes what's possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (with a typical 8K model):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You chunk a large codebase into pieces&lt;/li&gt;
&lt;li&gt;Ask questions about each chunk separately&lt;/li&gt;
&lt;li&gt;Lose cross-file context and relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With a 256K context (31B Dense or 26B MoE):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load an entire small or medium-sized repository at once&lt;/li&gt;
&lt;li&gt;Ask "what does the authentication flow look like end-to-end?" and get a coherent answer&lt;/li&gt;
&lt;li&gt;Analyze a full research paper, legal document, or meeting transcript in a single pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially powerful for RAG (retrieval-augmented generation) systems, code review tools, and document analysis pipelines.&lt;/p&gt;
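
&lt;p&gt;Before loading a whole repository, it helps to check whether it actually fits. The sketch below relies on the common rule of thumb of about four characters per token; that ratio is an assumption, and Gemma's real tokenizer may count differently, especially for code and non-English text. The file extensions and skipped folders are illustrative choices.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import sys

# Rough heuristic: about 4 characters of English text or code per token.
# Real counts depend on Gemma's tokenizer; measure with it before relying on this.
CHARS_PER_TOKEN = 4
SOURCE_EXTS = {".py", ".ts", ".tsx", ".js", ".go", ".rs", ".java", ".md"}
SKIP_DIRS = {".git", "node_modules", ".venv"}


def estimate_repo_tokens(root, exts=SOURCE_EXTS):
    """Walk a repo and estimate how many tokens its source files would take."""
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune dependency and VCS folders, which would dwarf the real code.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1] in exts:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(tokens, context_tokens=256 * 1024, reserve_for_answer=8_192):
    """True if the prompt still leaves room for the model's answer."""
    return context_tokens - tokens >= reserve_for_answer


if __name__ == "__main__":
    n = estimate_repo_tokens(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"~{n:,} tokens; fits in 256K with room to answer: {fits_in_context(n)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the estimate comes out well over the limit, that is a signal to fall back to retrieval over chunks rather than a single long prompt.&lt;/p&gt;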




&lt;h2&gt;
  
  
  Fine-Tuning: Is It Worth It?
&lt;/h2&gt;

&lt;p&gt;Yes — and it's more accessible than you might think.&lt;/p&gt;

&lt;p&gt;Google's own team fine-tuned Gemma 4 31B for pet breed classification using QLoRA on Cloud Run with serverless NVIDIA RTX PRO 6000 GPUs. Key results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Baseline accuracy (no fine-tuning): 89%&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After fine-tuning on ~4,000 images: ~93%&lt;/strong&gt; — approaching state-of-the-art for the Oxford-IIIT Pet dataset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The approach: 4-bit quantization (QLoRA) brings the 31B model's VRAM footprint down from ~62 GB (16-bit weights) to ~18–20 GB, making it tractable on a single high-end GPU.&lt;/p&gt;
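
&lt;p&gt;The ~62 GB figure is plain arithmetic: 31 billion parameters at 2 bytes each. A minimal sketch of that calculation, counting the weights only:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def weight_memory_gb(params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"31B at 16-bit: {weight_memory_gb(31e9, 16):.1f} GB")  # 62.0 GB
    print(f"31B at 4-bit:  {weight_memory_gb(31e9, 4):.1f} GB")   # 15.5 GB
    print(f"26B at 4-bit:  {weight_memory_gb(26e9, 4):.1f} GB")   # 13.0 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At 4 bits the 31B weights alone come to about 15.5 GB. The rest of the ~18–20 GB plausibly goes to quantization metadata, the LoRA adapters and their optimizer state, and activations, which grow with batch size and sequence length.&lt;/p&gt;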

&lt;p&gt;Quick QLoRA config for Gemma 4:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BitsAndBytesConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;

&lt;span class="n"&gt;bnb_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BitsAndBytesConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_quant_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nf4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_compute_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bfloat16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-linear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Required for Gemma 4 — covers both LM and vision tower
&lt;/span&gt;    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; For Gemma 4, always use &lt;code&gt;target_modules="all-linear"&lt;/code&gt; rather than targeting specific layer names. The architecture uses a custom &lt;code&gt;Gemma4ClippableLinear&lt;/code&gt; wrapper, and specifying individual layer names bypasses it, causing unstable training.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What This Means for Developers
&lt;/h2&gt;

&lt;p&gt;Open models at this capability level change the economics of building AI applications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy-first applications become viable.&lt;/strong&gt; You can process sensitive documents, medical records, or private communications locally — with no data ever leaving your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency-critical use cases open up.&lt;/strong&gt; Edge models that run on-device eliminate the round-trip to a cloud API. For real-time transcription, instant image analysis, or offline AI assistants, this is a genuine unlock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning without massive infrastructure.&lt;/strong&gt; QLoRA on a single consumer GPU or a serverless GPU instance makes domain-specific models accessible to indie developers and small teams — not just companies with ML infrastructure budgets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic workflows get a lot more capable.&lt;/strong&gt; Native function calling, structured JSON output, and a 256K context window make Gemma 4 a serious option for building AI agents that reason over large amounts of context and take real actions.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means for Developers in Africa
&lt;/h2&gt;

&lt;p&gt;There's something worth saying that most Gemma 4 guides won't mention: for developers in regions like Nigeria and across Africa, open-weight models aren't just a technical curiosity — they're genuinely transformative.&lt;/p&gt;

&lt;p&gt;Cloud AI APIs come with real barriers here. Dollar-denominated pricing hits harder when you're earning in naira. Latency from distant data centers is a constant frustration. Payment methods that "just work" in the US often don't. And data sovereignty matters — sending sensitive local data to foreign servers is a compliance and trust problem many African startups quietly struggle with.&lt;/p&gt;

&lt;p&gt;Gemma 4 changes that equation. A model powerful enough to run locally, with no API costs, no cloud dependency, and no data leaving your machine, levels the playing field in a way that felt impossible two years ago. The E2B model running on a Raspberry Pi or a mid-range Android phone isn't a toy — it's a pathway to building AI-powered products for local markets at local economics.&lt;/p&gt;

&lt;p&gt;The next wave of AI applications built for African languages, local businesses, and underserved communities doesn't have to wait for foreign cloud providers to care. With Gemma 4, developers here can build it themselves, on their own terms.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started Checklist
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Experiment first&lt;/strong&gt; → Google AI Studio free tier, no setup required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick your model&lt;/strong&gt; → Edge tasks? E2B/E4B. Quality tasks? 31B Dense. Scale? 26B MoE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go local&lt;/strong&gt; → Ollama for zero-configuration local inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tune&lt;/strong&gt; → Hugging Face + QLoRA + &lt;code&gt;target_modules="all-linear"&lt;/code&gt; for Gemma 4&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The code for the Google AI team's full fine-tuning pipeline is available on GitHub at &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/finetune_gemma" rel="noopener noreferrer"&gt;GoogleCloudPlatform/devrel-demos&lt;/a&gt; — a great starting point for your own experiments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Gemma 4 isn't just a better version of Gemma 3 — it's a genuinely different tier of open model. The combination of multimodal input, long context, reasoning capabilities, and a commercially permissive license puts it in a category that didn't really exist for open-weight models until now.&lt;/p&gt;

&lt;p&gt;The most exciting part isn't the benchmarks — it's the use cases that become possible when capable AI runs locally, privately, and cheaply. What will you build with it?&lt;/p&gt;




</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
