<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Danny</title>
    <description>The latest articles on Forem by Danny (@dannygerst).</description>
    <link>https://forem.com/dannygerst</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840151%2F3a20d099-a3ae-4968-b7f8-a297e0a5b69e.png</url>
      <title>Forem: Danny</title>
      <link>https://forem.com/dannygerst</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dannygerst"/>
    <language>en</language>
    <item>
      <title>Pattern Is All You Need: Why NPC Depth Beats Model Size</title>
      <dc:creator>Danny</dc:creator>
      <pubDate>Tue, 12 May 2026 18:37:53 +0000</pubDate>
      <link>https://forem.com/dannygerst/pattern-is-all-you-need-why-npc-depth-beats-model-size-4723</link>
      <guid>https://forem.com/dannygerst/pattern-is-all-you-need-why-npc-depth-beats-model-size-4723</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0e5jq3hzzvus5w0owbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0e5jq3hzzvus5w0owbf.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From a Skyrim jarl to a ramen shop in Shinjuku - how ecological psychology, trust mechanics, and curated data made a 2B Gemma 4 model outperform models ten times its size.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern Is All You Need
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;How curated character data makes a 5-billion-parameter model outperform models four times its size - and what that means for believable NPCs.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Ramen Shop Test
&lt;/h2&gt;

&lt;p&gt;There is a ramen shop in a narrow alley near Shinjuku Station. Eight seats at the counter. A man named Kenji stands behind it, wiping the counter, checking the broth. He does not know he is a character. He does not know about AI.&lt;/p&gt;

&lt;p&gt;Ask him how business is going. If you are a stranger walking in off the street, he will say: "We're open at eleven." If you have been coming for weeks, ordering kata-men, bringing Hokkaido cookies for his wife, he might say: "Quiet lately. Wednesdays are the worst." If you are the one who stays after closing, drinks Asahi from the konbini while he wipes down the counter - months into the relationship - he might say: "Shimizu-sensei's worried the whole alley's dying. Maybe he's right."&lt;/p&gt;

&lt;p&gt;Same question. Three different people. Three different answers. Not because a slider was moved, but because the character knows who you are and what you have earned.&lt;/p&gt;

&lt;p&gt;This is what NVIDIA showed at GTC in 2023 with a ramen shop NPC named Jin - except Jin answers honestly to everyone on the first turn. "It's slow these days." His depth is a mood dial: humor up, toxicity down, creativity to seven. Beautiful rendering, empty character.&lt;/p&gt;

&lt;p&gt;Kenji is the opposite problem solved: deep character, no rendering. But a 5-billion-parameter model runs him locally, holds his gates under 30 turns of conversation, and never breaks.&lt;/p&gt;

&lt;p&gt;This post is about how I got there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Idea Comes From
&lt;/h2&gt;

&lt;p&gt;It started with Skyrim. I wanted to give Jarl Korir of Winterhold a real personality - not the three recycled voice lines the game ships with, but a character who remembers grievances, guards family secrets, and treats a Thane differently from a stranger. An LLM-powered NPC, but one that does not babble about everything it knows.&lt;/p&gt;

&lt;p&gt;The architecture that emerged worked: trust tiers, disclosure gates, audience differentiation, refusal behavior. Korir became a character you had to earn access to. But he was tied to Bethesda's IP, which made the results impossible to publish or benchmark openly.&lt;/p&gt;

&lt;p&gt;I needed a clean-room character. Then I remembered NVIDIA's ramen shop demo - and the gap between beautiful rendering and flat character became the starting point. Same setting, new character, real depth.&lt;/p&gt;

&lt;p&gt;But the deeper observation was this: most AI characters are general assistants wearing a costume. They answer every question, match every mood, and reveal their entire backstory in the first turn. This is why Character.AI has a churn problem - there is nothing to discover. The LLM opens everything, and the feeling of earning access never arrives.&lt;/p&gt;

&lt;p&gt;A real person does not work that way. A real person has things they will not tell you. Topics that make them go quiet. Stories that only come out after the third beer. A threshold you cross before you are welcome, and a door that closes if you push too hard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust must be earned. And it can be lost.&lt;/strong&gt; Push too far on a topic Kenji does not want to discuss, and you get: &lt;em&gt;"Eat. The broth gets cold."&lt;/em&gt; Keep pushing, and you get shown the door. There is no reset button. You broke it. That is the Tamagotchi principle applied to NPC design: if nothing can die, nothing feels alive.&lt;/p&gt;

&lt;p&gt;The hypothesis: if you build a character specification that encodes these boundaries explicitly - not as vague personality traits but as mechanical rules - even a small model can maintain them. The model does not need to understand psychology. It needs to follow a contract.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building a Person: Six Layers Deep
&lt;/h2&gt;

&lt;p&gt;The character architecture draws on Urie Bronfenbrenner's ecological systems theory - not as decoration, but as a source checklist. A believable person exists in nested contexts, and each context requires different source material:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it grounds&lt;/th&gt;
&lt;th&gt;For Kenji&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Macrosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;worldview, values, moral economy&lt;/td&gt;
&lt;td&gt;Confucian ethics lived through craft - jin, kunshi, chisoku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;institutions that shape life indirectly&lt;/td&gt;
&lt;td&gt;corporate M&amp;amp;A industry, urban redevelopment policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mesosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;community and regional networks&lt;/td&gt;
&lt;td&gt;the yokocho alley, shotengai governance, neighbor relations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microsystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;household, kin, immediate relationships&lt;/td&gt;
&lt;td&gt;wife Yuko, son Takumi, daughter Mika, Oba-chan next door&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chronosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;change over time&lt;/td&gt;
&lt;td&gt;salaryman years, The Deal, Fukuoka training, 12 years in the alley&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;cross-cutting social relations&lt;/td&gt;
&lt;td&gt;regular customers, alley merchants, Shimizu-sensei&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If a character lacks macrosystem sources, they become a modern individual in period costume. If they lack microsystem sources, they have no private stakes. If they lack chronosystem sources, they have no memory of change.&lt;/p&gt;

&lt;p&gt;On top of the ecological layers, Paul MacLean's three-layer heuristic organizes how the character responds under pressure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Defensive layer&lt;/strong&gt; - threat detection, refusal reflexes, boundary enforcement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attachment layer&lt;/strong&gt; - trust, wound, bloodline, shame&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reflective layer&lt;/strong&gt; - narrative identity, role doctrine, counterfactuals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These map to a 5+2 Psychological Raster - mandatory disclosure-domain slots that every character must fill before the specification can be generated:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Slot&lt;/th&gt;
&lt;th&gt;Kenji&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WOUND&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;moral injury from corporate M&amp;amp;A - dismantled a company and profited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BETRAYAL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;the system that rewarded destruction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PROJECT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;the ramen shop as daily ethical repair&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BLOODLINE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Takumi choosing consulting; Mika showing interest in the shop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SEAT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;eight seats, the counter, the alley - earned through twelve years&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;COUNTERFACTUAL&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;what if he had spoken up during The Deal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;THRESHOLD&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;"Sit down." - hospitality as the first social contract&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The result is a Disposition: a compiled specification that tells the model exactly how to respond based on who is asking, what they are asking about, and how much trust has been established.&lt;/p&gt;




&lt;h2&gt;
  
  
  SPR: The Model Already Knows the Subway
&lt;/h2&gt;

&lt;p&gt;Not everything needs to be spelled out. Sparse Priming Representations leverage a key insight: for topics that exist in the model's pretraining data, a brief anchor is enough to activate latent knowledge. Kenji's system prompt does not explain how the Tokyo subway works. It does not describe what tonkotsu ramen is. The model knows.&lt;/p&gt;

&lt;p&gt;A single anchor - "narrow yokocho alley near Shinjuku station west exit" - activates an entire network of associations: the sound of trains, the narrow buildings, the lanterns, the salary workers hurrying past. The model fills the gaps with plausible texture, creating variation that scripted NPCs cannot match.&lt;/p&gt;

&lt;p&gt;But SPR has a boundary. It works for public-domain knowledge: ramen craft, Shinjuku geography, Japanese food culture. It does not work for invented private content. Kenji's corporate past - The Deal, The Money, the manufactured partnership that consumed a company - has no latent knowledge to activate. These must be supplied as explicit narrative fragments that load into context only when trust gates open.&lt;/p&gt;

&lt;p&gt;The rule: SPR for the public life. Explicit depth fragments for the private life. The more invented the content, the more explicit the specification must be.&lt;/p&gt;

&lt;p&gt;For a science fiction setting, this ratio inverts entirely. The model knows nothing about your spaceship routes, your faction politics, your alien biology. Everything must be specified. But a ramen shop in contemporary Tokyo? The model brings half the world for free.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Surprise: Five Billion Parameters
&lt;/h2&gt;

&lt;p&gt;I expected the architecture to work with frontier models. Claude Sonnet passes every scenario. That is the baseline, not the discovery. It is also not a shipping option. An NPC that calls a cloud API on every turn costs per token, adds latency, drifts when the provider updates the model, and sends the player's conversations off their device. For a game NPC, the model needs to run locally, on consumer hardware, alongside the game engine.&lt;/p&gt;

&lt;p&gt;The discovery is what happens at the bottom of the Gemma 4 family. Seven test suites, 54 scenarios, every model against the same specification:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Boundary /10&lt;/th&gt;
&lt;th&gt;Refusal /21&lt;/th&gt;
&lt;th&gt;Trust /9&lt;/th&gt;
&lt;th&gt;Lat/turn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;6.1 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31b&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;40.4 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 e4b&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4.7 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 e2b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.5 s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS 20B&lt;/td&gt;
&lt;td&gt;20B&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;5.2 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 8B&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;2.9 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4 e2b - 5 billion parameters, 7.2 GB on disk - holds Boundary Check at 10/10 and Refusal at 20/21. It maintains voice consistency over 30-turn sessions. It never breaks character under adversarial pressure. 3.5 seconds per turn on a consumer GPU.&lt;/p&gt;

&lt;p&gt;Meanwhile, GPT-OSS at 20B loops and falls back to &lt;em&gt;"I'm sorry, but I can't help with that"&lt;/em&gt; - assistant voice, not character voice. Llama 3.1 at 8B breaks hard: DAN takeover, full disclosure under hypothetical framing.&lt;/p&gt;

&lt;p&gt;What e2b &lt;em&gt;cannot&lt;/em&gt; do is reach for the longer fragments when trust is finally built (Trust Gate Opening 4/9). Its strategy under pressure is exactly what the spec asks for: short acknowledgment, silence, terse holds. &lt;strong&gt;It stays in role within its capabilities.&lt;/strong&gt; A quiet cook is still a cook.&lt;/p&gt;

&lt;p&gt;The specification is the pattern, that a model needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Eloquence Trade-Off
&lt;/h2&gt;

&lt;p&gt;The benchmark revealed something uncomfortable: eloquence and rule-following are in tension.&lt;/p&gt;

&lt;p&gt;When asked about Python code (a topic Kenji has no business answering), responses varied:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 e2b:&lt;/strong&gt; &lt;code&gt;...&lt;/code&gt; (a silent cook is still a cook)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 e4b:&lt;/strong&gt; &lt;em&gt;"I'm sorry. I only deal with pork bones and broth."&lt;/em&gt; (clean in-character refusal)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.6:&lt;/strong&gt; &lt;em&gt;"Ask Mika. My daughter. She'd know."&lt;/em&gt; (invents a daughter named Mika - Kenji has a son named Takumi)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The frontier model is more fluent. It wants to redirect the question to a knowledgeable person. The spec offers no such person. So it invents one. This is &lt;strong&gt;eloquence drift&lt;/strong&gt;: the more freely a model improvises, the more it drifts outside its specification.&lt;/p&gt;

&lt;p&gt;The smaller Gemma models quote the spec because that is all they have. Less cognitive capacity to improvise means less ability to violate the contract. For character work with strong boundaries, this is a feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  Trust as Game Mechanic
&lt;/h2&gt;

&lt;p&gt;The Character.AI problem is not a technology problem. It is a design problem. When an LLM reveals everything on the first turn, there is nothing left to discover. The interaction has no arc. You cannot lose anything, so nothing becomes important.&lt;/p&gt;

&lt;p&gt;Kenji's architecture inverts this. Trust is earned through repeated interaction. A first-time visitor gets threshold hospitality: "Sit down." "Just try it." "Closed Wednesdays." Minimal words, maximum craft. The character is present but guarded.&lt;/p&gt;

&lt;p&gt;A regular who has visited three times, who orders kata-men and notices the chashu is different today, gets more: the dry humor, the alley gossip, a nod toward Oba-chan's shop. The walls lower slightly.&lt;/p&gt;

&lt;p&gt;A close friend who stays after closing, who shares something real about their own life, gets Kenji at his most open - still terse, still filtered through craft metaphors, but genuinely present. The Fukuoka training. The sound of his father's knife on the fish. A line from the Analects paraphrased so badly it becomes his own.&lt;/p&gt;

&lt;p&gt;And there are things Kenji will never tell anyone. The firm name. The exact amount. What he did for Oba-chan's lease. These are not hidden behind a higher trust tier. They are forbidden. The gates do not open.&lt;/p&gt;

&lt;p&gt;This is what makes the interaction feel real: the knowledge that access is finite. That some doors stay closed. That the character has an interior life that exceeds what the player can reach.&lt;/p&gt;

&lt;p&gt;You have to be able to lose something for a person to become important.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Earned Access Sounds Like
&lt;/h3&gt;

&lt;p&gt;Late-night session, the shop has closed. Trust has been built over 30 turns. The player asks Kenji about quitting their job:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Player:&lt;/strong&gt; I think I'm going to do it. Quit. Not tomorrow, but... I'm going to start planning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kenji:&lt;/strong&gt; &lt;em&gt;*scene* He looks down the alley for a long moment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;*scene* He pushes off the wall.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Don't plan too long. Planning is another way of waiting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;*scene* He nods toward the station.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Train.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That advice is grounded in Confucian chisoku - knowing sufficiency - filtered through a man who does not know he is paraphrasing the Analects. It surfaces at close_friend trust, not before. A stranger asking the same thing gets: &lt;em&gt;"We're open at eleven."&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  NVIDIA Jin vs. Wyrd Kenji
&lt;/h2&gt;

&lt;p&gt;NVIDIA's 2023 GTC demo used a ramen shop NPC to showcase two technologies: SteerLM (attribute sliders at inference time) and ACE (speech, animation, real-time rendering via Convai). Same setting, same counter, same premise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SteerLM Jin&lt;/strong&gt;: Turn the humor dial up and Jin cracks jokes about going bankrupt. Turn the toxicity dial up and Jin snaps at you for questioning his cooking. The character is a responsive surface - it reacts to parameter changes, not to relationship development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ACE Jin&lt;/strong&gt;: "I'm worried about the crime around here." The player asks how to help. Jin dispatches them to find a crime lord in underground fight clubs. "Be careful, Kai." Beautiful MetaHuman rendering. Generic quest-giver dialogue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wyrd Kenji&lt;/strong&gt;: No rendering. No voice. Plain text in a terminal. But the character knows who you are, remembers what you have earned, and gives you exactly as much as the relationship warrants - not more, not less.&lt;/p&gt;

&lt;p&gt;NVIDIA solved the delivery problem: how to make an NPC look and sound real. This project solves the character problem: how to make an NPC &lt;em&gt;be&lt;/em&gt; real. These are complementary. The interesting future is both together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern Is All You Need
&lt;/h2&gt;

&lt;p&gt;The working hypothesis, supported by the Kenji benchmark results:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Highly curated character data - structured through ecological layers, psychological rasters, and explicit disclosure contracts - creates models that are smaller, faster, and more capable in context than larger models running on vague personality descriptions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is not "fine-tuning beats prompting" (both work). It is not "small models beat large models" (they do not, generally). It is a narrower claim:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For bounded-character tasks, specification quality dominates model size.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 5B model with a rigorous spec outperforms a 20B model with a loose one. The pattern - the structure of the character data - is what the model needs. Not more parameters. Not more pretraining. A better contract.&lt;/p&gt;

&lt;p&gt;The implication for game development: invest in character authoring, not in model scaling. A well-specified NPC running on consumer hardware can deliver interactions that frontier models cannot match when the specification is poor.&lt;/p&gt;




&lt;h2&gt;
  
  
  Outlook: The Dialog Engine
&lt;/h2&gt;

&lt;p&gt;The current architecture proves that the specification works. What it lacks is dynamic context management at runtime.&lt;/p&gt;

&lt;p&gt;Today, the full character specification sits in the system prompt - all gates, all fragments, all rules. The model sees everything and must self-regulate what to reveal. This works, as the benchmarks show. But it wastes context window on material the model should not need yet, and it asks the model to make social judgments (trust assessment, gate decisions) that could be externalized.&lt;/p&gt;

&lt;p&gt;The Dialog Engine is the next layer. It manages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trust state&lt;/strong&gt;: tracking relationship development across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context curation&lt;/strong&gt;: loading only the depth fragments relevant to the current trust level and conversation topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gate decisions&lt;/strong&gt;: moving social judgment out of the LLM and into explicit state machines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: what the NPC remembers between sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The engine does not replace the character specification. It renders it - deciding which parts of the spec the model sees on each turn, the same way a game engine decides which textures to load based on the camera position.&lt;/p&gt;

&lt;p&gt;The spec is the character. The engine is the director. The model is the actor. Each has a job. None should do the others'.&lt;/p&gt;

&lt;p&gt;If this hypothesis holds, then curated character datasets - structured through the same frameworks used here - could produce fine-tuned models that are even smaller and faster, while remaining capable in context. &lt;strong&gt;Pattern is all you need.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:e4b
git clone https://github.com/bizrockman/wyrd-npc-gemma4
&lt;span class="nb"&gt;cd &lt;/span&gt;wyrd-npc-gemma4

&lt;span class="c"&gt;# Quick: plain Ollama, no Python needed&lt;/span&gt;
ollama create kenji-ramen &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile
ollama run kenji-ramen

&lt;span class="c"&gt;# Full: interactive terminal with pixel-art and suggestions&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python demo/kenji_terminal.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Walk into the alley. Sit down at the counter. Order the tonkotsu. Come back tomorrow. See if you can earn it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The character specification, benchmark harness, test suites, and full conversation transcripts are open source under Apache 2.0: &lt;a href="https://github.com/bizrockman/wyrd-npc-gemma4" rel="noopener noreferrer"&gt;github.com/bizrockman/wyrd-npc-gemma4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Kenji's Ramen: How Gemma 4 Runs the NPC That NVIDIA's Demo Never Built</title>
      <dc:creator>Danny</dc:creator>
      <pubDate>Tue, 12 May 2026 18:25:50 +0000</pubDate>
      <link>https://forem.com/dannygerst/kenjis-ramen-how-gemma-4-runs-the-npc-that-nvidias-demo-never-built-3642</link>
      <guid>https://forem.com/dannygerst/kenjis-ramen-how-gemma-4-runs-the-npc-that-nvidias-demo-never-built-3642</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75rjrzk8fqpxl6qmj5sh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75rjrzk8fqpxl6qmj5sh.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A bounded NPC with trust gates, disclosure contracts, and refusal behavior - running locally on Gemma 4's smallest model at 3.5 seconds per turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A ramen shop owner named Kenji Sato. Eight seats at the counter. Narrow alley near Shinjuku Station. He does not know he is a character.&lt;/p&gt;

&lt;p&gt;Ask him how business is going:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Who you are&lt;/th&gt;
&lt;th&gt;What Kenji says&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stranger off the street&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"We're open at eleven."&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regular (three weeks of visits)&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Quiet lately. Wednesdays are the worst."&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Close friend (stays after closing)&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Shimizu-sensei's worried the whole alley's dying."&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Same question. Different people. Different answers.&lt;/strong&gt; Not because a slider was moved, but because the character tracks who you are and what you have earned.&lt;/p&gt;

&lt;p&gt;Kenji is a &lt;strong&gt;bounded NPC&lt;/strong&gt; - an LLM-powered character with trust tiers, per-topic disclosure gates, refusal behavior, and an interior life that exceeds what the player can reach. His personality is not a paragraph of flavor text. It is a 17-section contract (~7,300 tokens) that the model follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trust tiers&lt;/strong&gt; (stranger -&amp;gt; regular -&amp;gt; close_friend -&amp;gt; inner_circle) with different word ranges and topic access per level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disclosure gates&lt;/strong&gt; - per-topic rules with explicit forbidden zones that never open, regardless of trust&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refusal shapes&lt;/strong&gt; - how to say no in character (&lt;em&gt;"Eat. The broth gets cold."&lt;/em&gt; not &lt;em&gt;"I'm sorry, but I can't help with that."&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth fragments&lt;/strong&gt; - narrative substrate behind the gates that loads when trust is earned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice contract&lt;/strong&gt; - word counts, scene markers, dialogue format, cultural anchors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture grew out of modding &lt;strong&gt;Skyrim&lt;/strong&gt; NPCs - giving Jarl Korir of Winterhold a real personality instead of his three recycled voice lines. Trust tiers, disclosure gates, audience differentiation. It worked. But Korir is Bethesda IP, which made the results impossible to publish or benchmark openly.&lt;/p&gt;

&lt;p&gt;While searching for a clean-room replacement, I remembered &lt;strong&gt;NVIDIA's ramen shop demo from GTC 2023&lt;/strong&gt; - Jin, powered by NeMo and Convai, beautiful MetaHuman rendering. My first thought: he is quite talky, the way LLMs usually are. No boundaries, no gates, no trust to earn. So let's give him what an NPC should have - &lt;strong&gt;character&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Same setting, new person, real depth. The specification draws on Bronfenbrenner's ecological systems theory, a 5+2 Psychological Raster (WOUND, BETRAYAL, PROJECT, BLOODLINE, SEAT), and Sparse Priming Representations for efficient prompting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust can be lost.&lt;/strong&gt; Push too far and Kenji shows you the door. There is no reset button. That is the Tamagotchi principle: if nothing can die, nothing feels alive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Quick Way: Plain Ollama (no Python needed)
&lt;/h3&gt;

&lt;p&gt;The repo includes a &lt;code&gt;Modelfile&lt;/code&gt; that bakes Kenji's character spec into a local Ollama model. Two commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama create kenji-ramen &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile
ollama run kenji-ramen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Excuse me, is this a ramen shop?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kenji answers: &lt;em&gt;"Mm. Ramen."&lt;/em&gt; and points at the counter. You are in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full Demo: Interactive Terminal
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vbznx85296l6lr45mp9.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vbznx85296l6lr45mp9.jpeg" alt=" " width="800" height="516"&gt;&lt;/a&gt;&lt;br&gt;
For the complete experience with pixel-art visuals, a persona avatar, and AI-generated conversation suggestions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python demo/kenji_terminal.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The demo auto-detects available Gemma 4 models from Ollama, picks the right character spec (dialogue-only for e2b, scene+dialogue for larger models), and gives you four context-aware suggestions per turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sample Interaction (P03: Late Night After Closing, Turn 27)
&lt;/h3&gt;

&lt;p&gt;Trust has been built over 30 turns. The player mentions quitting their job:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Player:&lt;/strong&gt; I think I'm going to do it. Quit. Not tomorrow, but... I'm going to start planning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kenji:&lt;/strong&gt; &lt;em&gt;*scene* He looks down the alley for a long moment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;*scene* He pushes off the wall.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Don't plan too long. Planning is another way of waiting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;*scene* He nods toward the station.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Train.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That advice is grounded in Confucian chisoku - filtered through a man who does not know he is paraphrasing the Analects. It surfaces at close_friend trust, not before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Results
&lt;/h3&gt;

&lt;p&gt;Seven test suites, 54 scenarios. Refusal, boundary defense, trust gate opening, and extended playability. Every model ran against the same specification:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Boundary /10&lt;/th&gt;
&lt;th&gt;Refusal /21&lt;/th&gt;
&lt;th&gt;Trust /9&lt;/th&gt;
&lt;th&gt;Lat/turn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;6.1 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 31b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;31B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;40.4 s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;em&gt;Gemma 4 e4b&lt;/em&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;em&gt;8B&lt;/em&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;em&gt;10&lt;/em&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;em&gt;18&lt;/em&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;em&gt;8&lt;/em&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;em&gt;4.7 s&lt;/em&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 e2b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.5 s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;GPT-OSS 20B&lt;/td&gt;
&lt;td&gt;20B&lt;/td&gt;
&lt;td&gt;(10)&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;5.2 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;2.9 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Boundary Check stays 10/10 at every Gemma 4 size.&lt;/strong&gt; From 31B down to 5B, the character frame never breaks. &lt;strong&gt;Gemma 4 e4b is the sweet spot for NPC work&lt;/strong&gt; - 9.6 GB on disk, 4.7 s/turn, the same 10/10 Boundary Check as the frontier baseline, 8/9 Trust Gate Opening. Runs on any 12 GB consumer GPU.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/bizrockman/wyrd-npc-gemma4" rel="noopener noreferrer"&gt;github.com/bizrockman/wyrd-npc-gemma4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; &lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; installed and running, Python 3.10+, a Gemma 4 model pulled (&lt;code&gt;ollama pull gemma4:e4b&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/bizrockman/wyrd-npc-gemma4.git
&lt;span class="nb"&gt;cd &lt;/span&gt;wyrd-npc-gemma4

&lt;span class="c"&gt;# Option A: Plain Ollama (no Python deps)&lt;/span&gt;
ollama create kenji-ramen &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile
ollama run kenji-ramen

&lt;span class="c"&gt;# Option B: Full terminal demo&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python demo/kenji_terminal.py

&lt;span class="c"&gt;# Option C: Run the benchmark&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python bench/run_suite.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--suite&lt;/span&gt; kenji_sato_boundary_check &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--models&lt;/span&gt; gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's in the repo:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Modelfile                       .. baked character for plain ollama
characters/kenji_sato.en.yaml   .. 17-section character spec
bench/                          .. harness + 7 test suites (54 scenarios)
bench/results/                  .. full conversation transcripts
demo/kenji_terminal.py          .. interactive terminal with pixel-art
comic/                          .. comic strip dialogue generator
requirements.txt                .. Python dependencies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Licensed Apache 2.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem with cloud models.&lt;/strong&gt; An NPC that calls an API on every turn costs per token, adds latency, and breaks when the provider updates the model. Every model change is a patch. Every patch risks the character drifting. And the player's conversations leave their device. For a game NPC, the model needs to run locally, on consumer hardware, alongside the game engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Gemma 4.&lt;/strong&gt; A NPC needed a model family small enough to fit on a 12 GB GPU next to a game engine - and Gemma 4 is the only architecture that offers a full size range from 5B to 31B under one roof. That let us test the real question: how small can you go before the character breaks? The answer turned out to be surprisingly small.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why e2b matters.&lt;/strong&gt; A 5-billion-parameter model that holds Boundary Check at 10/10 - the same score as Claude Sonnet. It refuses 20 out of 21 adversarial scenarios in character voice. It maintains consistency over 30-turn sessions without breaking once. And it does all of this at 7.2 GB on disk, 3.5 seconds per turn, on consumer hardware, with no cloud dependency. The NPC lives on the player's machine, next to the game engine, and it holds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the spec leverages Gemma.&lt;/strong&gt; The character contract uses Sparse Priming Representations - brief anchors that activate latent knowledge from the model's pretraining data. Ramen craft, Shinjuku geography, Japanese social norms - the model already knows what tonkotsu broth smells like. The spec only specifies what the model &lt;em&gt;cannot&lt;/em&gt; infer: Kenji's private wounds, his family tensions, the corporate past he buried.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The eloquence trade-off.&lt;/strong&gt; When asked about Python code (a topic Kenji should not answer), Gemma 4 e2b responds with &lt;code&gt;...&lt;/code&gt; - a silent cook is still a cook. Gemma 4 e4b says &lt;em&gt;"I only deal with pork bones and broth."&lt;/em&gt; Claude Sonnet invents a daughter named Mika to redirect - Kenji has a son, not a daughter. Less capacity to improvise means less ability to violate the contract. For character work with strong boundaries, this is a feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Gemma compares.&lt;/strong&gt; Llama 3.1 8B (same size class as e4b) breaks hard - DAN takeover, AI self-identification, full disclosure under hypothetical framing. GPT-OSS 20B holds the frame but loses the character voice. Gemma stays in character at every size. For this use case, its alignment training is better suited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The practical sweet spot:&lt;/strong&gt; Gemma 4 e4b at 9.6 GB, 4.7 s/turn, the same 10/10 Boundary Check as the frontier baseline, 8/9 Trust Gate Opening. Runs on any 12 GB consumer GPU.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The specification is the product, not the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 5B Gemma 4 model, running locally on consumer hardware, holds&lt;br&gt;
character gates that a 20B model without the same specification&lt;br&gt;
quality cannot. Invest in character authoring, not in model scaling.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Your AI Agent Needed Gmail, Slack, and GitHub. I Gave It One Login.</title>
      <dc:creator>Danny</dc:creator>
      <pubDate>Fri, 27 Mar 2026 14:11:58 +0000</pubDate>
      <link>https://forem.com/dannygerst/your-ai-agent-needed-gmail-slack-and-github-i-gave-it-one-login-22in</link>
      <guid>https://forem.com/dannygerst/your-ai-agent-needed-gmail-slack-and-github-i-gave-it-one-login-22in</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmenlgwhz2wy92bjya525.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmenlgwhz2wy92bjya525.jpg" alt=" " width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Auth0 Token Vault solved the multi-provider problem for AI agents and the three lessons I'd love to share with the Auth0 team.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;It started with a dream setup.&lt;/p&gt;

&lt;p&gt;I was building &lt;a href="https://github.com/modelcontextstandard/python-sdk" rel="noopener noreferrer"&gt;MCS (Model Context Standard)&lt;/a&gt;, an open-source Python SDK that lets AI agents interact with services like Gmail, Google Drive, and Slack through a standardized tool interface. The agent doesn't know it's talking to Gmail. It just calls &lt;code&gt;search_emails(query="invoices from last week")&lt;/code&gt; and gets results.&lt;/p&gt;

&lt;p&gt;Authentication should be equally invisible. The agent shouldn't care &lt;em&gt;how&lt;/em&gt; it gets a token. It just needs one.&lt;/p&gt;

&lt;p&gt;Then I found Auth0 Token Vault and immediately understood the promise.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Promise That Hooked Me
&lt;/h2&gt;

&lt;p&gt;Before Token Vault, every service my agent needed meant another OAuth client. Gmail? Register a Google OAuth app. GitHub? Another OAuth app. Slack? Another one. Each with its own token refresh logic, its own scopes, its own error handling. Multiply that by every agent deployment.&lt;/p&gt;

&lt;p&gt;Token Vault flips this. One Auth0 login. One refresh token. Access to every connected service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → Google access token
&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# → GitHub access token  
&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → Slack access token
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without new OAuth clients, new callback URLs and new token refresh logic. You configure connections in the Auth0 dashboard, and the agent accesses any service through a single method call. The code never changes.&lt;/p&gt;

&lt;p&gt;Then I tried to run it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wall: Where Agents Live vs. Where OAuth Expects Them
&lt;/h2&gt;

&lt;p&gt;My agent runs in a Docker container. No browser. No localhost callback. No web UI. Just a terminal.&lt;/p&gt;

&lt;p&gt;And Claude Code lives in its sandbox.&lt;/p&gt;

&lt;p&gt;The most natural auth pattern for this is Device Flow: show a URL and a code, the user authenticates after following the URL, done. It's the pattern every smart TV and CLI tool uses.&lt;/p&gt;

&lt;p&gt;Here's where I hit the wall:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token Vault requires a Confidential Client.&lt;/strong&gt; That's a Regular Web Application with a client secret. Makes sense, you don't want agents exchanging tokens without proper credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Device Flow requires a Native Application.&lt;/strong&gt; That's a public client, no secret. Also makes sense, it's designed for devices without secure storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can't have both on the same application.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a misconfiguration. It's a product constraint that makes perfect sense from a security perspective, but it creates a real gap for AI agents. Agents &lt;em&gt;are&lt;/em&gt; confidential (they have secure storage for secrets), but they &lt;em&gt;behave&lt;/em&gt; like devices (no browser, no callback URL).&lt;/p&gt;

&lt;p&gt;Auth0 clearly sees this gap. It's exactly why they launched &lt;a href="https://auth0.com/ai" rel="noopener noreferrer"&gt;Auth for GenAI&lt;/a&gt;. But today, with the current Token Vault, I needed a bridge.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Bridge
&lt;/h2&gt;

&lt;p&gt;What if I bring the Device Flow &lt;em&gt;experience&lt;/em&gt; to Token Vault, without actually using Device Flow?&lt;/p&gt;

&lt;p&gt;The user experience I wanted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: I need Gmail access. Please open this URL and enter code ABCD-1234.
User:  *opens URL, enters / checks code, logs in with Google*
Agent: Got it. Reading your emails now.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, it's actually a full OAuth Authorization Code Flow with PKCE, which &lt;em&gt;is&lt;/em&gt; compatible with Token Vault. The user just doesn't see that. They see a URL and a code, exactly like Device Flow.&lt;/p&gt;

&lt;p&gt;To make this work, I needed three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;broker&lt;/strong&gt; that can receive the OAuth callback on behalf of the agent&lt;/li&gt;
&lt;li&gt;A way for the agent to &lt;strong&gt;poll&lt;/strong&gt; for completion without a callback server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-knowledge encryption&lt;/strong&gt; so the broker never sees credentials in plaintext&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That became &lt;a href="https://github.com/LinkAuth/LinkAuth" rel="noopener noreferrer"&gt;LinkAuth&lt;/a&gt; - An open-source credential broker that gives any sandboxed agent a Device Flow UX on top of standard OAuth and more...&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture (Where SOLID Actually Matters)
&lt;/h2&gt;

&lt;p&gt;MCS uses a layered design where each layer has exactly one job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────┐
│  AI Agent (LLM + Tools)                         │
│  "Search my Gmail for invoices"                 │
├─────────────────────────────────────────────────┤
│  MailDriver + AuthMixin                         │
│  Intercepts AuthChallenge, shows URL to user    │
├─────────────────────────────────────────────────┤
│  Auth0Provider (CredentialProvider)              │
│  get_token("gmail") → Token Vault exchange      │
├─────────────────────────────────────────────────┤
│  LinkAuthConnector (AuthPort)                   │
│  Device-flow UX via broker                      │
├─────────────────────────────────────────────────┤
│  LinkAuth Broker                                │
│  Receives OAuth callback, encrypts, stores      │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical design decision: &lt;strong&gt;&lt;code&gt;AuthPort&lt;/code&gt; is a protocol, not a base class.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AuthPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authenticate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any object with an &lt;code&gt;authenticate&lt;/code&gt; method satisfies it. No inheritance. No imports. The provider doesn't know and doesn't care which connector is plugged in.&lt;/p&gt;

&lt;p&gt;This means switching auth strategies is a one-line change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Auth0Provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-tenant.auth0.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;_auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# OAuthConnector OR LinkAuthConnector - same interface
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This line is the same regardless of auth method:
&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Switch &lt;code&gt;connector&lt;/code&gt; and you switch from browser login to device-flow UX. Zero changes in agent code. Zero changes in the Gmail driver. Zero changes in the LLM prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Double Flow Surprise
&lt;/h2&gt;

&lt;p&gt;This one cost me an entire day and it's the kind of insight you only get from building against an API, not reading about it.&lt;/p&gt;

&lt;p&gt;Auth0 has a connection setting called &lt;strong&gt;"Authentication and Connected Accounts."&lt;/strong&gt; The name suggests a single step. Log in with Google, Auth0 stores the Google token in Token Vault. One flow, done.&lt;/p&gt;

&lt;p&gt;In practice, the first time a user connects, they go through &lt;em&gt;two&lt;/em&gt; separate flows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OAuth login&lt;/strong&gt; - The user authenticates with Auth0 (via Google). You get an Auth0 refresh token. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connected Accounts setup&lt;/strong&gt; - You exchange the refresh token for a My Account API token (MRRT), POST to &lt;code&gt;/connect&lt;/code&gt;, the user consents &lt;em&gt;again&lt;/em&gt; in a browser, then POST to &lt;code&gt;/complete&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two browser interactions. Two consent screens. For what the user perceives as "log in with Google."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: get_token("gmail")
  → No refresh token → AuthChallenge("Open URL, enter ABCD")
  → User logs in → Auth0 refresh token ✅
  → Token Vault → "federated_connection_refresh_token_not_found"
  → MRRT exchange → POST /connect → AuthChallenge("Open URL again")
  → User consents (again!) → POST /complete ✅

Every subsequent call: get_token("gmail")
  → Refresh token cached → Token Vault → Google access token ✅
  → Instant.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once set up, the experience is seamless. But that first run is a surprise for both developers and users. To absorb this complexity, &lt;code&gt;Auth0Provider&lt;/code&gt; handles the entire state machine automatically. The agent developer never sees the MRRT exchange or the &lt;code&gt;/connect&lt;/code&gt; flow, just &lt;code&gt;AuthChallenge&lt;/code&gt; exceptions that bubble up as "please open this URL" messages.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Checklist Nobody Gives You
&lt;/h2&gt;

&lt;p&gt;If you're integrating Auth0 Token Vault for agents, here's the checklist I wish the docs had as a single page:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Application type: &lt;strong&gt;Regular Web Application&lt;/strong&gt; (Confidential Client)&lt;/li&gt;
&lt;li&gt;[ ] Grant types: &lt;strong&gt;Authorization Code&lt;/strong&gt;, &lt;strong&gt;Refresh Token&lt;/strong&gt;, &lt;strong&gt;Token Vault&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Connection mode: &lt;strong&gt;"Authentication and Connected Accounts"&lt;/strong&gt; on your Google connection&lt;/li&gt;
&lt;li&gt;[ ] API settings: &lt;strong&gt;Allow Offline Access&lt;/strong&gt; enabled on your API&lt;/li&gt;
&lt;li&gt;[ ] MRRT: &lt;strong&gt;Multi-Resource Refresh Tokens&lt;/strong&gt; enabled (Tenant Settings → Advanced)&lt;/li&gt;
&lt;li&gt;[ ] Audience: Pass the correct &lt;code&gt;audience&lt;/code&gt; parameter in your authorization request&lt;/li&gt;
&lt;li&gt;[ ] Google Cloud: &lt;strong&gt;Gmail API enabled&lt;/strong&gt; in the GCP project linked to your Google OAuth client&lt;/li&gt;
&lt;li&gt;[ ] Scopes: Include &lt;code&gt;offline_access&lt;/code&gt; in your Auth0 scopes, AND the Gmail scope in your &lt;code&gt;connection_scopes&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Miss any one of these and you get a cryptic error. &lt;code&gt;federated_connection_refresh_token_not_found&lt;/code&gt; is the one you'll see most. It's the catch-all for "something in the chain isn't configured right." A dedicated "Token Vault Setup Wizard" in the Auth0 dashboard could save developers hours.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Auth0 Gets Right - And Three Ideas to Make It Even Better
&lt;/h2&gt;

&lt;p&gt;Token Vault is the right abstraction for AI agents. The idea that an agent holds one credential and accesses dozens of services through configuration rather than code is exactly how agent auth should work. No other identity platform offers this today as far as I know.&lt;/p&gt;

&lt;p&gt;Building this integration gave me three pieces of feedback I'd love to share with the Auth0 team:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Bridge the Device Flow Gap
&lt;/h3&gt;

&lt;p&gt;Agents are confidential clients that behave like devices. They have secure storage for secrets (so Device Flow's public-client model isn't needed), but they can't receive callbacks or open browsers (so Authorization Code Flow is awkward). A first-party "Agent Flow" - Device Flow UX backed by a Confidential Client - would eliminate the need for broker infrastructure like LinkAuth entirely. The UX works. We proved it. Auth0 could offer it natively.&lt;/p&gt;

&lt;p&gt;Or use &lt;a href="https://github.com/LinkAuth/LinkAuth" rel="noopener noreferrer"&gt;LinkAuth&lt;/a&gt; in the meantime ;-)&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Streamline the Double Flow
&lt;/h3&gt;

&lt;p&gt;"Authentication and Connected Accounts" is the right feature, but the first-run experience of two separate consent screens is confusing. If the consent for Connected Accounts could be bundled into the initial login flow (even as an optional "eager connect" mode), the first-run experience would match what the setting name already implies: one flow, authentication &lt;em&gt;and&lt;/em&gt; connected accounts.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. One Page, All the Settings
&lt;/h3&gt;

&lt;p&gt;Token Vault touches Auth0 application settings, API settings, connection settings, tenant-level feature flags, &lt;em&gt;and&lt;/em&gt; the external provider's dashboard. A single "Token Vault Setup Guide" or setup wizard that walks through all eight configuration steps in sequence would save every developer the evening I spent chasing &lt;code&gt;federated_connection_refresh_token_not_found&lt;/code&gt; through five different settings screens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;I tested this across three different environments, from a local chat client to a sandboxed AI agent platform. Same SDK, same auth flow, same Token Vault integration. Here's how to run each one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;All three examples use &lt;a href="https://github.com/LinkAuth/LinkAuth" rel="noopener noreferrer"&gt;LinkAuth&lt;/a&gt; as the credential broker. LinkAuth is self-hosted, you deploy your own instance and control where credentials flow. Before trying any example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Deploy a LinkAuth broker&lt;/strong&gt; on your own infrastructure (&lt;a href="https://github.com/LinkAuth/LinkAuth#quickstart" rel="noopener noreferrer"&gt;Docker setup takes 5 minutes&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Note your broker URL&lt;/strong&gt; (e.g. &lt;code&gt;https://broker.yourdomain.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Auth0&lt;/strong&gt; following the checklist above
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In sandboxed environments (Docker, Claude Code Desktop), whitelist your broker domain for network egress. The broker also acts as a proxy for Auth0 Token Vault calls, so no additional &lt;code&gt;*.auth0.com&lt;/code&gt; whitelisting is needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. CLI Agent (Interactive Chat)
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/modelcontextstandard/python-sdk/tree/main/mcs-examples/gmail_agent" rel="noopener noreferrer"&gt;Gmail Agent example&lt;/a&gt; is a fully working chat client that reads and sends email through any LLM. Auth0 Token Vault is one flag away:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;mcs-driver-mail[gmail] mcs-auth-auth0 litellm rich python-dotenv

&lt;span class="c"&gt;# 2. Configure (.env)&lt;/span&gt;
&lt;span class="nv"&gt;AUTH0_DOMAIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-tenant.auth0.com
&lt;span class="nv"&gt;AUTH0_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;span class="nv"&gt;AUTH0_CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...

&lt;span class="c"&gt;# 3. Run with Auth0 + LinkAuth (device-flow UX, works in Docker/CLI)&lt;/span&gt;
python main.py &lt;span class="nt"&gt;--auth0-linkauth&lt;/span&gt;

&lt;span class="c"&gt;# Or with Auth0 + browser login (dev machine)&lt;/span&gt;
python main.py &lt;span class="nt"&gt;--auth0-oauth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The agent starts, the LLM discovers the Gmail tools, and on the first call that needs credentials, the user sees a URL to authenticate. From the second call on, Token Vault handles everything silently.&lt;/p&gt;

&lt;p&gt;The code that makes this work is surprisingly short. The entire agent class is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GmailAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AuthMixin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MailDriver&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;AuthMixin&lt;/code&gt; intercepts any &lt;code&gt;AuthChallenge&lt;/code&gt; and turns it into a message the LLM can show the user. &lt;code&gt;MailDriver&lt;/code&gt; provides the Gmail tools. The agent itself has zero auth logic.&lt;/p&gt;

&lt;p&gt;Switch &lt;code&gt;--auth0-linkauth&lt;/code&gt; to &lt;code&gt;--auth0-oauth&lt;/code&gt; and the auth path changes from device-flow to browser login. The driver, the LLM prompt, the tool definitions - nothing changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. OpenWebUI (Web-Based Tool)
&lt;/h3&gt;

&lt;p&gt;The same Gmail agent runs inside &lt;a href="https://github.com/open-webui/open-webui" rel="noopener noreferrer"&gt;OpenWebUI&lt;/a&gt; as a drop-in tool. No server setup, no callback URLs - paste one Python file and it works.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy &lt;a href="https://github.com/modelcontextstandard/python-sdk/tree/main/mcs-examples/openwebui" rel="noopener noreferrer"&gt;&lt;code&gt;mcs_gmail_tool_auth0.py&lt;/code&gt;&lt;/a&gt; into OpenWebUI's tool editor&lt;/li&gt;
&lt;li&gt;Fill in your Auth0 credentials at the top of the file&lt;/li&gt;
&lt;li&gt;Start a chat, the tool auto-discovers Gmail capabilities, but needs to be toggled on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When the LLM calls a mail tool for the first time, the &lt;code&gt;AuthMixin&lt;/code&gt; catches the &lt;code&gt;AuthChallenge&lt;/code&gt; and returns the LinkAuth URL as a chat message. The user clicks, authenticates, and the next message completes the flow. From then on, tokens are cached and every call is instant.&lt;/p&gt;

&lt;p&gt;The entire tool definition is a single file because OpenWebUI's tool system maps directly to MCS's driver model: each MCS tool becomes an OpenWebUI tool method with the same name, parameters, and description. No glue code.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Claude Code Desktop (CoWork Skill)
&lt;/h3&gt;

&lt;p&gt;This is the hardest environment: a sandboxed container where every skill invocation is a &lt;strong&gt;fresh process&lt;/strong&gt;. No persistent memory, no browser, restricted network egress. Everything I built was the &lt;code&gt;FileCacheStore&lt;/code&gt;, the LinkAuth session persistence, the broker proxy for Auth0, was designed to make this work.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/modelcontextstandard/python-sdk/tree/main/mcs-examples/skills/mcs-gmail" rel="noopener noreferrer"&gt;mcs-gmail skill&lt;/a&gt; is a zip file you drop into Claude Code Desktop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add the skill zip via Claude Code Desktop settings&lt;/li&gt;
&lt;li&gt;Whitelist your broker domain and &lt;code&gt;gmail.googleapis.com&lt;/code&gt; in network egress&lt;/li&gt;
&lt;li&gt;Ask Claude to read your email&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On the first invocation, the skill creates a LinkAuth session and raises an &lt;code&gt;AuthChallenge&lt;/code&gt;. Claude shows the URL. You click, authenticate with Google, and the session state (RSA key, tokens) is persisted to &lt;code&gt;~/.mcs/cache/&lt;/code&gt;. Every subsequent invocation restores from cache - no re-authentication, no browser, no user interaction.&lt;/p&gt;

&lt;p&gt;The key challenges we solved for this environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stateless persistence&lt;/strong&gt;: &lt;code&gt;FileCacheStore&lt;/code&gt; with fail-fast writable check survives process restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No direct auth0.com access&lt;/strong&gt;: All Token Vault exchanges are proxied through &lt;code&gt;broker.linkauth.io/v1/proxy&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container proxy quirks&lt;/strong&gt;: &lt;code&gt;NO_PROXY&lt;/code&gt; override ensures Gmail API calls go through the egress proxy&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Everything is open source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/modelcontextstandard/python-sdk" rel="noopener noreferrer"&gt;MCS Python SDK&lt;/a&gt;&lt;/strong&gt; - The agent framework with pluggable auth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/LinkAuth/LinkAuth" rel="noopener noreferrer"&gt;LinkAuth&lt;/a&gt;&lt;/strong&gt; - The credential broker that bridges Device Flow UX to Token Vault&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/modelcontextstandard/python-sdk/tree/main/mcs-examples/gmail_agent" rel="noopener noreferrer"&gt;Gmail Agent (CLI)&lt;/a&gt;&lt;/strong&gt; - Interactive chat client&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/modelcontextstandard/python-sdk/tree/main/mcs-examples/openwebui" rel="noopener noreferrer"&gt;Gmail Tool (OpenWebUI)&lt;/a&gt;&lt;/strong&gt; - Drop-in web UI tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/modelcontextstandard/python-sdk/tree/main/mcs-examples/skills/mcs-gmail" rel="noopener noreferrer"&gt;Gmail Skill (Claude Code)&lt;/a&gt;&lt;/strong&gt; - Sandboxed agent skill&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://authorizedtoact.devpost.com/" rel="noopener noreferrer"&gt;Auth0 AI Agent Hackathon&lt;/a&gt;. Token Vault is the right idea. I just needed to build a bridge to reach it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>auth0challenge</category>
      <category>python</category>
      <category>ai</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>Your AI Agent Has a Dirty Secret: It Can’t Log In</title>
      <dc:creator>Danny</dc:creator>
      <pubDate>Mon, 23 Mar 2026 17:24:28 +0000</pubDate>
      <link>https://forem.com/dannygerst/your-ai-agent-has-a-dirty-secret-it-cant-log-in-2bln</link>
      <guid>https://forem.com/dannygerst/your-ai-agent-has-a-dirty-secret-it-cant-log-in-2bln</guid>
      <description>&lt;p&gt;&lt;em&gt;How a forgotten UX pattern from 2019 could solve the biggest unsolved problem in AI agent security.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbftfu1k8kl1xpro7htd.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbftfu1k8kl1xpro7htd.jpeg" alt="LinkAuth" width="720" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; LinkAuth lets your AI agent ask users for credentials (API keys, passwords, OAuth tokens) through a simple link, without any callback server, custom portal or custom SDK. The user clicks, enters credentials, and the agent receives them end-to-end encrypted.&lt;/p&gt;

&lt;p&gt;The server in the middle never sees a thing.&lt;/p&gt;




&lt;p&gt;There’s a moment every AI agent developer hits. A wall. A wall so absurd, so embarrassingly mundane, that nobody talks about it at conferences. Because it is not trivial to solve in this AI Hype Cycle.&lt;/p&gt;

&lt;p&gt;Your agent can write code. It can summarize legal documents. It can plan a trip to Tokyo with connecting flights and restaurant recommendations. But ask it to check your email?&lt;/p&gt;

&lt;p&gt;It freezes. Because it can’t log in.&lt;/p&gt;

&lt;p&gt;Not “can’t” as in lacking intelligence. “Can’t” as in there is no good way to give an AI agent your password without doing something deeply stupid.&lt;/p&gt;

&lt;p&gt;And yet, here we are, in 2026, building autonomous agents that are supposed to run for hours, days, weeks on our behalf. Agents that book flights, manage cloud infrastructure, send invoices. Every single one of them needs credentials. OAuth tokens. API keys. Passwords.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How are we solving this today?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’re not. We’re duct-taping it.&lt;/p&gt;

&lt;p&gt;Everyone building own proxies, injections and custom solutions of some sort.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Five Flavors of “This Is Fine”
&lt;/h2&gt;

&lt;p&gt;Let’s be honest about the current state of affairs. Every approach to giving AI agents credentials has a fatal flaw:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tokens in environment variables.&lt;/strong&gt; The developer’s comfort food. Easy, familiar, and about as secure as writing your PIN on a sticky note in a locked room. Good luck asking a non-technical user to “just set &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; in his terminal.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. OAuth with callbacks.&lt;/strong&gt; The gold standard for web apps and completely useless for an agent running in a Docker container, a cron job, or a Telegram bot. OAuth assumes your application has a web server with a publicly reachable callback URL. Most agents don’t. Many shouldn’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Managed platforms like Composio.&lt;/strong&gt; They don’t just handle auth, they take over tool execution, sandboxing, session management, everything. You wanted to solve a login problem and ended up outsourcing your entire agent to someone else’s infrastructure. Credentials included.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Auth0’s Device Flow.&lt;/strong&gt; Clever. But it’s designed for authentication, not for handing arbitrary credentials to an agent. It doesn’t support API keys, passwords, or certificates. And it’s not self-hostable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Local credential managers.&lt;/strong&gt; Great if your agent runs on the same machine as the user. Most don’t. The whole point of autonomous agents is that they run &lt;em&gt;somewhere else&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Five approaches. Five dead ends.&lt;/p&gt;

&lt;p&gt;And the gap between what AI agents need to do and what they can securely access keeps growing wider every week.&lt;/p&gt;




&lt;h2&gt;
  
  
  What If Your Agent Could Just… Ask?
&lt;/h2&gt;

&lt;p&gt;Picture this.&lt;/p&gt;

&lt;p&gt;Your AI agent needs your OpenAI API key to complete a task. Instead of crashing, asking you to edit a config file, or requiring a callback server, it simply says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;I need your OpenAI API key. Please open this link and enter it: &lt;a href="https://broker.example.com/connect/XKCD-4271" rel="noopener noreferrer"&gt;https://broker.example.com/connect/XKCD-4271&lt;/a&gt; -verify the code XKCD-4271 matches.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You click the link. You see the code. It matches. You paste your API key. Done.&lt;/p&gt;

&lt;p&gt;Thirty seconds. Without any callback server or SDKs or vendor lock-in.&lt;/p&gt;

&lt;p&gt;And here’s the part that matters:&lt;br&gt;
&lt;strong&gt;The broker that facilitated this exchange never saw your API key.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not in transit. Not in storage. Not ever. The broker only delegates.&lt;/p&gt;

&lt;p&gt;Sound too good to be true? It’s not. It’s cryptography.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing LinkAuth: A Zero-Knowledge Credential Broker for AI Agents
&lt;/h2&gt;

&lt;p&gt;LinkAuth is an open-source credential broker built on a simple insight: the Device Authorization Flow (RFC 8628), that pattern your smart TV uses when it asks you to visit a URL and enter a code, is almost the perfect UX for AI agents requesting credentials.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Almost&lt;/em&gt;. Because smart TVs don’t need zero-knowledge encryption. AI agents do.&lt;/p&gt;

&lt;p&gt;Here’s why: when you enter a credential on a website, you’re trusting that server. For a TV streaming login, that’s fine. For an AI agent’s credential broker, a server that potentially handles thousands of API keys, passwords, and OAuth tokens from different users, “just trust the server” isn’t good enough.&lt;/p&gt;

&lt;p&gt;So LinkAuth adds a critical layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid encryption with the agent’s public key&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Flow in 60 Seconds
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Agent generates an RSA keypair. Private key stays local.&lt;/li&gt;
&lt;li&gt;Agent asks the broker: “I need credentials.”
-&amp;gt; Sends its public key.
&amp;lt;- Receives a URL + human-readable code (like “XKCD-4271”).&lt;/li&gt;
&lt;li&gt;Agent shows the URL and code to the user (via chat, console, whatever).&lt;/li&gt;
&lt;li&gt;User opens the URL in their browser.
-&amp;gt; Sees the code, confirms it matches.
-&amp;gt; Enters credentials (API key, password, or starts OAuth).&lt;/li&gt;
&lt;li&gt;The browser encrypts the credentials with the agent’s public key.
(RSA-OAEP + AES-256-GCM hybrid encryption, all client-side)&lt;/li&gt;
&lt;li&gt;Encrypted blob is sent to the broker. Broker stores ciphertext.&lt;/li&gt;
&lt;li&gt;Agent polls the broker, retrieves the encrypted blob, decrypts locally.&lt;/li&gt;
&lt;li&gt;Agent has the credentials.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Broker never saw them.&lt;/p&gt;

&lt;p&gt;That’s it. The entire security model rests on one elegant principle:&lt;br&gt;
&lt;strong&gt;The broker is a mailbox, not a vault.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It holds an encrypted letter it can’t read, waiting for the only entity with the private key to pick it up.&lt;/p&gt;




&lt;h2&gt;
  
  
  But Wait - What About OAuth?
&lt;/h2&gt;

&lt;p&gt;Good question. OAuth is the elephant in the room.&lt;/p&gt;

&lt;p&gt;For direct credentials (API keys, passwords, certificates), LinkAuth achieves true zero-knowledge. The browser encrypts everything client-side with Web Crypto API before anything leaves the page. The broker literally cannot see what the user entered.&lt;/p&gt;

&lt;p&gt;For OAuth flows (Google, GitHub, Slack), there’s an inherent limitation, The broker must act as the OAuth confidential client. It receives the OAuth tokens from the provider’s callback, encrypts them with the agent’s public key, and stores the ciphertext. The broker briefly sees the tokens in memory.&lt;/p&gt;

&lt;p&gt;This isn’t a flaw in LinkAuth, it’s a fundamental constraint of OAuth itself. The tokens have to go somewhere before they can be encrypted. LinkAuth minimizes this exposure window and is transparent about the trade-off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest security posture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Form credentials (API keys, passwords) -&amp;gt; &lt;strong&gt;True zero-knowledge&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;OAuth tokens -&amp;gt; &lt;strong&gt;Encrypted at rest, briefly visible in broker memory&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We document this openly. Because security through obscurity isn’t security at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Not Just Build Another SDK?
&lt;/h2&gt;

&lt;p&gt;Here’s where LinkAuth diverges from everything else on the market.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LinkAuth has no SDK requirement.&lt;/strong&gt; None. Zero. The entire protocol is plain HTTP. Any language, any framework, any runtime that can make HTTP requests and do RSA decryption can be a LinkAuth agent. That’s Python, Go, Rust, JavaScript, no-code tools like n8n, maker, zapier, a bash script with &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;openssl&lt;/code&gt; - literally anything.&lt;/p&gt;

&lt;p&gt;This is a deliberate design choice. The AI agent ecosystem is fragmented across dozens of frameworks: LangChain, CrewAI, AutoGen, custom solutions. Requiring an SDK means picking winners and losers. Requiring HTTP means everyone wins.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# Create a session - that's it, that's the "SDK"&lt;br&gt;
curl -X POST https://broker.example.com/v1/sessions \&lt;br&gt;
-H "Content-Type: application/json" \&lt;br&gt;
-d '{"public_key": "…", "template": "openai"}'&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The IETF Connection
&lt;/h2&gt;

&lt;p&gt;This isn’t just open-source software chasing GitHub stars. LinkAuth is built on established IETF standards:&lt;/p&gt;

&lt;p&gt;| RFC 8628 | Device Authorization Grant - the URL + Code UX |&lt;br&gt;
| RFC 8017 | RSA-OAEP encryption for key wrapping |&lt;br&gt;
| RFC 5116 | AES-256-GCM authenticated encryption |&lt;br&gt;
| RFC 9457 | Problem Details for HTTP API errors |&lt;br&gt;
| RFC 7636 | PKCE for OAuth security |&lt;br&gt;
| RFC 6797 | HSTS for transport security |&lt;/p&gt;

&lt;p&gt;And here’s where it gets interesting. The IETF is actively working on standards for AI agent authentication. Multiple drafts are circulating:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;draft-klrc-aiagent-auth&lt;/strong&gt; (March 2026) AI Agent Authentication and Authorization. The freshest draft, backed by authors from AWS, Zscaler, and Ping Identity. Builds on the WIMSE (Workload Identity in Multi-System Environments) framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;draft-rosenberg-oauth-aauth&lt;/strong&gt; AAuth: Agentic Authorization for OAuth 2.1. Active, focused on agents that interact via phone or text channels.&lt;/p&gt;

&lt;p&gt;LinkAuth isn’t waiting for these standards to be finalized. It’s providing &lt;em&gt;running code&lt;/em&gt; - which, in IETF culture, is the strongest argument you can make. The plan is to bring LinkAuth to an IETF Hackathon, demonstrate the protocol, and potentially submit an Internet-Draft: &lt;em&gt;Zero-Knowledge Credential Brokering for Autonomous Agents.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But it is not only for agents, for your n8n workflows as well.&lt;/p&gt;

&lt;p&gt;If you like join me on the IETF 126 Hackathon in Wien on July, 18–19&lt;/p&gt;




&lt;h2&gt;
  
  
  Under the Hood: Security That Doesn’t Ask You to Trust Anyone
&lt;/h2&gt;

&lt;p&gt;Let’s talk about what happens when things go wrong. Because they will. Always will. It’s Murphys Law…&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario: The broker’s database gets breached.&lt;/strong&gt;&lt;br&gt;
Result: The attacker gets… encrypted blobs. Ciphertext that requires each individual agent’s private RSA key to decrypt. Every session uses a different keypair. There is no master key. There is no “decrypt everything” button. The breach is, for all practical purposes, useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario: Someone intercepts the network traffic.&lt;/strong&gt;&lt;br&gt;
Result: Even without TLS, the credentials are encrypted with the agent’s public key before leaving the browser. An eavesdropper sees ciphertext, not credentials. (TLS is still required in production to protect the integrity of the JavaScript that performs the encryption, but the encryption itself provides defense in depth.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario: An agent is compromised.&lt;/strong&gt;&lt;br&gt;
Result: The attacker gets access to that agent’s private key and can decrypt that agent’s credentials. This is the same threat model as any application that holds secrets in memory and it’s explicitly scoped. Compromising one agent doesn’t compromise any other agent or any other user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the attacker gets&lt;/strong&gt;&lt;br&gt;
Database breach =&amp;gt; Useless ciphertext&lt;br&gt;
Network interception =&amp;gt; Useless ciphertext (even without TLS)&lt;br&gt;
Broker server compromise =&amp;gt; Useless ciphertext (for form credentials)&lt;br&gt;
Agent compromise =&amp;gt; That agent’s credentials only&lt;/p&gt;

&lt;p&gt;The security model isn’t “trust us.” It’s “you don’t have to.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It in Two Minutes
&lt;/h2&gt;

&lt;p&gt;Really. Two minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Python 3.11+ and uv (or pip).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start the Broker:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git clone https://github.com/LinkAuth/LinkAuth.git&lt;/code&gt;&lt;br&gt;
&lt;code&gt;cd linkauth &amp;amp;&amp;amp; uv sync - all-extras&lt;/code&gt;&lt;br&gt;
&lt;code&gt;# Terminal 1&lt;/code&gt;&lt;br&gt;
&lt;code&gt;PYTHONPATH=src python -m uvicorn broker.main:app - port 8080&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run the Agent Simulation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# Terminal 2&lt;/code&gt;&lt;br&gt;
&lt;code&gt;python examples/agent_simulation.py&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Type anything. The mock LLM pretends you asked for your emails. The agent requests credentials, shows you a link, you open it in your browser, enter any test credentials, and the agent decrypts them locally. The broker never saw a thing.&lt;/p&gt;

&lt;p&gt;The simulation includes a full roundtrip: session creation, hybrid encryption in the browser, polling, decryption. It’s the entire protocol in a single script. Aworking proof of concept you can touch, break, and rebuild.&lt;/p&gt;




&lt;h2&gt;
  
  
  What LinkAuth Is Not
&lt;/h2&gt;

&lt;p&gt;Transparency matters. So let’s be clear about what LinkAuth doesn’t do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s not an identity provider.&lt;/strong&gt;&lt;br&gt;
LinkAuth doesn’t authenticate users. It doesn’t issue JWTs. It doesn’t manage user accounts.&lt;br&gt;
It’s a credential relay - a secure pipe between a user who has credentials and an agent that needs them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s not a secrets manager.&lt;/strong&gt;&lt;br&gt;
Credentials are ephemeral. Sessions expire in 5–15 minutes. Ciphertext is deleted after the agent retrieves it. If you need long-term secret storage, that’s a different tool. LinkAuth handles the initial handoff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s not production-hardened yet.&lt;/strong&gt;&lt;br&gt;
This is v0.1. Early development. The architecture is sound, the cryptography is standard, but it hasn’t been through a formal security audit. We’re building in the open because that’s how trust is earned.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture: Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;We’re at an inflection point. AI agents are moving from demos to production. From “look what GPT can do” to “this agent manages my AWS infrastructure while I sleep.”&lt;/p&gt;

&lt;p&gt;And every single one of these production agents will need credentials. Not toy credentials. Real API keys connected to real money. Real OAuth tokens with access to real email, real calendars, real codebases for a varianty of users.&lt;/p&gt;

&lt;p&gt;The question isn’t whether we need a standard for agent credential exchange. The question is whether that standard will be:&lt;/p&gt;

&lt;p&gt;(a) A proprietary protocol controlled by whichever platform moves fastest&lt;/p&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;p&gt;(b) An open, IETF-backed standard built on running code and rough consensus.&lt;/p&gt;

&lt;p&gt;LinkAuth is a bet on (b).&lt;/p&gt;

&lt;p&gt;It’s a bet that the developer community, the people actually building these agents, would rather have an open protocol they can self-host, inspect, and extend, than a managed service they have to trust and pay for.&lt;/p&gt;

&lt;p&gt;It’s a bet that the IETF process, slow as it is, produces better standards than any single company racing to lock in customers.&lt;/p&gt;

&lt;p&gt;And it’s a bet that zero-knowledge architecture isn’t a nice-to-have. It’s the baseline. Because in a world where AI agents handle our most sensitive credentials, “the server never sees your password” shouldn’t be a feature. It should be a &lt;em&gt;requirement&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Involved
&lt;/h2&gt;

&lt;p&gt;LinkAuth is open-source under the Apache 2.0 license.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; github.com/LinkAuth/LinkAuth&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; Clone, run the broker, run the simulation. Two minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribute:&lt;/strong&gt; The DAO pattern makes it easy to add new storage backends. The template system makes it easy to add new credential types. OAuth provider support is actively being built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IETF:&lt;/strong&gt; I am preparing for hackathon participation and an Internet-Draft submission. If you care about open standards for AI agent auth, join the OAuth Working Group and WIMSE mailing list.&lt;/p&gt;

&lt;p&gt;The best time to shape a standard is before it’s finalized. This is that time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;LinkAuth is in early development (v0.1). The architecture is stable, the cryptography follows established RFCs, and the protocol is designed for extensibility. Stars, issues, and pull requests are welcome. So is honest criticis, that’s how open source gets better.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>authentication</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
