<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Google Developer Experts</title>
    <description>The latest articles on Forem by Google Developer Experts (@gde).</description>
    <link>https://forem.com/gde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F11939%2Fe3080d5b-ecde-42a8-b089-bafecc31fa97.png</url>
      <title>Forem: Google Developer Experts</title>
      <link>https://forem.com/gde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gde"/>
    <language>en</language>
    <item>
      <title>My Incredible Google Cloud Next 26 Experience as a GDE</title>
      <dc:creator>Darren "Dazbo" Lester</dc:creator>
      <pubDate>Tue, 28 Apr 2026 13:40:53 +0000</pubDate>
      <link>https://forem.com/gde/my-incredible-google-cloud-next-26-experience-as-a-gde-17mo</link>
      <guid>https://forem.com/gde/my-incredible-google-cloud-next-26-experience-as-a-gde-17mo</guid>
      <description>&lt;p&gt;Hello friends! I landed back in the UK this evening, after spending the last few days in Vegas for Google Cloud Next '26. This was my 4th in-person Next event, and my second time as a speaker.&lt;/p&gt;

&lt;p&gt;Those who follow my work on other platforms (because I haven't been on &lt;a href="https://dev.to"&gt;dev.to&lt;/a&gt; for long) know that I've been writing post-Google Next blogs for a few years now. I tend to report on the key updates, what I think of them, how they might impact the teams I'm working with, and share a bit about my experience.&lt;/p&gt;

&lt;p&gt;This time, I'm going to strip it back to the experience, for a couple of reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;After Next 26, loads of people are already writing blogs with keynote summaries. And they're all &lt;em&gt;so quick!&lt;/em&gt; There's no value in me adding another blog with the same content. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This was my first time attending and presenting as a Google Developer Expert.&lt;/strong&gt; And this made the experience very different to previous visits. It's been an amazing experience, and I'd like to share some of it with you. (Where else am I gonna share all these cool photos?)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, grab your coffee and your cat (or is that just a &lt;em&gt;me&lt;/em&gt; thing?) and let's go!&lt;/p&gt;

&lt;h1&gt;Contents&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;What is Google Cloud Next?
&lt;ul&gt;
&lt;li&gt;Next 26 Was the Biggest Event That Week, Right?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;What's This Google Developer Expert Thing?
&lt;ul&gt;
&lt;li&gt;And One Perk of Being a GDE? Going to Next, Baby!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Corporate Note&lt;/li&gt;
&lt;li&gt;Sunday: Flying Out&lt;/li&gt;
&lt;li&gt;Monday
&lt;ul&gt;
&lt;li&gt;At the Luxor&lt;/li&gt;
&lt;li&gt;A Sneaky Visit to the Pool&lt;/li&gt;
&lt;li&gt;GDE Happy Hour&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Tuesday: GDE Summit, Baby!&lt;/li&gt;
&lt;li&gt;Wednesday
&lt;ul&gt;
&lt;li&gt;Breakfast with Jack&lt;/li&gt;
&lt;li&gt;The Opening Keynote&lt;/li&gt;
&lt;li&gt;The GDE and Certified Lounge&lt;/li&gt;
&lt;li&gt;Partner All-Stars Event&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Thursday
&lt;ul&gt;
&lt;li&gt;Breakfast with Richard&lt;/li&gt;
&lt;li&gt;The Developer Keynote&lt;/li&gt;
&lt;li&gt;Puppies!&lt;/li&gt;
&lt;li&gt;Recording Session in the GDE Studio Pod&lt;/li&gt;
&lt;li&gt;Weezer and Benson&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Friday
&lt;ul&gt;
&lt;li&gt;Morning Panic&lt;/li&gt;
&lt;li&gt;My Talk&lt;/li&gt;
&lt;li&gt;Poker&lt;/li&gt;
&lt;li&gt;Going Home&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Final Reflections&lt;/li&gt;
&lt;li&gt;Links&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;What is Google Cloud Next?&lt;/h1&gt;

&lt;p&gt;For those who haven't experienced it, Google Cloud Next is the annual flagship event for all things Google Cloud and Google AI. It’s a massive three-day (or for some of us, four-day) takeover of the Las Vegas Strip, bringing together over 30,000 developers, IT leaders, and cloud enthusiasts. It's where Google drops its biggest product announcements, deep-dives into new technologies, products and capabilities, and generally sets the pace for the cloud industry for the coming year.&lt;/p&gt;

&lt;p&gt;Some of it can be watched virtually, with the Digital Pass. But most of the session content is only available to those who attend in person.&lt;/p&gt;

&lt;p&gt;Attendees occupy all of the hotels along the strip. But the conference itself takes place in the Mandalay Bay Resort and Convention Center, which is HUGE.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7w5bqwtof9rpvm1yk7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7w5bqwtof9rpvm1yk7n.png" alt="Left: Luxor, Right: Mandalay Bay" width="550" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Next 26 Was the Biggest Event That Week, Right?&lt;/h2&gt;

&lt;p&gt;Maybe not! When I arrived on Sunday night, Wrestlemania 42 was underway. Apparently, attendance for Wrestlemania over two nights was nearly 110,000!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvkhgvad1ql1ffhzaxam.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvkhgvad1ql1ffhzaxam.webp" alt="Wrestlemania 42" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;What's This Google Developer Expert Thing?&lt;/h1&gt;

&lt;p&gt;I'm so glad you asked. The GDE program is an invite-only program for individuals who have proven themselves to be not just experts, but also evangelists of the Google technology ecosystem. There are only about 1200 GDEs in the world, from over 90 countries. That's about 1 GDE for every 500 Google professionally certified individuals!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The evangelisation part is important.&lt;/strong&gt; It's not sufficient to be a Google technology expert. You also need a proven (and ongoing) track record of informing, educating and advising in the &lt;em&gt;public&lt;/em&gt; Google ecosystem. This evangelising must extend beyond the scope of an individual employer. And Google does actually count the stats and verify your influence. 🫨&lt;/p&gt;

&lt;h2&gt;And One Perk of Being a GDE? Going to Next, Baby!&lt;/h2&gt;

&lt;p&gt;That's right. Google invites their GDEs to attend Next. And they also offer a limited number of speaker slots to GDEs. This year there were around 20 such speaker slots, and my talk proposal was one of the lucky few to be accepted. Woop! More on that later.&lt;/p&gt;

&lt;p&gt;Also, for the avoidance of any doubt: there are many perks! But it requires a lot of continuous work. No free lunches here! You put in the work, and good things follow.&lt;/p&gt;

&lt;h1&gt;Corporate Note&lt;/h1&gt;

&lt;p&gt;I work for EPAM, a proud Google Cloud partner and a sponsor of Next 26.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzjbijdl5wzbvlamf3ob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzjbijdl5wzbvlamf3ob.png" alt="Sponsors shown on the Vegas Sphere" width="800" height="691"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;EPAM had a booth on the Expo floor, right in the middle of the action. And my fellow EPAMers presented a number of talks on the cool things they've been building with our clients. You can read about some of these things &lt;a href="https://www.epam.com/about/who-we-are/events/2026/explore-epam-at-google-cloud-next-26?utm_source=linkedin&amp;amp;utm_medium=social&amp;amp;utm_term=google-next-2026&amp;amp;utm_campaign=partner-google" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Oh, just one &lt;em&gt;tiny&lt;/em&gt; thing. Announced during Next: EPAM won &lt;strong&gt;Google Cloud Partner of the Year for the 4th year in a row.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkc7697xbnn4f71l9mdry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkc7697xbnn4f71l9mdry.png" alt="EPAM PotY" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just sayin'.&lt;/p&gt;

&lt;p&gt;But this is a personal blog, not a corporate one. And with that in mind, I won't say much more about EPAM here. You know, &lt;em&gt;these are my personal views, yadda, yadda!&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;Sunday: Flying Out&lt;/h1&gt;

&lt;p&gt;I flew out from London, UK. Take-off was around 1730 in the sun. We were in the air for about 10 hours, and I got some work done. We landed in Vegas a little after 2000, with the sun just setting. It's a weird experience, and it left me with undeniable jetlag which didn't shift for the whole week!&lt;/p&gt;

&lt;p&gt;This is Moulin. He's an excellent travelling companion. Very supportive (especially of my neck), but not too chatty.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhprhf0cbfvz5e07c9c96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhprhf0cbfvz5e07c9c96.png" alt="Moulin" width="609" height="809"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;Monday&lt;/h1&gt;

&lt;h2&gt;At the Luxor&lt;/h2&gt;

&lt;p&gt;I was staying at the iconic Luxor hotel. It's the one that looks like a pyramid:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihj63x6qr271o6edi9ls.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihj63x6qr271o6edi9ls.jpg" alt="Luxor" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Staying in this hotel is &lt;em&gt;trippy&lt;/em&gt;. It's a hollow pyramid, and as you're walking along corridors, the strange angles induce a weird sense of vertigo. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujcx7cu1fl7hm9dcqe3n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujcx7cu1fl7hm9dcqe3n.jpg" alt="Trippy Luxor" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, the elevators - sorry, &lt;em&gt;inclinators&lt;/em&gt; - travel diagonally up and down the sides of the pyramid! I was on three separate &lt;em&gt;inclinator&lt;/em&gt; journeys where some hotel newbie would say &lt;em&gt;"What's wrong with this elevator?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oya1ihnt28g3wra98sa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oya1ihnt28g3wra98sa.png" alt="Are we going to die?" width="628" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And I'd explain the situation, and let them know they're not in immediate peril!&lt;/p&gt;

&lt;p&gt;But it turns out &lt;em&gt;I was&lt;/em&gt; in immediate peril. When I got to my room, THERE WAS NO COFFEE. I'm in tech. I need coffee. Fortunately, there's a Starbucks downstairs. They're the real winners here.&lt;/p&gt;

&lt;p&gt;The Luxor is located next door to the Mandalay. Here's my view of the Mandalay from my room:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcpa0l4dl3236ln09tiq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcpa0l4dl3236ln09tiq.png" alt="Mandalay Bay from my room" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And you can go directly from one to the other by walking through the casinos. Actually, this is pretty much a standard rule in Vegas. You can get EVERYWHERE by going through the casinos. It's possible to go a long time without seeing daylight.&lt;/p&gt;

&lt;h2&gt;A Sneaky Visit to the Pool&lt;/h2&gt;

&lt;p&gt;I decided to try and get an hour of sun before the shizzle gets rizzle. Whilst scouting out the pool, I spotted my first fellow GDE. Like &lt;em&gt;every&lt;/em&gt; GDE I met in Vegas, I'd only previously met John Capobianco virtually. But even from a distance, this guy is unmistakeable. Pleasure to meet you, John! I failed to grab a selfie with John at this point, but we were destined to meet many times.&lt;/p&gt;

&lt;p&gt;By the way, this is me at the pool. I know what you're thinking: &lt;em&gt;"Wow, those are hunky legs. Are you part of Wrestlemania?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqm6z6f6wqm52b5u8zwe6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqm6z6f6wqm52b5u8zwe6.png" alt="Hunky legs" width="609" height="809"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;GDE Happy Hour&lt;/h2&gt;

&lt;p&gt;1700 was the start of the Global GDE Happy Hour, and I was super excited to meet folk. The room was packed full of GDEs and Googlers, but I was feeling a little intimidated. Clearly a lot of people knew each other, and I felt a bit like a fifth wheel.&lt;/p&gt;

&lt;p&gt;But I was rescued almost immediately. At this point, two Googlers turned up who are my friends, and whom I'd been looking forward to meeting for ages: Jack Wotherspoon and Luke Schlangen. They immediately started introducing me to other people, and I started to feel at home. (Side note: Luke knows EVERYONE.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g3n85mbaqtbev8d1e2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g3n85mbaqtbev8d1e2f.png" alt="Meeting the posse" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From left to right: me, &lt;a class="mentioned-user" href="https://dev.to/jackwoth"&gt;@jackwoth&lt;/a&gt;, Andrew Brogdon, &lt;a class="mentioned-user" href="https://dev.to/lukeschlangen"&gt;@lukeschlangen&lt;/a&gt;, Bryce Howitson, Martin Omander, Emma Twersky, and... John Capobianco! (Two Dev Keynote speakers in one photo!)&lt;/p&gt;

&lt;p&gt;By the way, Jack and Luke: I want you to know how much I appreciate you guys. &lt;em&gt;In general&lt;/em&gt;, but also for hanging out with me that evening.&lt;/p&gt;

&lt;p&gt;Here are a couple more nice guys I met at the party: Giuliano Ribeiro and Azidin Shairi.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkdma5w5kmg5abjailjm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkdma5w5kmg5abjailjm.png" alt="Giuliano and Azidin" width="608" height="693"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And a couple of GDE powerhouses that I've known for a while and was also looking forward to connecting with: Mazlum Tosun and &lt;a class="mentioned-user" href="https://dev.to/xbill"&gt;@xbill&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In the time it takes me to write this blog, Will will have published 50. He's a content factory. (Will, somehow I totally failed to get a selfie with you. Next time...)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfiud37x34h0tv2vk7kf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfiud37x34h0tv2vk7kf.jpg" alt="Darren and Mazlum" width="800" height="855"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I met Shishir Suresh, fellow GDE from the UK. Another super-nice, welcoming and smart guy. (More with Shishir later.)&lt;/p&gt;

&lt;p&gt;And I finally got to meet some of the folks that run the GDE program, and who make this whole experience possible: Ronan Mandel, Asrar Khan, and Alfredo Morresi. Guys, we owe you so much. (Selfies later in the blog!)&lt;/p&gt;

&lt;h1&gt;Tuesday: GDE Summit, Baby!&lt;/h1&gt;

&lt;p&gt;Tuesday was the Google Developer Experts Summit, at the Luxor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvqrz0n19nsbnb2s7jwp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvqrz0n19nsbnb2s7jwp.jpg" alt="GDE Summit 2026" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was my first ever GDE summit. The day was packed with talks from some Google titans, followed by a number of round tables. It was a sneak peek of things to come, and an opportunity for no-holds-barred Q&amp;amp;A. The content is all under NDA, so I'm not going to break any rules.&lt;/p&gt;

&lt;p&gt;But a quick glimpse at some of our speakers and sessions...&lt;/p&gt;

&lt;p&gt;A welcome from Ron and Asrar:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhb0zjhs85mgm60blak3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhb0zjhs85mgm60blak3.png" alt="Ron and Asrar" width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then some insights about the program. Alfredo and Natalie described the geographic distribution of GDEs, the mix of expertise, and the types of contributions we've made in the last 12 months.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxlm4rs28nxihwbcklf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxlm4rs28nxihwbcklf.png" alt="Alfredo and Natalie" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ooh look - that's me!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fda3nbf87uf97yqhgss8i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fda3nbf87uf97yqhgss8i.png" alt="Built by... Me!" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, an amazing hands-on session from Richard Seroter. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What, THE Richard Seroter?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yes, the very same."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmcmcx289uxp0itokp5n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmcmcx289uxp0itokp5n.jpg" alt="Richard Seroter presenting to GDEs" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(This guy is my hero. I might mention this in passing as the blog continues...)&lt;/p&gt;

&lt;p&gt;This talk was a masterclass in how to do inspiring (and amusing) live demos. The demo included Gemini Enterprise, Stitch integrated with MCP, App Design Centre (which supports SO MANY MORE services than it did a few months ago), App Hub, the new BigQuery Agent for talking to your data, and so much more.&lt;/p&gt;

&lt;p&gt;For me, this session reinforced the message that we're in a world where &lt;strong&gt;everyone can be a builder&lt;/strong&gt; with AI. Some folks choose to resist that, for whatever reasons. But we have the option to embrace it, and help builders (who might be our clients) to use AI the right way, and get the most out of the platform. &lt;/p&gt;

&lt;p&gt;These were some of the key nuggets. Or rather, my interpretation of those nuggets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Planning remains crucial&lt;/strong&gt;, but AI can certainly help us with that.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use cases are rarely unique&lt;/strong&gt;, even across verticals. So we need to share more.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The lines between disciplines are dissolving with agents&lt;/strong&gt;. So many folks over the last few days have mentioned how they're terrible at frontend dev (including Richard). But that just doesn't matter anymore.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Software dev tools are evolving&lt;/strong&gt;. IDEs are evolving from being code-centric to being agent-centric. (Aja talked quite a bit about this in the next session.) Now our tools are using multi-modal models, MCP and skills, so we can integrate with everything and we can "see" everything.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;"Builders" are now managers of agents.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context engineering is absolutely paramount&lt;/strong&gt;, as we all knew it would be. And Google's platform now allows for sharing of context between services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Boring reports should be a thing of the past&lt;/strong&gt;. There's no excuse not to make compelling material / documentation / business cases, whatever, when we all have access to models like Nano Banana and Veo 3, and the ability to use Gemini in Google Workspace.&lt;/li&gt;
&lt;li&gt;Researching has always been one of the things I enjoy most about what I do. I guess it's partly a legacy from that biochem degree, a lifetime ago. But now, &lt;strong&gt;everyone can do detailed research in a highly compressed timeline&lt;/strong&gt;, using agents like Gemini Deep Research and tools like NotebookLM.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Everyone can prototype, and we shouldn't block this&lt;/strong&gt;. (&lt;em&gt;Enterprises - I'm looking at you!&lt;/em&gt;) Embrace the new-found enthusiasm of our new ecosystem of builders. And then, when we're ready to take this to production, Google's got you covered.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Train your builders&lt;/strong&gt;! Certainly with my current client, I've been spending time educating teams on how to leverage agentic tools effectively, regardless of role.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we had sessions from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scott Densmore and Aja Hammerly, discussing the evolving landscape of generative AI coding tools.&lt;/li&gt;
&lt;li&gt;Logan Kilpatrick, Michael Gerstenhaber, and David McLaughlin, discussing the partnership of Google DeepMind and the Gemini Enterprise Agent Platform (the artist formerly known as Vertex AI).&lt;/li&gt;
&lt;li&gt;A panel discussion combining Cloud DevRel, AI DevRel and GDM DevRel, with Karl Weinmeister, Aja, Dave Elliott, Omar Sanseviero, and Matt Thompson.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Wednesday&lt;/h1&gt;

&lt;h2&gt;Breakfast with Jack&lt;/h2&gt;

&lt;p&gt;Early start! At 0700 I had breakfast with the legend that is Jack Wotherspoon.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Not THE Jack Wotherspoon?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yeah, that one."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"The Gemini CLI Jack Wotherspoon?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yeah, the same."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"The Dev Keynote Jack Wotherspoon?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yep."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Pictures or it didn't happen."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Dammit, forgot to take a breakfast selfie.&lt;/p&gt;

&lt;p&gt;Jack and I have been buddies for a while now. But it was great to just chat, and make the most of the EPAM VIP breakfast at the Seabreeze Café in the Mandalay.&lt;/p&gt;

&lt;h2&gt;The Opening Keynote&lt;/h2&gt;

&lt;p&gt;The keynotes are held in the Michelob ULTRA Arena, inside the Mandalay. It seats 10,000 people. That sounds like a lot, but it's not enough! Only about a third of people manage to get seated in the arena, and you have to queue early. My top tip for my colleagues: for the 0900 Main Keynote, make sure you're in the queue by 0800.&lt;/p&gt;

&lt;p&gt;Fortunately, GDEs get a fastlane and reserved seating. Check out this welcome!&lt;/p&gt;


&lt;div&gt;
    &lt;iframe src="https://www.youtube.com/embed/F-Ktusl7PAw"&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;


&lt;p&gt;The arena starts filling up. Time for a couple of fun photos.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ugzx8eqw78d4h8q3sul.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ugzx8eqw78d4h8q3sul.jpg" alt="Audience selfie" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now let me introduce Jesse Nowlin:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefht7fl4vjsk6l9j012k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefht7fl4vjsk6l9j012k.png" alt="Jesse" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;He's a fellow GDE, and he's also a famous content creator. He happens to be the authority on Google Next and always shares an extremely helpful Know Before You Go, like this one:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/pqNG4FgLHYo"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;As we waited for the keynote to begin, we were treated to an awesome duo: a DJ plus dynamic music visualisation, using hand gestures to render the code in real time! (If someone knows their names, please tell me!)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rru6tfgqkt0e952a6a3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rru6tfgqkt0e952a6a3.png" alt="DJ and visuals" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The keynote then formally begins with a very cool video, created using Nano Banana 2, Veo 3.1 and Genie 3. Check out the &lt;a href="https://cloud.google.com/transform/gen-ai-creativity-video-editing-next-26-weezer-garage?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google blog&lt;/a&gt; on how this was done.&lt;/p&gt;

&lt;p&gt;And then, as always, Thomas Kurian took to the stage to talk us through some of the big announcements. I said I wouldn't do a proper keynote recap. So I'll keep it short! You can check out the whole &lt;a href="https://www.youtube.com/watch?v=11PBno-cJ1g" rel="noopener noreferrer"&gt;Google Cloud Next '26 - Opening Keynote on YouTube&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;The era of the pilot is over. The era of the agent is here.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let me give just a few key points...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The main theme of the keynote was that we're out of the agentic "proof-of-concept" era, into a world where &lt;strong&gt;everyone works with agents, and everyone is a builder.&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Now we need to &lt;strong&gt;ensure our agents are production-ready&lt;/strong&gt;, that they can scale, that we have seamless observability across our agents and their stack, and unified context, no matter where we are.&lt;/li&gt;
&lt;li&gt;Gemini Enterprise agents were central to flight readiness and astronaut safety for &lt;strong&gt;Artemis II&lt;/strong&gt;.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkf5b7zwuoc0h4xvzadtc.jpg" alt="Google Enterprise agents and Artemis II" width="800" height="463"&gt;
&lt;/li&gt;
&lt;li&gt;Google continues to be the only Cloud provider that offers a &lt;strong&gt;full stack for AI and agentic solutions&lt;/strong&gt;, and a number of big innovations were announced across all layers in this integrated stack.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmr1enn3oqqwvgvu0r8ky.png" alt="Google Agent End-to-End Stack" width="800" height="412"&gt;
&lt;/li&gt;
&lt;li&gt;Mention of some of the &lt;strong&gt;latest models&lt;/strong&gt;, which by now are not quite new. E.g. Gemini 3.1 Flash Image (aka Nano Banana 2), Veo 3.1 Lite, and Lyria 3 Pro.&lt;/li&gt;
&lt;li&gt;The Vertex AI Platform has been rebranded - with evolution, of course - as &lt;strong&gt;Gemini Enterprise Agent Platform&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The new Gemini Enterprise Agent Platform is split across four pillars: &lt;strong&gt;Build, Scale, Govern, and Optimize&lt;/strong&gt;.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspldipunux0f5wgxf6iz.png" alt="Gemini Enterprise Agent Platform: Build, Scale, Govern, Optimize" width="800" height="602"&gt;
&lt;/li&gt;
&lt;li&gt;The Agent Platform now integrates capabilities like: &lt;strong&gt;low-code Agent Studio, &lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/agent-registry?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Registry&lt;/a&gt;, and Skills &amp;amp; Tools Registry.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;We now have &lt;strong&gt;off-the-shelf agent skills for many Google services and Workspace.&lt;/strong&gt; Nice! &lt;a href="https://github.com/google/skills/tree/main/skills/cloud?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Check them out&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;And &lt;strong&gt;managed, &lt;a href="https://docs.cloud.google.com/mcp/overview?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;remote MCP servers&lt;/a&gt; for all GCP services&lt;/strong&gt; are now available in GA.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ADK now supports graph-based deterministic multi-agent orchestration.&lt;/strong&gt; (See my quick sketch just after this list.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/agent-identity-overview?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Identity&lt;/a&gt;&lt;/strong&gt;, such that every agent has a unique, cryptographic ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/gateways/agent-gateway-overview?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Gateway&lt;/a&gt;&lt;/strong&gt;, a centralised command centre for agentic policy enforcement.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;End-to-end multi-agent observability.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;AI Hypercomputer: 

&lt;ul&gt;
&lt;li&gt;The big announcement here was the &lt;strong&gt;&lt;a href="https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;8th Gen TPUs&lt;/a&gt;&lt;/strong&gt;, which are now split across two specialised platforms: TPU 8t for training, and TPU 8i for inference. Lots of numbers dumped here, and mentions of advancements like "inter-chip interconnect".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud &lt;a href="https://docs.cloud.google.com/compute/docs/general-purpose-machines#n4a_series?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Axion N4A&lt;/a&gt;&lt;/strong&gt; - now offering 2x price-performance vs comparable x86&lt;/li&gt;
&lt;li&gt;Support for latest &lt;strong&gt;NVIDIA GPUs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The new &lt;strong&gt;Virgo network&lt;/strong&gt;, which doubles the bandwidth between chip pods in a cluster.&lt;/li&gt;
&lt;li&gt;And up to 1 million TPUs in a single training cluster!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Knowledge Catalog&lt;/a&gt;&lt;/strong&gt; is essentially the evolution of Dataplex. It allows for universal metadata management and context, and supports both structured and unstructured data sources, as well as third party sources.&lt;/li&gt;

&lt;/ul&gt;
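
&lt;p&gt;To make that orchestration point concrete: I haven't got my hands on the new graph-based API yet, so here's a minimal sketch of deterministic multi-agent orchestration as you can already do it today with ADK's workflow agents, in Python. The agent names, model choice and instructions are my own made-up example, not anything from the keynote:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
from google.adk.agents import LlmAgent, SequentialAgent

# Each LlmAgent writes its result into session state via output_key,
# so downstream agents can reference it in their instructions.
researcher = LlmAgent(
    name="researcher",
    model="gemini-2.0-flash",  # assumption: swap in whichever Gemini model you use
    instruction="Research the user's topic and list the key facts.",
    output_key="research_notes",
)

writer = LlmAgent(
    name="writer",
    model="gemini-2.0-flash",
    instruction="Write a short, punchy summary based on: {research_notes}",
    output_key="summary",
)

# A SequentialAgent runs its sub-agents in a fixed, deterministic order.
# No LLM decides the routing - the "graph" here is just a straight line.
root_agent = SequentialAgent(
    name="research_pipeline",
    sub_agents=[researcher, writer],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Point &lt;code&gt;adk run&lt;/code&gt; (or the ADK web UI) at this and the two agents execute in order, sharing state. Presumably the new graph-based orchestration generalises this beyond simple sequences and loops.&lt;/p&gt;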

&lt;p&gt;Okay, you get the idea. You can check out &lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2026-wrap-up" rel="noopener noreferrer"&gt;260 Things We Announced at Google Cloud Next '26&lt;/a&gt; to deep-dive into any of these topics.&lt;/p&gt;

&lt;h2&gt;The GDE and Certified Lounge&lt;/h2&gt;

&lt;p&gt;THIS. This was my favourite thing about Next. It was my oasis. My safe haven. My place of calm. My spiritual home during Next 26.&lt;/p&gt;

&lt;p&gt;Look, it's John Capobianco again!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37ekhfiupy5itd7v29c0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37ekhfiupy5itd7v29c0.png" alt=" " width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we had coffee; we had power; we had the ongoing live streams. And most importantly, ALL THE AMAZING PEOPLE! As a result, I spent way more time in this space than I was expecting. I had so many sessions booked in my calendar, but I ended up skipping several in order to spend more time in this Lounge, interacting with Googlers and fellow GDEs. (And, of course, panicking about my own sessions and planned activities.)&lt;/p&gt;

&lt;p&gt;These are some of the first people I met:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxp13ixjnnc855sctr6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxp13ixjnnc855sctr6a.png" alt="Meeting in the GDE Lounge" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cristine Souza was organising the GDE Studio recording sessions here at Next, and my recording session was planned for the following day (Thursday). More on that later. But TL;DR: she was an absolute pleasure to work with.&lt;/p&gt;

&lt;p&gt;Then we have Lisa Carpenter, expert in ML and lead for GDG Bletchley. Thanks to Lisa for such a warm welcome, and: I've joined Bletchley!&lt;/p&gt;

&lt;p&gt;And Xavier Portilla Edo. Our paths had not yet crossed. But I know they will in future. This guy is a GDE mentor, podcaster, course creator, author... (Man, I feel old.)&lt;/p&gt;

&lt;p&gt;Then I got a chance to take a selfie with Shishir. We only met a couple of days ago, but I feel like we've been friends for a lifetime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flb3wr6n0wolf5j3in7pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flb3wr6n0wolf5j3in7pi.png" alt="Selfie with Shishir" width="605" height="806"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jesse, who let me try out his cool Meta sunglasses!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb06s45xnl20qbgo734es.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb06s45xnl20qbgo734es.png" alt="Jesse" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And Abdel, who actually interviewed me for the GDE program!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kdca8w7oy1qtk89fbs8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kdca8w7oy1qtk89fbs8.png" alt="Abdel" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(At this point, if I fail to mention how friendly / welcoming / smart someone is, let's just take it as given!)&lt;/p&gt;

&lt;h2&gt;Partner All-Stars Event&lt;/h2&gt;

&lt;p&gt;A few months ago I was fortunate to be &lt;a href="https://www.linkedin.com/posts/darren-lester-architect_partnerallstars-share-7396303535836909568-T5TC?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAAAdyZYBCFbC7dtAq1VP22uMahNafUlaMb0" rel="noopener noreferrer"&gt;named a Google Cloud Partner All-Star in Delivery Excellence&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cty496lvf1svzf0ehq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cty496lvf1svzf0ehq8.png" alt="Partner All-Star" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And so, I got invited to the All-Star event at Next. But, thanks to a stupid migraine attack in the evening, I was scatter-brained and totally missed it. With any luck, there will be a next time...&lt;/p&gt;

&lt;h1&gt;Thursday&lt;/h1&gt;

&lt;h2&gt;Breakfast with Richard&lt;/h2&gt;

&lt;p&gt;Early start. At 0700 I had breakfast with the legend that is Richard Seroter.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Not THE Richard Seroter?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yeah, that one."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"The Google Cloud Chief Evangelist?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yeah, the same."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"The Dev Keynote Richard Seroter?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Yep."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Pictures or it didn't happen."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DAMMIT Dazbo! Why do you keep forgetting to take selfies?!&lt;/p&gt;

&lt;p&gt;I may have mentioned I'm a fan. Richard has supported me for a while now, and was the person who invited me to the GDE program. So I owe him a lot! We've chatted quite a bit over the last year or so, but never met face-to-face. The fact that he gave up time to chat to me, just a couple of hours before the big keynote, is a little astonishing. But he's such a decent guy.&lt;/p&gt;

&lt;p&gt;We were in the Seabreeze, and we talked about stuff like model lifecycles, demand for TPUs, and how Google is well-positioned in the AI long game, given its integrated stack.&lt;/p&gt;

&lt;p&gt;And I learned that the keynote speakers get the full Hair, Wardrobe &amp;amp; Makeup treatment. Their outfits are selected for them!&lt;/p&gt;

&lt;p&gt;As we parted ways, the unimaginable happened. He said: &lt;em&gt;"I'm a big fan."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This Next is just getting better and better.&lt;/p&gt;

&lt;h2&gt;The Developer Keynote&lt;/h2&gt;

&lt;p&gt;This is the session many have been waiting for; certainly the GDEs. It's the Developer Keynote - where the rubber meets the road.&lt;/p&gt;

&lt;p&gt;GDEs once again were escorted to reserved seating, right in front of the stage. I was in row 3! Check out the arena behind me:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3a8o0das3dg3f6zqdzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3a8o0das3dg3f6zqdzn.png" alt="GDEs at the front" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was an amazing keynote, where Richard and Emma guided us through the overall journey of building a marathon planning application. And with each section, they brought in Googlers to live demo a particular part of the journey. Again, I'm not going to share details about this. You can check out the Dev Keynote here: &lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/A01DQ8_xy7Q?start=7"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I'll just share a few snaps...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b1051ye15sm51razyh3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b1051ye15sm51razyh3.png" alt="Agent Registry" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Richard and Emma:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fja6injysjbja9n2sseou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fja6injysjbja9n2sseou.png" alt="Richard and Emma" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Look, Jack's back! (Watch the video - Jack has a camel.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezt59dhu4vpyvw5phomi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezt59dhu4vpyvw5phomi.png" alt="Jack's back" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's Megan:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud4cbyrm0jhuuivgi02z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud4cbyrm0jhuuivgi02z.png" alt="Megan" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jason and Ines: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh9xiy3s84dadno0nu09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh9xiy3s84dadno0nu09.png" alt="Jason and Ines" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Puppies!&lt;/h2&gt;

&lt;p&gt;No Next experience blog would be complete without at least one picture of the puppies on the Expo floor:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fal2jsvf1vkvqnztsrnnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fal2jsvf1vkvqnztsrnnk.png" alt="Puppies!" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wish I could sleep as easily as these guys!&lt;/p&gt;

&lt;h2&gt;Recording Session in the GDE Studio Pod&lt;/h2&gt;

&lt;p&gt;I've been incredibly excited about this. A little while back I submitted a proposal to record a fireside chat at Next, in the dedicated GDE Studio Pod. Then I forgot all about it. (Standard.) &lt;/p&gt;

&lt;p&gt;A couple of weeks ago I received confirmation that the proposal had been accepted. Woop!&lt;/p&gt;

&lt;p&gt;So, on Thursday afternoon, just after the Dev Keynote, I hopped into the pod with my buddies Jack (yes - that Jack), and Romin Irani. I've known Romin for a fairly long time. He is a Google Developer Advocate, and one of the managers of the Google Cloud Medium publication. Submitting content to that publication is how we originally met. Since then we've collaborated often and I'm honoured to call him my friend.&lt;/p&gt;

&lt;p&gt;In the fireside, I refer to both these guys as Google legends. Google already sets the bar quite high on this front. It's rare I meet a Googler I don't like. But these guys definitely deserve "Legend" status.&lt;/p&gt;

&lt;p&gt;The chat covers Gemini CLI, agents, MCP, and skills, from three different perspectives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From Jack, Google DevRel, in the AI Tools Team&lt;/li&gt;
&lt;li&gt;From Romin, Google Developer Advocate, as someone who has a broad view of adoption through the publication&lt;/li&gt;
&lt;li&gt;From me, on what it's like getting these sorts of tools into enterprises&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check it out! And please don't forget to hit the Like button and subscribe. I don't do a lot of YT content, but if you guys like it, I'll try and do more.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/cHlmuL3jwj8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;
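
&lt;p&gt;And if MCP is still a mystery to you, here's a flavour. This is just my own minimal sketch (not something from the chat) of a custom tool server, using the FastMCP helper from the official Python MCP SDK. The tool itself is a silly made-up example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
from datetime import date

from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing a single tool. An MCP client - Gemini CLI,
# for example - can launch this script and call the tool over stdio.
mcp = FastMCP("conference-tools")

@mcp.tool()
def days_until(event_date: str) -&gt; str:
    """Return the number of days until the given YYYY-MM-DD date."""
    delta = date.fromisoformat(event_date) - date.today()
    return f"{delta.days} days to go!"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Register it under &lt;code&gt;mcpServers&lt;/code&gt; in your Gemini CLI settings, pointing at the command that launches the script, and the agent can then discover and call &lt;code&gt;days_until&lt;/code&gt; on its own.&lt;/p&gt;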

&lt;h2&gt;Weezer and Benson&lt;/h2&gt;

&lt;p&gt;At Google Next, there's always a big concert for attendees on the Thursday night. Last year it was The Killers. This year, it was Benson Boone, supported by Weezer.&lt;/p&gt;

&lt;p&gt;So look, Google, I don't mean to tell you your business. But for an audience that is arguably &lt;em&gt;mostly&lt;/em&gt; male geeks, do you think Benson was the right choice? &lt;em&gt;Was he though?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Of course, I'm just kidding. However, my wife - a Benson superfan - was not happy that I got to see Benson, whilst she didn't. There's definitely a glitch in the Matrix.&lt;/p&gt;

&lt;p&gt;Try as I might to not like him (mainly for the benefit of my wife), there's no doubt that the guy can backflip off a piano without skipping a beat:&lt;/p&gt;


&lt;div&gt;
    &lt;iframe src="https://www.youtube.com/embed/t9WIp89gs_0"&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;


&lt;h1&gt;Friday&lt;/h1&gt;

&lt;h2&gt;Morning Panic&lt;/h2&gt;

&lt;p&gt;For me, this was the big event. I was doing a lightning talk in the Developer Theatre, called &lt;strong&gt;Automating the UI with Gemini CLI, MCP and Skills&lt;/strong&gt;. Just a couple of hours before it was due to start, I was at 815 registrations. Eeeek!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp5z91bt8865tixuwvei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp5z91bt8865tixuwvei.png" alt="815 registrations for my talk" width="572" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I sat in the GDE and Certified Lounge and did a couple of panicked run-throughs. Whilst there, I managed to get a couple more selfies...&lt;/p&gt;

&lt;p&gt;This is Ronan, Lord of Google Cloud GDEs. (I just made up that title. It's &lt;em&gt;probably not&lt;/em&gt; his actual title.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu7z0nnqsyl5uhlyu8gp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu7z0nnqsyl5uhlyu8gp.png" alt="Ron" width="609" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Matthew Eckstrom, it was a delight to meet you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9rczbtseh56jzfnet1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9rczbtseh56jzfnet1r.png" alt="Matthew Eckstrom" width="611" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now, a life-changing moment. I stood next to Asrar Khan, and thought... &lt;em&gt;"Is he taking the p1ss?"&lt;/em&gt; He stood exactly as I was standing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0kb94etgt3op0flm5tg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0kb94etgt3op0flm5tg.png" alt="Asrar mimicking" width="611" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then I had a bit of a lesson on how to pose for a camera. How have I gone my whole life without getting this advice? I NEVER look good in a photo. But just maybe half of the problem is how I've been standing.&lt;/p&gt;

&lt;p&gt;Here's the post-Asrar-advice version:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5qkxesg55xhmb7jkhiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5qkxesg55xhmb7jkhiu.png" alt="Better pose" width="611" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OMG, that's so much better! This might genuinely be the most important thing I've learned at Next 26.&lt;/p&gt;

&lt;h2&gt;My Talk&lt;/h2&gt;

&lt;p&gt;I did my customary &lt;em&gt;three visits to the toilet&lt;/em&gt; that precede any talks I do. Then I was backstage, getting mic'd up. And then... Showtime!&lt;/p&gt;

&lt;p&gt;I asked my friends to help me crowd-source some photos. They didn't disappoint! Here are just a few:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov641whq8ci3xjbohed4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov641whq8ci3xjbohed4.png" alt="Intro slide" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnnjcmbhffk0nxm8ztyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnnjcmbhffk0nxm8ztyy.png" alt="Intro - distant" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l7dkliyzcpqrgu5t04x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l7dkliyzcpqrgu5t04x.png" alt="Darren Lester" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz4xa0z9k4ju3n1566dn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz4xa0z9k4ju3n1566dn.png" alt="About Me" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjkwx97q6zhq3wmcglbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjkwx97q6zhq3wmcglbd.png" alt="Side view" width="609" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0op8pfp2pqofcqwx3a2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0op8pfp2pqofcqwx3a2.png" alt="Live Demo" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And finally, &lt;strong&gt;I've ALWAYS wanted to do this. The stage selfie shot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8yv0fiptr07ghfrvdnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8yv0fiptr07ghfrvdnz.png" alt="Stage Selfie" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That was a blast! I think it went well. I didn't quite leave enough time to do Q&amp;amp;A. (Maybe that was by design!)&lt;/p&gt;

&lt;p&gt;Quick shout-out: Mandeep, thanks for coming to say hello. It was a pleasure to meet you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnaa7basn7zlhu5gopx19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnaa7basn7zlhu5gopx19.png" alt="Mandeep" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Poker
&lt;/h2&gt;

&lt;p&gt;With the adrenaline still pumping from the talk, I had one last Vegas 'to-do' list item before heading to the airport. It wouldn't be a Vegas trip without at least some time spent in the casino. So I decided to play a couple of hours of poker before my late flight. &lt;/p&gt;

&lt;p&gt;My buddy Marius and I found a table in the Mandalay. We both started out with $100 in chips. Over the next couple of hours I was up and down, but found myself with roughly what I'd started with. The girl to my left was dominating the table, and had about $1500 at this point.&lt;/p&gt;

&lt;p&gt;I figured I was going to depart in the next couple of hands, one way or another. I've got a flight to get to! So, here's how my last hand played out, for any poker fans out there...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Blinds are $1 and $2.&lt;/li&gt;
&lt;li&gt;I'm dealt pocket threes under the gun.&lt;/li&gt;
&lt;li&gt;I probably wouldn't normally play it, but I knew I wanted to leave soon. I raise to $6.&lt;/li&gt;
&lt;li&gt;Girl on my left re-raises to $15.&lt;/li&gt;
&lt;li&gt;Everyone folds except BB, who calls.&lt;/li&gt;
&lt;li&gt;Now it's back to me. Again, I possibly wouldn't normally call, but I figured: there's over $30 in the pot, I'm already in for $6, and if I happen to hit, they'll never see it coming. So I call.&lt;/li&gt;
&lt;li&gt;Flop comes down: AK3. &lt;strong&gt;Oh yeah!&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;I bet $6. She raises to $15. BB calls. All exactly as expected.&lt;/li&gt;
&lt;li&gt;I go All-In for about $70.&lt;/li&gt;
&lt;li&gt;She calls. BB folds.&lt;/li&gt;
&lt;li&gt;We turn the cards. She's got one K, but gets no help from the turn and river.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then I go back to my room, with roughly twice what I started with. Yay!&lt;/p&gt;

&lt;p&gt;I know it's not the biggest hand you've ever seen. But it was a fun way to end the day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going Home
&lt;/h2&gt;

&lt;p&gt;Finally, I'm about to board for the 10-hour return flight. It's a little before 2300, and I have the pleasure of meeting Shishir at the gate, alongside my EPAM buddy Suds. Apologies to these guys - I kept dozing off whilst talking to them. I was SO TIRED!&lt;/p&gt;

&lt;p&gt;Finally, I'm on the plane. And one of the nicest things happens. A guy who I've never met stops as he walks past my seat. He shakes my hand and tells me my talk was &lt;em&gt;inspiring&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;What a way to finish the Vegas experience.&lt;/p&gt;

&lt;h1&gt;
  
  
  Final Reflections
&lt;/h1&gt;

&lt;p&gt;It sounds a bit cliché to say "it's all about the people". But in this case, it definitely was. I loved hanging out with Googlers and other GDEs. There are so many folks I'd spoken to remotely but never met IRL. And I met so many unexpected new friends. I feel like part of the family now.&lt;/p&gt;

&lt;p&gt;Well, I think I've written enough. That's a wrap!&lt;/p&gt;

&lt;h1&gt;
  
  
  Links
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=11PBno-cJ1g&amp;amp;utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google Cloud Next '26 - Opening Keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=A01DQ8_xy7Q&amp;amp;?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google Cloud Next '26 - Developer Keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=JemyjTlOvy0&amp;amp;utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Next '26 Developer Keynote Deep-Dive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2026-wrap-up?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;260 things we announced at Google Cloud Next '26 – a recap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=cHlmuL3jwj8" rel="noopener noreferrer"&gt;Google Cloud Next '26 Fireside: Gemini CLI from Three Worlds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/skills/tree/main/skills/cloud?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/mcp/overview?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google Cloud MCP Servers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/agent-registry?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/agent-identity-overview?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Identity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/gateways/agent-gateway-overview?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Gateway&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;8th Gen TPUs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/compute/docs/general-purpose-machines#n4a_series?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Axion N4A&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog?utm_campaign=DEVECO_GDEMembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Knowledge Catalog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>googlecloud</category>
      <category>googlecloudnext</category>
      <category>gde</category>
      <category>vegas</category>
    </item>
    <item>
      <title>Self-hosted Gemma 4 on TPU with vLLM, MCP, ADK, and Gemini CLI</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:48:30 +0000</pubDate>
      <link>https://forem.com/gde/self-hosted-gemma-4-on-tpu-with-mcp-adk-and-gemini-cli-2j9d</link>
      <guid>https://forem.com/gde/self-hosted-gemma-4-on-tpu-with-mcp-adk-and-gemini-cli-2j9d</guid>
      <description>&lt;p&gt;This article provides a step by step deployment guide for Gemma 4 to v6e Trillium TPUs in an 8 core 2x4 setup. A suite of Python MCP tools is built to simplify management of the vLLM hosted Gemma 4 deployment with Gemini CLI. Finally- this deployed Gemma model is used as the main LLM from sample ADK agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq9cqvgg4jl2naddbnio.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq9cqvgg4jl2naddbnio.jpeg" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What is this project trying to do?
&lt;/h4&gt;

&lt;p&gt;This project is a DevOps/SRE assistant that uses a Gemma 4 model self-hosted on Google Cloud TPUs to analyze infrastructure issues. It provides tools to provision the TPU and deploy the model, as well as for observability and performance testing.&lt;/p&gt;

&lt;h4&gt;
  
  
  Just the facts, ma’am
&lt;/h4&gt;

&lt;p&gt;This project includes active GCP infrastructure (v6e TPU, 2x4 topology, created 2026-04-26), compute (gemma4-vllm-stack-node, internal IP 10.128.0.18, external IP 35.193.100.125), and online vLLM inference with google/gemma-4-31B-it. Optimization settings include BF16/FP8 precision, TP=8 parallelism, a 16384-token max sequence length, and the Flax/JAX (OpenXLA) engine.&lt;/p&gt;

&lt;h4&gt;
  
  
  AI-Driven Troubleshooting
&lt;/h4&gt;

&lt;p&gt;The agent connects to Google Cloud Logging to identify errors in your environment. It uses a self-hosted vLLM inference server to summarize logs.&lt;/p&gt;

&lt;h4&gt;
  
  
  Infrastructure Automation
&lt;/h4&gt;

&lt;p&gt;The project includes a full “Inference Stack” manager. It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy Gemma with vLLM: Automatically generate and execute the commands to deploy to a 2x4 Trillium v6e node.&lt;/li&gt;
&lt;li&gt;Generate Configs: Create Kubernetes (GKE) manifests for running vLLM on TPU v6e chips.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Gemini CLI
&lt;/h4&gt;

&lt;p&gt;If it is not pre-installed, you can install the Gemini CLI to interact with the source files and provide real-time assistance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Testing the Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google Account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
 ▝▜▄ Gemini CLI v0.39.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Node Version Management
&lt;/h4&gt;

&lt;p&gt;Gemini CLI needs a consistent, up-to-date version of Node. The &lt;strong&gt;nvm&lt;/strong&gt; command can be used to set up a standard Node environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nvm-sh/nvm" rel="noopener noreferrer"&gt;GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Python MCP Documentation
&lt;/h4&gt;

&lt;p&gt;The official GitHub Repo provides samples and documentation for getting started:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;GitHub - modelcontextprotocol/python-sdk: The official Python SDK for Model Context Protocol servers and clients&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Where do I start?
&lt;/h4&gt;

&lt;p&gt;The strategy for starting MCP development for vLLM management is an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.&lt;/p&gt;

&lt;p&gt;Then, a minimal Python MCP Server is built with stdio transport. This server is validated with Gemini CLI in the local environment.&lt;/p&gt;

&lt;p&gt;This setup validates the connection from Gemini CLI to the local server via MCP. The MCP client (Gemini CLI) and the Python MCP server both run in the same local environment.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup the Basic Environment
&lt;/h4&gt;

&lt;p&gt;At this point you should have a working Python environment and a working Gemini CLI installation. The next step is to clone the GitHub samples repository with support scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~
git clone https://github.com/xbill9/aisprintapr2026
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;strong&gt;init.sh&lt;/strong&gt; from the cloned directory.&lt;/p&gt;

&lt;p&gt;The script will attempt to determine your shell environment and set the correct variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;tpu-vllm-devops-agent
&lt;span class="nb"&gt;source &lt;/span&gt;init.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your session times out or you need to re-authenticate, you can run the &lt;strong&gt;set_env.sh&lt;/strong&gt; script to reset your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;tpu-vllm-devops-agent
&lt;span class="nb"&gt;source &lt;/span&gt;set_env.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variables like PROJECT_ID need to be set up for use in the various build scripts, so the set_env.sh script can be used to reset the environment if you time out.&lt;/p&gt;
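
&lt;p&gt;The repo's &lt;strong&gt;set_env.sh&lt;/strong&gt; is the source of truth, but conceptually it manages variables along these lines (an illustrative sketch, not the real script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Illustrative only - see the repo's set_env.sh for the real logic
export PROJECT_ID="$(gcloud config get-value project)"
export GOOGLE_CLOUD_PROJECT="${PROJECT_ID}"
gcloud auth login   # re-authenticate if your session has expired
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;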

&lt;h4&gt;
  
  
  vLLM Management Tool with MCP Stdio Transport
&lt;/h4&gt;

&lt;p&gt;One of the key features that the standard MCP libraries provide is abstracting various transport methods.&lt;/p&gt;

&lt;p&gt;The high-level MCP tool implementation is the same no matter which low-level transport the MCP client uses to connect to an MCP server.&lt;/p&gt;

&lt;p&gt;The simplest transport that the SDK supports is stdio (stdin/stdout), which connects to a locally running process. Both the MCP client and MCP server must be running in the same environment.&lt;/p&gt;

&lt;p&gt;The connection over stdio will look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize FastMCP server
&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Self-Hosted vLLM DevOps Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
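

&lt;p&gt;To make this concrete, here is a minimal, self-contained sketch of a stdio FastMCP server with a single health-check tool. It is illustrative rather than the project's actual server.py, and the MODEL_URL environment variable is an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal stdio MCP server - an illustrative sketch, not the project's server.py
import os
import urllib.request

from mcp.server.fastmcp import FastMCP

# Initialize FastMCP server
mcp = FastMCP("Self-Hosted vLLM DevOps Agent")


@mcp.tool()
def verify_model_health():
    """Check the /health endpoint of the self-hosted vLLM server."""
    # MODEL_URL is an assumed environment variable pointing at the vLLM endpoint
    base_url = os.environ.get("MODEL_URL", "http://localhost:8000")
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
            return "Healthy" if resp.status == 200 else f"Unhealthy (HTTP {resp.status})"
    except OSError as exc:
        return f"Unreachable: {exc}"


if __name__ == "__main__":
    # stdio transport: the MCP client (Gemini CLI) launches this process locally
    mcp.run(transport="stdio")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;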



&lt;h4&gt;
  
  
  Running the Python Code
&lt;/h4&gt;

&lt;p&gt;First, switch to the directory containing the Python version of the MCP sample code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/aisprintapr2026/tpu-vllm-devops-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the release version on the local system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;span class="go"&gt;Processing ./.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project can also be linted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make lint
&lt;span class="go"&gt;ruff check .
All checks passed!
ruff format --check .
5 files already formatted
mypy .
Success: no issues found in 5 source files
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a test run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;span class="go"&gt;python test_agent.py
&lt;/span&gt;&lt;span class="c"&gt;......
&lt;/span&gt;&lt;span class="go"&gt;----------------------------------------------------------------------
Ran 6 tests in 0.055s

OK
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  vLLM Interaction with MCP stdio Transport
&lt;/h4&gt;

&lt;p&gt;As described above, the MCP protocol abstracts the transport layer: the same high-level tool implementation works regardless of which low-level transport the MCP client uses to reach the MCP server. With stdio, both sides run in the same local environment.&lt;/p&gt;

&lt;p&gt;In this project, Gemini CLI is used as the MCP client to interact with the Python MCP server code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Gemini CLI settings.json
&lt;/h4&gt;

&lt;p&gt;Replace the default Gemini CLI configuration file, &lt;strong&gt;settings.json&lt;/strong&gt;, with a pre-configured sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vllm-tpu-agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/home/xbill/aisprintapr2026/tpu-vllm-devops-agent/server.py"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"GOOGLE_CLOUD_PROJECT"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aisprint-491218"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MODEL_NAME"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google/gemma-4-31B-it"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
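
&lt;p&gt;Gemini CLI reads this from its user-level settings file, typically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.gemini/settings.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;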



&lt;h4&gt;
  
  
  Validation with Gemini CLI
&lt;/h4&gt;

&lt;p&gt;The final connection test uses Gemini CLI as an MCP client with the Python code providing the MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
 ▝▜▄ Gemini CLI v0.39.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade

&lt;/span&gt;&lt;span class="gp"&gt; &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/mcp list
&lt;span class="go"&gt;Configured MCP servers:

ent
  - mcp_vllm-tpu-agent_verify_model_health
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Getting Started with Gemma 4 on TPU
&lt;/h4&gt;

&lt;p&gt;The GitHub repo provides a starter recipe:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/AI-Hypercomputer/tpu-recipes/tree/main/inference/trillium/vLLM/Gemma4" rel="noopener noreferrer"&gt;tpu-recipes/inference/trillium/vLLM/Gemma4 at main · AI-Hypercomputer/tpu-recipes&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Official vLLM repo also has Gemma4 specific information:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vllm-project/vllm/releases/tag/v0.19.1" rel="noopener noreferrer"&gt;Release v0.19.1 · vllm-project/vllm&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  TPU Deployment
&lt;/h4&gt;

&lt;p&gt;A lower-cost entry point for TPU deployment is Queued Resources. This approach allows TPU reservations to be requested in real time and provides an easy path to allocating TPUs, with some additional complexity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.cloud.google.com/tpu/docs/queued-resources" rel="noopener noreferrer"&gt;Manage queued resources | Cloud TPU | Google Cloud Documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a few options for deploying vLLM on TPU, but the simplest is to use the provided Docker Compose definition:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/AI-Hypercomputer/tpu-recipes/blob/main/inference/trillium/vLLM/Gemma4/docker-compose-gemma4-31B.yml" rel="noopener noreferrer"&gt;tpu-recipes/inference/trillium/vLLM/Gemma4/docker-compose-gemma4-31B.yml at main · AI-Hypercomputer/tpu-recipes&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  vLLM Lifecycle Management via MCP
&lt;/h4&gt;

&lt;p&gt;The MCP tools provide a complete suite of agent-oriented operations for managing vLLM deployment on Cloud Run or a TPU.&lt;/p&gt;

&lt;p&gt;Overview of the MCP tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🟢 vllm-tpu-agent - Ready (21 tools)
  Tools:
  - mcp_vllm-tpu-agent_check_tpu_availability
  - mcp_vllm-tpu-agent_describe_queued_resource
  - mcp_vllm-tpu-agent_destroy_queued_resource
  - mcp_vllm-tpu-agent_estimate_deployment_cost
  - mcp_vllm-tpu-agent_get_cloud_logging_logs
  - mcp_vllm-tpu-agent_get_deployed_endpoint
  - mcp_vllm-tpu-agent_get_model_details
  - mcp_vllm-tpu-agent_get_reservation_status
  - mcp_vllm-tpu-agent_get_system_status
  - mcp_vllm-tpu-agent_get_tpu_system_logs
  - mcp_vllm-tpu-agent_get_vllm_deployment_config
  - mcp_vllm-tpu-agent_get_vllm_docker_logs
  - mcp_vllm-tpu-agent_get_vllm_endpoint
  - mcp_vllm-tpu-agent_list_queued_resources
  - mcp_vllm-tpu-agent_manage_queued_resource
  - mcp_vllm-tpu-agent_manage_vllm_docker
  - mcp_vllm-tpu-agent_query_queued_gemma4
  - mcp_vllm-tpu-agent_query_queued_gemma4_with_stats
  - mcp_vllm-tpu-agent_run_vllm_benchmark
  - mcp_vllm-tpu-agent_save_hf_token
  - mcp_vllm-tpu-agent_verify_model_health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Request Queued TPU Resources
&lt;/h4&gt;

&lt;p&gt;First, use the MCP tool to request TPU resources:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fub9efq49buwyp9jo7pga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fub9efq49buwyp9jo7pga.png" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The status can be checked on the TPU status page in the Google Cloud Console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mdn2cbv81t2gz8huf3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mdn2cbv81t2gz8huf3d.png" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;with detailed information:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzkw8pnq3dle8nlrcx5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzkw8pnq3dle8nlrcx5r.png" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  vLLM Deployment to Docker
&lt;/h4&gt;

&lt;p&gt;Once the TPU has been activated, the MCP tools can check the Gemma deployment status:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nznbjt32d60rkvtb1ef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nznbjt32d60rkvtb1ef.png" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then the status can be checked:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F648guhf3iu960oj8dpv4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F648guhf3iu960oj8dpv4.png" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And active:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwv6sf2wvbwrht3ryme0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwv6sf2wvbwrht3ryme0s.png" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Checking vLLM status
&lt;/h4&gt;

&lt;p&gt;The status can be checked with an MCP tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &amp;gt; get_system_status

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ get_system_status (vllm-tpu-agent MCP Server) {} │
│ │
│ 🌀 System Status (southamerica-east1-c) │
│ - vLLM Health: 🟢 Online (http://34.39.230.66:8000) │
│ 📋 Queued Resources in southamerica-east1-c │
│ 1 NAME STATE NODE_ID ACCELERATOR_TYPE CREATE_TIME │
│ 2 vllm-gemma4-qr ACTIVE 2026-04-27T16:35:53.683050645Z │
│ 👉 Next Step: Use query_queued_gemma4 to interact with the model. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ I've received the system status. The vLLM service is online and a queued resource is active.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Query the Model with Stats
&lt;/h4&gt;

&lt;p&gt;Once everything is online, you can send queries to the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &amp;gt; queued_gemma4_with_stats the battle of hastings

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ query_queued_gemma4_with_stats (vllm-tpu-agent MCP Server) {"prompt":"the battle of hastings"} │
│ │
│ 📊 Performance Stats │
│ - Time to First Token (TTFT): 0.764s │
│ - Total Generation Time: 15.709s │
│ - Tokens per Second: 68.19 tokens/s │
│ - Total Tokens (approx.): 1019 │
│ │
│ 💬 Model Response │
│ The Battle of Hastings was one of the most consequential events in European history. Fought on October 14, 1066, it marked the beginning │
│ of the Norman Conquest of England, fundamentally altering the English language, culture, and political structure. │
│ │
│ Here is a detailed breakdown of the battle: │
│ │
│ --- │
│ │
│ 1. The Cause: A Crisis of Succession │
│ The battle was the result of a power struggle following the death of Edward the Confessor, the King of England, who died childless in │
│ January 1066. Three main contenders claimed the throne: │
│ │
│ * Harold Godwinson: The most powerful earl in England and Edward’s brother-in-law. He was crowned king immediately after Edward's death. │
│ * William, Duke of Normandy: A distant cousin of Edward. He claimed that Edward had promised him the throne years earlier and that Harold │
│ had sworn an oath to support him. │
│ * Harald Hardrada: The King of Norway, who believed he had a hereditary right to the throne through his ancestors. │
│ │
│ 2. The Prelude: A Year of Two Invasions │
│ Before facing William, King Harold had to deal with the Vikings. In September 1066, Harald Hardrada invaded Northern England. Harold │
│ Godwinson marched his army 185 miles in just a few days and defeated the Vikings at the Battle of Stamford Bridge. │
│ │
│ However, while Harold was celebrating in the north, the wind changed in the English Channel. William of Normandy landed his invasion force │
│ at Pevensey on the south coast. Harold had to march his exhausted army all the way back down south to meet the new threat. │
│ │
│ 3. The Battle (October 14, 1066) │
│ The two armies met at Senlac Hill, about seven miles from Hastings. │
│ │
│ The Opposing Forces: │
│ * The English (Anglo-Saxons): Comprised mostly of infantry. Their primary tactic was the "Shield Wall"—soldiers stood │
│ shoulder-to-shoulder with overlapping shields, creating an almost impenetrable barrier of wood and steel. │
│ * The Normans: A more diverse army consisting of archers, infantry, and—most crucially—heavy cavalry (knights on horseback). │
│ │
│ The Course of the Fight: │
│ * The Stalemate: For most of the day, the English shield wall held firm atop the ridge. The Norman archers fired upward, but their arrows │
│ mostly flew over the English heads or bounced off the shields. The Norman cavalry charges failed to break the line. │
│ * The Turning Point (The Feigned Retreat): Seeing that the shield wall was unbreakable, William ordered his troops to pretend to panic │
│ and retreat. Some English soldiers, thinking they had won, broke formation and chased the Normans down the hill. Once the English were │
│ out in the open and disorganized, the Norman cavalry wheeled around and slaughtered them. │
│ * The End: By late afternoon, the English line was thinned and exhausted. King Harold was killed—traditionally believed to have been hit │
│ in the eye with an arrow, though some sources suggest he was hacked to death by Norman knights. │
│ │
│ 4. The Outcome and Aftermath │
│ With the death of King Harold and much of the English nobility, organized resistance collapsed. │
│ │
│ * Coronation: William marched to London and was crowned King of England on Christmas Day, 1066, at Westminster Abbey. He became known as │
│ William the Conqueror. │
│ * The Feudal System: William replaced the English nobility with Norman lords, introducing a rigid feudal system. │
│ * Language: French became the language of the court, administration, and law for centuries. This is why Modern English is a hybrid of Old │
│ English (Germanic) and French (Latinate). │
│ * Architecture: To secure his hold on the land, William built massive stone castles across England, including the Tower of London. │
│ │
│ 5. Historical Legacy: The Bayeux Tapestry │
│ Much of what we know about the battle comes from the Bayeux Tapestry. This 70-meter-long embroidered cloth depicts the events leading up │
│ to the battle and the battle itself. While created from a Norman perspective (and thus biased), it remains one of the most important │
│ visual records of the Middle Ages. │
│ │
│ Summary Table │
│ ┌────────────┬─────────────────────────────┬────────────────────────────────┐ │
│ │ Feature │ Anglo-Saxons │ Normans │ │
│ ├────────────┼─────────────────────────────┼────────────────────────────────┤ │
│ │ Leader │ King Harold Godwinson │ William, Duke of Normandy │ │
│ │ Key Tactic │ Shield Wall │ Cavalry &amp;amp; Archers │ │
│ │ Terrain │ Defensive (Top of the hill) │ Offensive (Bottom of the hill) │ │
│ │ Outcome │ Defeat / Death of King │ Victory / Conquest of England │ │
│ └────────────┴─────────────────────────────┴────────────────────────────────┘ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ The model has provided a detailed response about the Battle of Hastings, including performance statistics for the query.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Deploying MCP Server to Cloud Run
&lt;/h4&gt;

&lt;p&gt;Once the local stdio MCP server has been tested, it can be deployed to Google Cloud Run.&lt;/p&gt;

&lt;p&gt;First, switch to the MCP server directory and deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent/mcp-https-python$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make deploy
&lt;span class="go"&gt;Submitting build to Google Cloud Build...
Creating temporary archive of 13 file(s) totalling 36.7 KiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/xbill/.config/gcloud/logs/2026.04.27/20.26.23.215852.log] to see which files and the contents of the
&lt;/span&gt;&lt;span class="gp"&gt;default gcloudignore file used (see `$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gcloud topic gcloudignore&lt;span class="sb"&gt;`&lt;/span&gt; to learn
&lt;span class="go"&gt;more).

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
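

&lt;p&gt;The Makefile itself isn't shown here, but a source-based Cloud Run deployment of this kind typically boils down to something like the following (the service name and region are assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Illustrative equivalent of `make deploy` - check the repo's Makefile for the real recipe
gcloud run deploy mcp-https-python \
    --source . \
    --region southamerica-east1 \
    --allow-unauthenticated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;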



&lt;p&gt;After the build, the MCP server will be available at a well-known endpoint. The settings.json file will then need an entry like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gemma4"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"httpUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mcp-https-python-wgcq55zbfq-rj.a.run.app/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP tools are available from the Cloud Run service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt; &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/mcp list
&lt;span class="go"&gt;Configured MCP servers:

🟢 gemma4 - Ready (20 tools)
  Tools:
  - mcp_gemma4_check_tpu_availability
  - mcp_gemma4_describe_queued_resource
  - mcp_gemma4_destroy_queued_resource
  - mcp_gemma4_estimate_deployment_cost
  - mcp_gemma4_get_cloud_logging_logs
  - mcp_gemma4_get_deployed_endpoint
  - mcp_gemma4_get_model_details
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test the MCP Tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &amp;gt; get model details
✦ I will retrieve the detailed information for the currently running Gemma 4 model and vLLM engine.

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Action Required │
│ │
│ ? get_model_details (gemma4 MCP Server) {} │
│ │
│ MCP Server: gemma4 │
│ Tool: get_model_details │
│ │
│ MCP Tool Details: │
│ (press Ctrl+O to expand MCP tool details) │
│ Allow execution of MCP tool "get_model_details" from server "gemma4"? │
│ │
│ 1. Allow once │
│ 2. Allow tool for this session │
│ ● 3. Allow all server tools for this session │
│ 4. Allow tool for all future sessions ~/.gemini/policies/auto-saved.toml │
│ 5. No, suggest changes (esc) │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ I will retrieve the detailed information for the currently running Gemma 4 model and vLLM engine.

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ get_model_details (gemma4 MCP Server) {} │
│ │
│ 🧩 Model &amp;amp; vLLM Engine Details (http://34.39.243.194:8000) │
│ │
│ Model Information (/v1/models): │
│ 1 { │
│ 2 "object": "list", │
│ 3 "data": [ │
│ 4 { │
│ 5 "id": "google/gemma-4-31B-it", │
│ 6 "object": "model", │
│ 7 "created": 1777336641, │
│ 8 "owned_by": "vllm", │
│ 9 "root": "google/gemma-4-31B-it", │
│ 10 "parent": null, │
│ 11 "max_model_len": 16384, │
│ 12 "permission": [ │
│ 13 { │
│ 14 "id": "modelperm-aaefbddc518584d8", │
│ 15 "object": "model_permission", │
│ 16 "created": 1777336641, │
│ 17 "allow_create_engine": false, │
│ 18 "allow_sampling": true, │
│ 19 "allow_logprobs": true, │
│ 20 "allow_search_indices": false, │
│ 21 "allow_view": true, │
│ 22 "allow_fine_tuning": false, │
│ 23 "organization": "*", │
│ 24 "group": null, │
│ 25 "is_blocking": false │
│ 26 } │
│ 27 ] │
│ 28 } │
│ 29 ] │
│ 30 } │
│ vLLM Version (/version): │
│ - Version: 0.19.2rc1.dev215+g32e45636e │
│ │
│ Health Status (/health): │
│ - Status: Healthy ✅ │
│ │
│ Key vLLM Metrics (/metrics): │
│ 1 # HELP process_resident_memory_bytes Resident memory size in bytes. │
│ 2 # TYPE process_resident_memory_bytes gauge │
│ 3 process_resident_memory_bytes 2.460672e+09 │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ The system is currently running Gemma 4 (31B-it) via vLLM version 0.19.2rc1. The model is healthy and configured with a maximum sequence length
  of 16,384 tokens.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  But Wait! There’s More!
&lt;/h4&gt;

&lt;p&gt;First, get the URL of the TPU-deployed Gemma 4 model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
 ▝▜▄ Gemini CLI v0.39.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade

&lt;/span&gt;&lt;span class="gp"&gt; &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;get model url
&lt;span class="go"&gt;
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ get_vllm_endpoint (vllm-tpu-agent MCP Server) {} │
│ │
│ 🟢 vLLM is Online at: http://34.39.243.194:8000 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ I have found the vLLM endpoint. It is running at http://34.39.243.194:8000.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the endpoint in an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MODEL_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://34.39.243.194:8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
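

&lt;p&gt;Since vLLM exposes an OpenAI-compatible API, you can also smoke-test the endpoint directly with curl before involving any agent tooling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Send a single chat completion request straight to the vLLM server
curl -s "${MODEL_URL}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "google/gemma-4-31B-it", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;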



&lt;p&gt;Then test with the ADK CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent/agents$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk run gemma4vllm
&lt;span class="go"&gt;/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/features/_feature_decorator.py:72: UserWarning: [EXPERIMENTAL] feature FeatureName.PLUGGABLE_AUTH is enabled.
  check_feature_enabled()
Log setup complete: /tmp/agents_log/agent.20260427_205922.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent vllm_agent, type exit to exit.
[user]: what do you do
20:59:32 - LiteLLM:INFO: utils.py:3995 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= google/gemma-4-31B-it;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; openai
&lt;span class="go"&gt;[vllm_agent]: * User asks: "what do you do"
    * Context: I am a helpful assistant running on a self-hosted vLLM endpoint. My internal name is "vllm_agent".

    * I am a Large Language Model (LLM).
    * I process text inputs and generate text outputs.
    * I can perform various tasks: answering questions, writing code, summarizing text, creative writing, analysis, etc.
    * I am hosted via vLLM (an inference engine for LLMs).

    * *Who am I?* A helpful AI assistant (internally known as `vllm_agent`).
    * *What are my capabilities?*
        * Answering questions (knowledge retrieval).
        * Writing and debugging code.
        * Summarization and analysis.
        * Content creation (emails, stories, essays).
        * Translation.
        * General conversation.
    * *How do I operate?* I process patterns in data to provide coherent and useful responses based on the prompts I receive.

    * Keep it helpful, clear, and concise.
    * Acknowledge the "agent" persona if appropriate, but focus on the functional utility.I am a helpful AI assistant (internally known as `vllm_agent`) designed to process information and generate human-like text. Because I am powered by a large language model running on a vLLM endpoint, I can help you with a wide variety of tasks, including:

**1. Information &amp;amp; Learning**
* **Answering questions:** From general knowledge to complex technical explanations.
* **Summarization:** Taking long articles or documents and condensing them into key points.
* **Research:** Helping you explore a topic or brainstorming ideas.

**2. Writing &amp;amp; Content Creation**
* **Drafting:** Writing emails, essays, blog posts, or reports.
* **Creative Writing:** Creating stories, poems, or scripts.
* **Editing:** Checking grammar, improving tone, or rewriting text for clarity.

**3. Technical &amp;amp; Analytical Tasks**
* **Coding:** Writing code in various programming languages, debugging errors, and explaining how a piece of code works.
* **Data Analysis:** Helping you organize information or interpret complex data.
* **Translation:** Translating text between dozens of different languages.

**4. Problem Solving &amp;amp; Brainstorming**
* **Planning:** Creating itineraries, study plans, or project outlines.
* **Ideation:** Generating names for a business, gift ideas, or marketing slogans.
* **Logic:** Solving puzzles or working through mathematical problems.

**In short: If it involves processing, generating, or analyzing text and code, I can likely help you with it! How can I assist you today?**
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the ADK Web interface:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcddg4o1jg8kpfx3wsfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcddg4o1jg8kpfx3wsfn.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  MCP Operations from an ADK Agent with Gemma 4 TPU
&lt;/h4&gt;

&lt;p&gt;The final validation uses an ADK Agent to call the Cloud Run MCP endpoint to allow model operations over MCP calls from an agentic perspective:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent/agents$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk run gemma4mcp
&lt;span class="go"&gt;/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/features/_feature_decorator.py:72: UserWarning: [EXPERIMENTAL] feature FeatureName.PLUGGABLE_AUTH is enabled.
  check_feature_enabled()
Log setup complete: /tmp/agents_log/agent.20260427_210657.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent devops, type exit to exit.
[user]: get model stats
/home/xbill/.pyenv/versions/3.13.13/lib/python3.13/site-packages/google/adk/features/_feature_decorator.py:72: UserWarning: [EXPERIMENTAL] feature FeatureName.BASE_AUTHENTICATED_TOOL is enabled.
  check_feature_enabled()
[devops]: The model currently deployed is `google/gemma-4-31B-it`. The vLLM version is `0.19.2rc1.dev215+g32e45636e` and its health status is `Healthy` ✅. The process resident memory is `2.46 GB`.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And from the ADK Web interface:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fml76yd5ondy242f00y41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fml76yd5ondy242f00y41.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detailed information can be pulled directly into the agent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh3qamoegsd1pquyyp6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh3qamoegsd1pquyyp6e.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  One Last Code Review with Gemini CLI
&lt;/h4&gt;

&lt;p&gt;Gemini CLI was used to summarize the project:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✦ This project is a TPU vLLM DevOps Agent, designed to function as an expert Site Reliability Engineer (SRE) for managing a self-hosted Gemma 4 AI
  model running on Google's Tensor Processing Units (TPUs).

  In simple terms, it's a control panel and automation layer for running your own powerful AI model, making it easier to deploy, monitor, and
  interact with.

  Here’s a breakdown of its key functionalities:

  Core Components

   1. Inference Stack: The project is built to manage a specific AI setup: the google/gemma-4-31B-it model running on a vLLM/vllm-tpu container,
      which is a specialized environment for serving models on TPUs. It's designed for high-performance inference on TPU v6e hardware.

   2. MCP Server (server.py): This is the heart of the project. It's an agent that exposes a set of tools (as an API) to manage the inference stack.
      These tools allow you to:
       * Deploy and Manage Infrastructure: Programmatically create, delete, and inspect the status of TPU resources (Queued Resources) needed to run
         the model.
       * Monitor and Observe: Check the health of the vLLM service, monitor TPU utilization (like HBM and Tensor Core usage), and fetch logs.
       * Benchmark Performance: Run load tests and benchmarks to measure the model's throughput and latency.
       * Interact with the Model: Query the self-hosted Gemma 4 model.

  Key Features

   * Turnkey Deployment: The orchestrate_gemma4_stack tool simplifies the entire process of provisioning a TPU and deploying the Gemma 4 model with
     all the necessary optimizations.
   * Cost Efficiency: It's designed to use Google Cloud's Flex-start VMs, which are lower-cost, preemptible TPUs, making it more affordable to run a
     large model.
   * Gemini CLI Integration: The project includes instructions for setting up a LiteLLM proxy. This proxy translates requests from the standard
     Gemini CLI, so you can use the gemini command on your terminal, and have the requests be served by your private, self-hosted Gemma 4 model
     instead of the public Google API.
   * Benchmarking and Testing: The project includes scripts (benchmarking_suite.py and load_test.py) to systematically measure the performance of
     your self-hosted model under various conditions.

  In summary, this project provides the automation and tooling necessary for a DevOps engineer or researcher to run, manage, and analyze a powerful,
  private instance of the Gemma 4 model on Google's TPU infrastructure.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;The strategy of using MCP for vLLM Gemma 4 deployment on TPU with Gemini CLI was validated with an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;A minimal stdio transport MCP server was started from Python source code and validated with Gemini CLI running as an MCP client in the same local environment. This Python server provided all of the management tools to deploy and troubleshoot vLLM deployments.&lt;/p&gt;

&lt;p&gt;Then this MCP server was deployed to Cloud Run.&lt;/p&gt;

&lt;p&gt;Finally, ADK agents were configured to use the MCP server and the TPU-deployed Gemma 4 vLLM server.&lt;/p&gt;




</description>
      <category>vllm</category>
      <category>googleadk</category>
      <category>tpu</category>
      <category>gemini</category>
    </item>
    <item>
      <title>ML acceleration guide: TPUs vs GPUs</title>
      <dc:creator>Glen Yu</dc:creator>
      <pubDate>Tue, 28 Apr 2026 00:16:10 +0000</pubDate>
      <link>https://forem.com/gde/ml-acceleration-guide-tpus-vs-gpus-16oh</link>
      <guid>https://forem.com/gde/ml-acceleration-guide-tpus-vs-gpus-16oh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;There’s a lot of hype around GPUs and NVIDIA, but how much do you know about TPUs?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x0q0rg9actpj3f3bokn.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x0q0rg9actpj3f3bokn.JPG" alt="Rack of TPUs at Google Next" width="800" height="1422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article includes code examples, which you can find near the end.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Rise of GPUs
&lt;/h2&gt;

&lt;p&gt;Graphics Processing Units have been around for quite some time, and their job is to render 2D and 3D graphics into millions of pixels, calculating their colour, texture, and lighting in parallel before sending the result to your monitor. For a 60Hz monitor, that means producing rendered frames 60 times every second.&lt;/p&gt;

&lt;p&gt;Rendering graphics is one thing, but writing general-purpose code for GPUs was a little more difficult. That is, until NVIDIA launched CUDA (Compute Unified Device Architecture) in 2006, which allowed scientific researchers and developers working in fields that require massively parallel math to take advantage of a GPU’s capabilities. With the rise of machine learning in the early 2010s, it was discovered that this massively parallel math was exactly what ML engineers needed to train deep neural networks. Since then, the focus of CUDA has been shifting more towards optimizing for machine learning and AI workloads.&lt;/p&gt;

&lt;p&gt;Because GPUs were commercially available and relatively inexpensive at the time, the barrier to entry was low. An ML engineer could train models on their NVIDIA graphics card during the day and jump into a game of League of Legends at night on the same hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Honourable mention
&lt;/h3&gt;

&lt;p&gt;AMD offers GPUs with Radeon Open Compute (ROCm), an open-source software stack designed to compete in the AI ecosystem. Though it’s not as popular as CUDA, the gap is closing, with &lt;a href="https://www.amd.com/en/newsroom/press-releases/2026-2-24-amd-and-meta-announce-expanded-strategic-partnersh.html" rel="noopener noreferrer"&gt;Meta recently signing a deal to expand its existing partnership with AMD&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tensor Processing Unit
&lt;/h2&gt;

&lt;p&gt;In the early 2010s, Google projected that the growing demands of its AI workloads, particularly the rapid adoption of deep learning across products like Search and Photos, would require doubling its data center computing capacity roughly every year and a half. Rather than scale generic hardware indefinitely, Google sought a more efficient solution purpose-built for neural network computation, and thus the Tensor Processing Unit (TPU) was born. The TPU is a custom application-specific integrated circuit (ASIC) designed by Google specifically to accelerate AI workloads, deployed internally starting in 2015. By specializing the hardware for the dense matrix operations at the heart of neural networks, TPUs achieve dramatically better performance per watt than general-purpose CPUs or GPUs, reducing both energy consumption and cooling demands at data center scale.&lt;/p&gt;

&lt;p&gt;Google has a tradition of making tools it uses internally available to the broader world, and TPUs are another example of this. The existence of TPUs was first publicly announced at Google I/O in 2016. In 2018, Cloud TPU v2 became available for external users through Google Cloud, marking the first time developers outside Google could harness the same accelerators powering Google’s own AI systems. TPUs also come in two performance flavours, &lt;em&gt;efficiency&lt;/em&gt; and &lt;em&gt;performance&lt;/em&gt;, to meet different market needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: As of the 8th generation of TPUs announced during Google Next 2026, &lt;em&gt;efficiency&lt;/em&gt; and &lt;em&gt;performance&lt;/em&gt; TPUs will be renamed &lt;em&gt;&lt;strong&gt;inference&lt;/strong&gt;&lt;/em&gt; and &lt;em&gt;&lt;strong&gt;training&lt;/strong&gt;&lt;/em&gt; respectively in favour of a more descriptive, workload-based naming convention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture layout
&lt;/h2&gt;

&lt;p&gt;From an architectural standpoint, GPUs can be thought of as being individual computers with accelerators (picture your home gaming PC). If you want to connect them into a cluster, it would be over network, but no matter how fast the network is, it still has to cross node boundaries, and bandwidth drops as a result.&lt;/p&gt;

&lt;p&gt;TPUs are designed from the ground up to be interconnected at massive scale, with a physical layout that involves thousands of TPU chips in a torus topology, which gives every chip 6 neighbours (two per axis, one on each side). Recognizing that interconnect bandwidth would be the main bottleneck at this scale, Google designed its own proprietary Inter-Chip Interconnect (ICI) network, which provides uniform, high-bandwidth, low-latency connections between all the chips in a slice regardless of physical location. With a torus topology, there is no concept of crossing a node boundary. When you request TPUs, you do not get the entire TPU cluster or pod; rather, you get only a small subset, or slice. To make this possible, Google developed the Optical Circuit Switch (OCS) to rewire physical connections on the fly (entirely in software), allowing the same hardware to serve different workload shapes without any physical reconfiguration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: Efficiency TPU versions use a 2D torus topology, while Performance TPUs leverage a 3D torus architecture to give you maximum performance with minimum latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Precision and range
&lt;/h2&gt;

&lt;p&gt;A floating-point number consists of three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sign&lt;/strong&gt;: Positive or negative (represented by the first bit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exponent&lt;/strong&gt;: Determines the range of the number&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mantissa&lt;/strong&gt;: Significant digits of a floating-point number, which determines the accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditionally, the standard for high-performance computing was FP32. When AI researchers moved to FP16 to save memory, they lost more than just accuracy: they also lost range. FP32 uses 8 bits for the exponent, while FP16 uses only 5. That 3-bit difference in exponent width amounts to an almost 10³⁴ difference in range (FP32 has a range of 3.4 x 10³⁸, while FP16 only has a range of 6.5 x 10⁴). In deep learning, where gradients can be incredibly tiny, FP16 often suffers from underflow (numbers are rounded to 0 because they are too small for FP16’s range to represent), requiring a technical workaround called “&lt;a href="https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#lossscaling" rel="noopener noreferrer"&gt;loss scaling&lt;/a&gt;” to keep the math stable.&lt;/p&gt;

&lt;p&gt;Google Brain (now part of Google DeepMind) solved this by inventing Brain Floating Point (&lt;em&gt;&lt;strong&gt;bfloat16&lt;/strong&gt;&lt;/em&gt;), which simply shifts 3 bits from the mantissa to the exponent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Total Bits&lt;/th&gt;
&lt;th&gt;Exponent Bits&lt;/th&gt;
&lt;th&gt;Mantissa Bits&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP32&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;bfloat16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By sacrificing precision for range, bfloat16 offers the same massive range as FP32, but with the reduced memory and bandwidth of FP16. A huge reason this works is that deep learning models are surprisingly noise-tolerant, and training stability is far more important than a few extra decimal places of precision. Today, bfloat16 is the de facto standard for training modern LLMs on NVIDIA’s GPUs and Google’s TPUs.&lt;/p&gt;
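
&lt;p&gt;You can see the range difference directly. Here is a minimal sketch using PyTorch dtype casts (the value 1e-8 is just a stand-in for a tiny gradient):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

tiny = torch.tensor(1e-8)       # a gradient-sized value

print(tiny.to(torch.float16))   # tensor(0.): underflows below FP16's smallest subnormal (~6e-8)
print(tiny.to(torch.bfloat16))  # ~1e-08: bfloat16's FP32-sized exponent preserves it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;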

&lt;h2&gt;
  
  
  Why XLA matters
&lt;/h2&gt;

&lt;p&gt;Standard Python execution typically takes an &lt;em&gt;eager&lt;/em&gt; approach. This means it executes each step as it is being encountered. This is great for debugging because you can insert print statements to inspect variables at any point.&lt;/p&gt;

&lt;p&gt;XLA (Accelerated Linear Algebra), on the other hand, is a domain-specific JIT compiler. Instead of executing steps one by one, it analyzes the entire execution graph to optimize and fuse operations before they run. This &lt;em&gt;lazy&lt;/em&gt; approach creates an initial warm-up delay, but once the training starts, it is significantly faster than standard methods. The tradeoff is transparency: your step-by-step Python code becomes an optimized “black box”, making traditional debugging strategies more difficult. This is why TPUs are powerhouses for massive enterprise training, while GPUs remain the flexible choice for quick experimentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: Though XLA was built for TPUs, it’s also made its way into the NVIDIA GPU ecosystem via tools such as JAX and torch.compile (since PyTorch 2.0).&lt;/p&gt;
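
&lt;p&gt;To see the eager-vs-lazy tradeoff in miniature, here is a small JAX sketch: the first call pays the compilation warm-up, and later calls reuse the fused, optimized program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import jax
import jax.numpy as jnp

def step(x):
    # Several element-wise ops that XLA can fuse into a single kernel
    return jnp.tanh(x) * 2.0 + 1.0

fast_step = jax.jit(step)             # lazy: traced and compiled on first call
x = jnp.ones((1024, 1024))

fast_step(x).block_until_ready()      # first call: compile + run (warm-up delay)
fast_step(x).block_until_ready()      # subsequent calls run the cached program
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;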

&lt;h3&gt;
  
  
  TorchTPU
&lt;/h3&gt;

&lt;p&gt;Google is engineering a &lt;a href="https://developers.googleblog.com/torchtpu-running-pytorch-natively-on-tpus-at-google-scale/" rel="noopener noreferrer"&gt;TorchTPU&lt;/a&gt; stack that will provide native PyTorch support. This would allow you to run models on TPUs as they are, with full support for native PyTorch features. TorchTPU is currently in preview, and once it becomes GA, you can be sure I’ll be diving deeper into it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Code example
&lt;/h2&gt;

&lt;p&gt;I’m including a couple of Jupyter notebooks that I ran via &lt;a href="https://medium.com/google-cloud/leveraging-tpus-in-colab-featuring-antigravity-c312ad12c1b6" rel="noopener noreferrer"&gt;Antigravity + Colab plugin&lt;/a&gt; for you to try yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://storage.googleapis.com/public-file-server/aiml/mnist_w_gpu_cuda.ipynb" rel="noopener noreferrer"&gt;Fashion MNIST with GPU and CUDA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://storage.googleapis.com/public-file-server/aiml/mnist_w_tpu_xla.ipynb" rel="noopener noreferrer"&gt;Fashion MNIST with TPU and XLA&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you will see from the results below, the TPU is indeed faster. However, my example isn’t large enough or complex enough to really showcase the true speed that TPUs can bring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: I have a Colab Pro account, which affords me access to additional GPUs and TPUs. The Colab free tier only includes the NVIDIA T4 and TPU v5e-1.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpreting training results
&lt;/h3&gt;

&lt;p&gt;These are some benchmark training runs (epochs: 50, batch size: 512) in which I used an NVIDIA T4 GPU with (default) FP32 vs a Google TPU v5e-1 (single-chip TPU) with bfloat16. As expected, the TPU was faster but with lower precision:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76ssyverv2op61zythqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76ssyverv2op61zythqn.png" alt="T4 GPU (FP32), epochs: 50, batch size: 512" width="800" height="996"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzamdd219pa47g3sjwhm4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzamdd219pa47g3sjwhm4.png" alt="TPU v5e-1 (bfloat16), epochs: 50, batch size: 512" width="800" height="994"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then trained the same model on the T4 GPU using bfloat16, but noticed a massive performance drop. This was because the T4 is an older-generation GPU that does not support bfloat16 natively and had to emulate it, which added a lot of overhead. Switching to a newer L4 GPU, I was able to see the (tiny) performance gain along with the reduced precision:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3c8d45visujw7eupu75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3c8d45visujw7eupu75.png" alt="T4 GPU (bfloat16), epochs: 50, batch size: 512" width="800" height="1002"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdny1eu3il9ogeyral3v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdny1eu3il9ogeyral3v6.png" alt="L4 GPU (bfloat16), epochs: 50, batch size: 512" width="640" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, I thought I’d see how the training would perform on a newer TPU v6e-1 and I was blown away by the improvement:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31c4dykl3knydhg73mmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31c4dykl3knydhg73mmi.png" alt="TPU v6e-1 (bfloat16), epochs: 50, batch size: 512" width="640" height="787"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Comparing GPUs and TPUs isn’t exactly apples-to-apples. They represent fundamentally different philosophies in architecture, memory management, and execution.&lt;/p&gt;

&lt;p&gt;In the modern enterprise, it isn’t usually a matter of choosing one over the other, but rather using each where it shines. For rapid iteration and smaller workloads, the flexibility of GPUs is unmatched. However, once a project hits a certain scale, the domain-specific architecture of the TPU becomes the clear winner in efficiency and throughput.&lt;/p&gt;

&lt;p&gt;TPUs are as fast as they are because they are specialized one-trick ponies, but truly harnessing that power requires a deeper understanding of the stack. The biggest challenge often isn’t the compute itself, but rather: “How do I feed data to the TPUs quickly and efficiently enough that the input pipeline doesn’t become the bottleneck and leave the hardware sitting idle?”&lt;/p&gt;

&lt;p&gt;In future posts, I'll dive deeper into these advanced concepts to show how you can optimize data pipelines to get the most out of your TPUs.&lt;/p&gt;

&lt;h3&gt;
  
  
  BONUS: Google’s 8th-generation TPUs announced at Google Next
&lt;/h3&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/3Qw_CZkiQQg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>tpu</category>
      <category>gpu</category>
      <category>machinelearning</category>
      <category>tpusprint</category>
    </item>
    <item>
      <title>Fine-Tune Any HuggingFace Model like Gemma on TPUs with TorchAX</title>
      <dc:creator>Ahmed Elnaggar</dc:creator>
      <pubDate>Mon, 27 Apr 2026 08:45:25 +0000</pubDate>
      <link>https://forem.com/gde/fine-tune-any-huggingface-model-like-gemma-on-tpus-with-torchax-5g21</link>
      <guid>https://forem.com/gde/fine-tune-any-huggingface-model-like-gemma-on-tpus-with-torchax-5g21</guid>
      <description>&lt;h2&gt;
  
  
  What if you could fine-tune any HuggingFace model on TPUs — using PyTorch code?
&lt;/h2&gt;

&lt;p&gt;Here is what the end result looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchax&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchax.train&lt;/span&gt;

&lt;span class="c1"&gt;# One function: forward → loss → gradients → optimizer update
&lt;/span&gt;&lt;span class="n"&gt;step_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;make_train_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Training loop
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataloader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;step_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your PyTorch model. JAX's training primitives. Running on TPU. No rewrite needed.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/gde/run-any-huggingface-model-on-tpus-a-beginners-guide-to-torchax-4ln0"&gt;first part of this series&lt;/a&gt;, we ran HuggingFace models on JAX for fast inference. Now we take the next step: &lt;strong&gt;training&lt;/strong&gt;. We will instruction-tune Gemma 3 1B on the Databricks Dolly 15k dataset using LoRA and torchax's functional training API — all on a free Colab TPU.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://colab.research.google.com/github/agemagician/torchax-huggingface/blob/main/notebooks/torchax_training_tutorial.ipynb" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcolab.research.google.com%2Fassets%2Fcolab-badge.svg" alt="Open Full Tutorial In Colab" width="117" height="20"&gt;&lt;/a&gt; &lt;a href="https://colab.research.google.com/github/agemagician/torchax-huggingface/blob/main/notebooks/torchax_training_quickstart.ipynb" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcolab.research.google.com%2Fassets%2Fcolab-badge.svg" alt="Open Quick Start In Colab" width="117" height="20"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Train on TPUs?
&lt;/h2&gt;

&lt;p&gt;Google's Tensor Processing Units (TPUs) are purpose-built for matrix operations — the bread and butter of deep learning. Free Colab gives you access to a TPU v2-8 with ~15GB of high-bandwidth memory. That is enough to fine-tune a 1B parameter model with LoRA.&lt;/p&gt;
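
&lt;p&gt;As a quick sanity check before training (a minimal sketch; exact device names vary by runtime), you can confirm that JAX sees the TPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import jax

print(jax.devices())             # e.g. eight TpuDevice entries on a TPU v2-8 runtime
print(jax.local_device_count())  # number of local TPU cores
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;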

&lt;p&gt;But training on TPUs traditionally meant rewriting your model in JAX (Flax, Equinox) or using PyTorch/XLA. &lt;strong&gt;torchax&lt;/strong&gt; offers a third path: keep your PyTorch model, but use JAX's functional training primitives.&lt;/p&gt;

&lt;h3&gt;
  
  
  How torchax Training Differs from Standard PyTorch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Standard PyTorch&lt;/th&gt;
&lt;th&gt;torchax&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;loss.backward()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;jax.value_and_grad(loss_fn)(params, ...)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;optimizer.step()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;optax.apply_updates(params, updates)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model holds its own state&lt;/td&gt;
&lt;td&gt;Params and buffers are separate pytrees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eager execution&lt;/td&gt;
&lt;td&gt;JIT-compiled training steps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key difference: &lt;strong&gt;functional training&lt;/strong&gt;. Instead of calling &lt;code&gt;loss.backward()&lt;/code&gt; and &lt;code&gt;optimizer.step()&lt;/code&gt; on a stateful model, torchax separates the model into immutable weight pytrees and passes them through pure functions. This is what enables JAX's &lt;code&gt;jax.jit&lt;/code&gt; to compile the entire training step into a single optimized program.&lt;/p&gt;
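
&lt;p&gt;To make that right-hand column concrete, here is the functional pattern in miniature (a toy quadratic loss, not the real model):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import jax
import jax.numpy as jnp
import optax

def loss_fn(params, x):
    return jnp.sum((params["w"] * x) ** 2)  # toy quadratic loss

params = {"w": jnp.ones(4)}
optimizer = optax.sgd(learning_rate=0.1)
opt_state = optimizer.init(params)

# One functional training step: all state flows through explicitly, nothing mutates
loss, grads = jax.value_and_grad(loss_fn)(params, jnp.arange(4.0))
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;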




&lt;h2&gt;
  
  
  Prerequisites &amp;amp; Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;Basic familiarity with PyTorch and HuggingFace transformers&lt;/li&gt;
&lt;li&gt;A Google Colab account (free tier works with LoRA)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Zero-setup option:&lt;/strong&gt; Click the Colab badge above. The notebook handles all installation automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# PyTorch CPU (torchax handles the accelerator via JAX)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cpu

&lt;span class="c"&gt;# JAX + all training dependencies in a single pip call&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="s1"&gt;'jax[tpu]'&lt;/span&gt; torchax transformers flax peft datasets optax   &lt;span class="c"&gt;# TPU&lt;/span&gt;
&lt;span class="c"&gt;# pip install -U 'jax[cuda12]' torchax transformers flax peft datasets optax  # GPU&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Colab note:&lt;/strong&gt; The notebook installs packages and automatically restarts the runtime, since Colab pre-loads an older JAX that stays cached in memory until restart.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Key Concepts for Training
&lt;/h2&gt;

&lt;p&gt;Before writing code, let's understand the four concepts that make torchax training work.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Param/Buffer Separation
&lt;/h3&gt;

&lt;p&gt;JAX's &lt;code&gt;jax.value_and_grad&lt;/code&gt; needs to know &lt;em&gt;which&lt;/em&gt; inputs to differentiate. In standard PyTorch, the model owns its weights. In torchax training, we explicitly separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;params&lt;/strong&gt; — trainable parameters (get gradients)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;buffers&lt;/strong&gt; — everything else (frozen weights, running stats, constants)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;named_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;frozen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;named_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;buffers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;named_buffers&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For LoRA, &lt;code&gt;params&lt;/code&gt; contains only the tiny adapter weights (~0.5% of the model). For full fine-tuning, it contains everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. optax Optimizers
&lt;/h3&gt;

&lt;p&gt;Unlike PyTorch optimizers (which carry hidden mutable state), optax optimizers are &lt;strong&gt;pure functions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# PyTorch: hidden state inside optimizer
&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# optax: explicit state, no hidden pockets
&lt;/span&gt;&lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_opt_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;new_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_updates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This functional design means the optimizer state is just another pytree that flows through the training step — perfect for &lt;code&gt;jax.jit&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. make_train_step
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;torchax.train.make_train_step()&lt;/code&gt; is the central API. It composes three pieces into a single JIT-compilable function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;model_fn&lt;/strong&gt; — a pure function: &lt;code&gt;(weights, buffers, batch) → output&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;loss_fn&lt;/strong&gt; — extracts the scalar loss: &lt;code&gt;(output, labels) → loss&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;optimizer&lt;/strong&gt; — an optax optimizer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is &lt;code&gt;step_fn(params, buffers, opt_state, batch, labels) → (loss, new_params, new_opt_state)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Under the hood, this uses &lt;code&gt;jax.value_and_grad&lt;/code&gt; for efficient gradient computation and &lt;code&gt;optax.apply_updates&lt;/code&gt; for weight updates — all compiled into a single XLA program.&lt;/p&gt;
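
&lt;p&gt;Wiring the three pieces together looks roughly like this (a hedged sketch: the &lt;code&gt;model_fn&lt;/code&gt; here uses &lt;code&gt;torch.func.functional_call&lt;/code&gt; to run the model statelessly, the optimizer choice is illustrative, and &lt;code&gt;model&lt;/code&gt;/&lt;code&gt;params&lt;/code&gt; come from the param/buffer separation above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import optax
import torch
import torchax as tx

def model_fn(weights, buffers, batch):
    # Pure function: run the model with the given weight/buffer pytrees
    return torch.func.functional_call(model, {**weights, **buffers}, args=(), kwargs=batch)

def loss_fn(output, labels):
    # HuggingFace models return the loss when labels are present in the batch
    return output.loss

optimizer = optax.adamw(learning_rate=1e-4)  # illustrative hyperparameters
opt_state = optimizer.init(params)

step_fn = tx.train.make_train_step(model_fn, loss_fn, optimizer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;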

&lt;h3&gt;
  
  
  4. Full Fine-Tuning vs LoRA
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Full Fine-Tuning&lt;/th&gt;
&lt;th&gt;LoRA&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trainable params&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All (~2B)&lt;/td&gt;
&lt;td&gt;Tiny adapters (~0.5%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~18-20 GB&lt;/td&gt;
&lt;td&gt;~5-7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slower&lt;/td&gt;
&lt;td&gt;Faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher ceiling&lt;/td&gt;
&lt;td&gt;Nearly as good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free Colab TPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tight / may OOM&lt;/td&gt;
&lt;td&gt;Fits comfortably&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;LoRA&lt;/strong&gt; (Low-Rank Adaptation) freezes the base model and adds small trainable matrices to attention layers. Instead of updating the full weight matrix W, it learns a low-rank decomposition: &lt;code&gt;W + (α/r) × B·A&lt;/code&gt; where A and B are tiny matrices.&lt;/p&gt;

&lt;p&gt;For free Colab, LoRA is the recommended path.&lt;/p&gt;
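
&lt;p&gt;A quick back-of-the-envelope sketch shows why the adapters are so small (the 2048×2048 projection here is hypothetical, not Gemma’s actual shape):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def lora_param_count(d_out, d_in, r):
    # A is (r x d_in), B is (d_out x r)
    return r * (d_in + d_out)

full = 2048 * 2048                      # 4,194,304 weights in the full matrix
lora = lora_param_count(2048, 2048, 8)  # 32,768 trainable adapter weights
print(f"{lora / full:.2%}")             # 0.78% of the layer's parameters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;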




&lt;h2&gt;
  
  
  Step 1: Load and Prepare the Dataset
&lt;/h2&gt;

&lt;p&gt;We use &lt;a href="https://huggingface.co/datasets/databricks/databricks-dolly-15k" rel="noopener noreferrer"&gt;Databricks Dolly 15k&lt;/a&gt; — 15,000 human-written instruction-response pairs across 7 categories (QA, summarization, brainstorming, etc.).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;hf_datasets&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemma-3-1b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;DATASET_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/databricks-dolly-15k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token&lt;/span&gt;

&lt;span class="n"&gt;raw_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hf_datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATASET_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each example has an &lt;code&gt;instruction&lt;/code&gt;, optional &lt;code&gt;context&lt;/code&gt;, &lt;code&gt;response&lt;/code&gt;, and &lt;code&gt;category&lt;/code&gt;. We format these into Gemma's chat template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instruction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;user_content&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_content&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tokenize and create dataloaders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.utils.data&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataCollatorForLanguageModeling&lt;/span&gt;

&lt;span class="c1"&gt;# Subset, split, tokenize
&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2200&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;split&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tokenize_example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_tokenized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenize_example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remove_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;column_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;eval_tokenized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenize_example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remove_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;column_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;collator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataCollatorForLanguageModeling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mlm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;train_dataloader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_tokenized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collate_fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;eval_dataloader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_tokenized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collate_fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Load the Model and Apply LoRA
&lt;/h2&gt;

&lt;p&gt;Here is where the torchax pattern matters: load the model with torchax &lt;strong&gt;disabled&lt;/strong&gt;, then enable it before moving to JAX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchax&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt;

&lt;span class="c1"&gt;# Load model with torchax disabled to avoid intercepting init ops
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disable_temporarily&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Sync pad_token_id so loss computation properly ignores padding
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why disable?&lt;/strong&gt; HuggingFace model initialization uses operations (like in-place tensor filling) that torchax does not support. Disabling torchax during loading keeps everything on CPU, then we move to JAX after.&lt;/p&gt;

&lt;p&gt;Now apply LoRA:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;peft_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;peft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TaskType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inference_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                             &lt;span class="c1"&gt;# Rank of the LoRA matrices
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                   &lt;span class="c1"&gt;# Scaling factor
&lt;/span&gt;    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;# 0.0 for bfloat16 numerical stability
&lt;/span&gt;    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# All attention layers
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;peft_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# Output: trainable params: 5,767,168 || all params: 2,619,206,656 || trainable%: 0.22%
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only 0.22% of parameters are trainable — that is the power of LoRA.&lt;/p&gt;

&lt;p&gt;Finally, enable torchax and move to the JAX device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_accuracy_mode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Float32 accumulation for bfloat16 stability
&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_globally&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jax&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Baseline Evaluation
&lt;/h2&gt;

&lt;p&gt;Before training, we measure the model's performance to compare against later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataloader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_batches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;total_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataloader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;max_batches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="c1"&gt;# Drop attention_mask — Gemma's sliding window attention produces NaN
&lt;/span&gt;            &lt;span class="c1"&gt;# with padded masks on torchax/JAX. Labels already mask padding with -100.
&lt;/span&gt;            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;total_loss&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;total_batches&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;avg_loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total_loss&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_batches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;baseline_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseline_ppl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_dataloader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Baseline loss: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;baseline_loss&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, perplexity: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;baseline_ppl&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also generate sample responses for qualitative comparison. For fast generation, we register &lt;code&gt;StaticCache&lt;/code&gt; as a JAX pytree and use KV-cached decoding — only the new token is processed each step instead of the full sequence (~50x faster):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers.cache_utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StaticCache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;jax.tree_util&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;register_pytree_node&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_flatten_static_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key_cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_cache&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_cache_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_unflatten_static_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aux&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_cache_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aux&lt;/span&gt;
    &lt;span class="n"&gt;kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dev&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;
    &lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StaticCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_cache_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key_cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;children&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sc&lt;/span&gt;

&lt;span class="nf"&gt;register_pytree_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;StaticCache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_flatten_static_cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_unflatten_static_cache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generation function uses prefill (processing the full prompt once to populate the cache), then decodes one token at a time with a &lt;code&gt;tqdm&lt;/code&gt; progress bar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm.auto&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;input_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;seq_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;kv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StaticCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;max_cache_len&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;seq_len&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="c1"&gt;# Prefill: process full prompt, populate cache
&lt;/span&gt;        &lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_position&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;past_key_values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;return_dict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tok&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
        &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Decode: one token at a time using cached keys/values
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_new_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leave&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_position&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;past_key_values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="n"&gt;return_dict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tok&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4: Set Up Functional Training
&lt;/h2&gt;

&lt;p&gt;This is where torchax diverges from standard PyTorch. We split the model into trainable parameters and fixed buffers, create an optax optimizer, and compose everything into a JIT-compiled training step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Separate params and buffers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchax.train&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;named_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;buffers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;named_buffers&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;frozen_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;named_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create the optimizer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;schedule&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warmup_cosine_decay_schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;init_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;peak_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;warmup_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decay_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip_by_global_norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;adamw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;opt_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_jax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note &lt;code&gt;tx.interop.call_jax&lt;/code&gt; — it calls a JAX function (here &lt;code&gt;optimizer.init&lt;/code&gt;) on torchax tensors, bridging the two worlds.&lt;/p&gt;
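
&lt;p&gt;For instance, the same bridge should work for any JAX function, not just &lt;code&gt;optimizer.init&lt;/code&gt;. A minimal sketch (not from the notebook):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import jax.numpy as jnp

# Illustrative only: call_jax hands torchax tensors to a JAX function and
# wraps the JAX result back as a torchax tensor.
t = torch.ones(3).to("jax")
total = tx.interop.call_jax(jnp.sum, t)
print(total)  # sums to 3.0, computed on the TPU
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;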

&lt;h3&gt;
  
  
  Define model_fn and loss_fn
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;model_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Stateless forward pass using functional_call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;functional_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;loss_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract loss from HuggingFace model output.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;torch.func.functional_call&lt;/code&gt; runs the model as a pure function — no hidden state, just inputs and outputs. This is what enables JAX to trace and compile it.&lt;/p&gt;
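
&lt;p&gt;If &lt;code&gt;functional_call&lt;/code&gt; is new to you, here is a tiny self-contained illustration (plain PyTorch, independent of torchax):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

lin = torch.nn.Linear(4, 2)
weights = dict(lin.named_parameters())
x = torch.randn(1, 4)

# Identical forward pass, but the parameters are supplied explicitly
# instead of being read from the module's internal state:
out = torch.func.functional_call(lin, weights, (x,))
assert torch.allclose(out, lin(x))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;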

&lt;h3&gt;
  
  
  Compose into a training step
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;step_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;make_train_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single line creates a function that does: forward pass → loss computation → gradient calculation → optimizer update — all compiled into one XLA program.&lt;/p&gt;
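
&lt;p&gt;Conceptually, the generated step function behaves like the following pseudocode. This is a sketch of the idea, not torchax's actual implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Not the real torchax source -- just the shape of what make_train_step builds.
def conceptual_step_fn(params, buffers, opt_state, batch, labels):
    def compute_loss(trainable):
        output = model_fn(trainable, buffers, batch)  # forward pass
        return loss_fn(output, labels)                # scalar loss

    loss, grads = jax.value_and_grad(compute_loss)(params)  # backward pass
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)            # weight update
    return loss, params, opt_state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;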




&lt;h2&gt;
  
  
  Step 5: The Training Loop
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tqdm.auto&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tqdm&lt;/span&gt;

&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;manual_seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;train_losses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pbar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataloader&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_dataloader&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Drop attention_mask — Gemma's sliding window attention produces NaN with
&lt;/span&gt;        &lt;span class="c1"&gt;# padded masks on torchax/JAX. Labels already mask padding with -100.
&lt;/span&gt;        &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;step_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;train_losses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;pbar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_postfix&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loss&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Training complete! &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_losses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; steps in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What to expect:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; ~30-60 seconds (JAX compiles the entire training step)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steps 2+:&lt;/strong&gt; ~1-3 seconds each (running the compiled program)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; ~20-40 minutes for 2000 samples with LoRA on a free Colab TPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first step is slow because JAX traces through the entire model, loss computation, gradient calculation, and optimizer update — then compiles it all into a single optimized XLA program. Every subsequent step reuses this compiled program.&lt;/p&gt;
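
&lt;p&gt;You can see the compile-once behaviour for yourself by timing two consecutive calls on the same batch. Run this &lt;em&gt;before&lt;/em&gt; the training loop (afterwards, both calls are fast); &lt;code&gt;loss.item()&lt;/code&gt; forces the asynchronous TPU work to finish before the clock stops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

batch = next(iter(train_dataloader))
batch = {k: v.to(device) for k, v in batch.items() if k != "attention_mask"}

for label in ("first call (compile + run)", "second call (run only)"):
    t0 = time.time()
    loss, params, opt_state = step_fn(params, buffers, opt_state, batch, batch["labels"])
    loss.item()  # block until the TPU finishes
    print(f"{label}: {time.time() - t0:.1f}s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;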




&lt;h2&gt;
  
  
  Step 6: Evaluate the Improvement
&lt;/h2&gt;

&lt;p&gt;After training, we compare against our baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load trained params back into model
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;setattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;final_loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_ppl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_dataloader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Before&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;After&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;baseline_loss&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;10.4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_loss&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;10.4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Perplexity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;baseline_ppl&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;10.2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_ppl&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;10.2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see loss decrease and perplexity improve after training. The qualitative comparison (generated responses before vs. after) is even more telling — the fine-tuned model produces more focused, instruction-following responses.&lt;/p&gt;
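
&lt;p&gt;For a quick qualitative spot-check, feed the fine-tuned model any held-out instruction (the prompt here is just an illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;prompt = "Give me three tips for writing readable Python code."
print(generate_response(model, tokenizer, prompt, device))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;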




&lt;h2&gt;
  
  
  Step 7: Save and Reload
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Save
&lt;/h3&gt;

&lt;p&gt;Convert JAX arrays back to CPU tensors and save using HuggingFace's standard format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;save_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./fine_tuned_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;cpu_state_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;contiguous&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# safe_serialization=False avoids a safetensors/torchax C-extension conflict on reload
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state_dict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cpu_state_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;safe_serialization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For LoRA, this saves only the tiny adapter weights (~20MB). For full fine-tuning, it saves the entire model (~4GB).&lt;/p&gt;
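
&lt;p&gt;A quick way to confirm which of the two you saved is to list the output directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

for name in sorted(os.listdir(save_dir)):
    size_mb = os.path.getsize(os.path.join(save_dir, name)) / 1e6
    print(f"{name}: {size_mb:.1f} MB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;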

&lt;h3&gt;
  
  
  Reload
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disable_temporarily&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# For LoRA: load base model + adapters separately
&lt;/span&gt;    &lt;span class="n"&gt;reloaded_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# torch_device="cpu" forces PEFT to load adapter weights on CPU,
&lt;/span&gt;    &lt;span class="c1"&gt;# avoiding a safetensors/torchax C-extension conflict.
&lt;/span&gt;    &lt;span class="n"&gt;reloaded_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PeftModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reloaded_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reloaded_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;reloaded_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern is the same as loading: disable torchax, load on CPU, then move to JAX. For LoRA models, you load the base model first, then attach the saved adapters with &lt;code&gt;PeftModel.from_pretrained()&lt;/code&gt;. The &lt;code&gt;torch_device="cpu"&lt;/code&gt; argument ensures PEFT loads weights through PyTorch's standard path rather than safetensors' C extension, which conflicts with torchax.&lt;/p&gt;
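
&lt;p&gt;As an optional sanity check, run the reloaded model through the same evaluation; the numbers should match the fine-tuned metrics from Step 6:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;reloaded_loss, reloaded_ppl = evaluate_loss(reloaded_model, eval_dataloader, device)
print(f"Reloaded loss: {reloaded_loss:.4f}, perplexity: {reloaded_ppl:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;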




&lt;h2&gt;
  
  
  Full Fine-Tuning: When LoRA Is Not Enough
&lt;/h2&gt;

&lt;p&gt;The notebook supports full fine-tuning by changing one setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TRAINING_MODE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;full&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This trains all parameters instead of just the LoRA adapters. The trade-off is much higher memory usage. To make it fit on a free Colab TPU:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Adafactor optimizer&lt;/strong&gt; — uses ~50% less memory than AdamW (stores only row/column statistics instead of per-parameter moments; see the sketch at the end of this section)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced sequence length&lt;/strong&gt; — &lt;code&gt;MAX_SEQ_LEN = 256&lt;/code&gt; halves activation memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller batch size&lt;/strong&gt; — &lt;code&gt;BATCH_SIZE = 1&lt;/code&gt; with higher gradient accumulation steps
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;USE_ADAFACTOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;USE_GRADIENT_CHECKPOINTING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;TRAINING_MODE&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;full&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;USE_ADAFACTOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip_by_global_norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;adafactor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip_by_global_norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;adamw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full fine-tuning gives a higher quality ceiling, but LoRA gets you 90%+ of the way there at a fraction of the compute.&lt;/p&gt;
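
&lt;p&gt;To see where Adafactor's savings come from, compare optimizer-state sizes on a single large weight matrix. This is a standalone sketch using plain JAX arrays, separate from the training pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import jax
import jax.numpy as jnp
import optax

w = {"w": jnp.zeros((2048, 2048))}  # one 4M-element weight matrix

def state_megabytes(state):
    leaves = jax.tree_util.tree_leaves(state)
    return sum(x.size * x.dtype.itemsize for x in leaves) / 1e6

# AdamW keeps two full-size moment tensors per parameter; Adafactor (with
# default settings) keeps factored row/column statistics instead.
print(state_megabytes(optax.adamw(1e-4).init(w)))      # roughly 2x the weights
print(state_megabytes(optax.adafactor(1e-4).init(w)))  # a tiny fraction of that
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;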




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OutOfMemoryError&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model + optimizer too large&lt;/td&gt;
&lt;td&gt;Switch to LoRA, reduce &lt;code&gt;BATCH_SIZE&lt;/code&gt; or &lt;code&gt;MAX_SEQ_LEN&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TypeError: not a valid JAX type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom HuggingFace type not registered&lt;/td&gt;
&lt;td&gt;Register with &lt;code&gt;jax.tree_util.register_pytree_node()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Loss is NaN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Numerical instability in bfloat16&lt;/td&gt;
&lt;td&gt;1. Call &lt;code&gt;tx.enable_accuracy_mode()&lt;/code&gt; before &lt;code&gt;tx.enable_globally()&lt;/code&gt;. 2. Reduce LR (try 1e-4). 3. Set &lt;code&gt;lora_dropout=0.0&lt;/code&gt;. 4. Add &lt;code&gt;optax.clip_by_global_norm(1.0)&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Slow first step&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Normal — JAX JIT compilation&lt;/td&gt;
&lt;td&gt;Wait ~30-60s; subsequent steps are fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;make_train_step error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API mismatch&lt;/td&gt;
&lt;td&gt;Update: &lt;code&gt;pip install -U torchax&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
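
&lt;p&gt;The NaN fix above is order-sensitive: accuracy mode must be switched on before torchax is enabled globally, i.e. at the very top of the notebook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torchax as tx

tx.enable_accuracy_mode()  # must come before enable_globally()
tx.enable_globally()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;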




&lt;h2&gt;
  
  
  The Big Picture: Inference + Training
&lt;/h2&gt;

&lt;p&gt;With the &lt;a href="https://dev.to/gde/run-any-huggingface-model-on-tpus-a-beginners-guide-to-torchax-4ln0"&gt;inference tutorial&lt;/a&gt; and this training tutorial, you now have the complete torchax story:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; any HuggingFace model on TPU (&lt;code&gt;model.to("jax")&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark&lt;/strong&gt; with JIT compilation (10-100x speedup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tune&lt;/strong&gt; with LoRA or full training (&lt;code&gt;make_train_step&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Save&lt;/strong&gt; and reload for production inference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All using PyTorch code. No JAX rewrite needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Notebooks:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://colab.research.google.com/github/agemagician/torchax-huggingface/blob/main/notebooks/torchax_training_tutorial.ipynb" rel="noopener noreferrer"&gt;Full training tutorial&lt;/a&gt; — all the code from this post, ready to run&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://colab.research.google.com/github/agemagician/torchax-huggingface/blob/main/notebooks/torchax_training_quickstart.ipynb" rel="noopener noreferrer"&gt;Training quickstart&lt;/a&gt; — same pipeline in ~10 cells&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://colab.research.google.com/github/agemagician/torchax-huggingface/blob/main/notebooks/torchax_huggingface_tutorial.ipynb" rel="noopener noreferrer"&gt;Inference tutorial&lt;/a&gt; — Part 1 of this series&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Libraries:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/google/torchax" rel="noopener noreferrer"&gt;torchax GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/peft" rel="noopener noreferrer"&gt;PEFT/LoRA documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://optax.readthedocs.io/" rel="noopener noreferrer"&gt;optax documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;References:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/google/torchax/blob/main/examples/peft_lora_training.py" rel="noopener noreferrer"&gt;torchax PEFT LoRA example&lt;/a&gt; — the official example this tutorial builds on&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/blog/qihqi/huggingface-jax-01" rel="noopener noreferrer"&gt;Han Qi's tutorial series&lt;/a&gt; — the original 3-part series on torchax + HuggingFace&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/qihqi" rel="noopener noreferrer"&gt;Han Qi (@qihqi)&lt;/a&gt;&lt;/strong&gt; — author of torchax, PEFT training example, and the original tutorial series&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/google/torchax" rel="noopener noreferrer"&gt;torchax team at Google&lt;/a&gt;&lt;/strong&gt; — library development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;&lt;/strong&gt; — transformers, PEFT, and datasets ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.databricks.com/" rel="noopener noreferrer"&gt;Databricks&lt;/a&gt;&lt;/strong&gt; — Dolly 15k dataset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/jax-ml/jax" rel="noopener noreferrer"&gt;JAX team at Google&lt;/a&gt;&lt;/strong&gt; — JAX, XLA, and TPU support&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>pytorch</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Next-Generation Google Workspace Automation</title>
      <dc:creator>Tanaike</dc:creator>
      <pubDate>Mon, 27 Apr 2026 01:52:43 +0000</pubDate>
      <link>https://forem.com/gde/next-generation-google-workspace-automation-1h22</link>
      <guid>https://forem.com/gde/next-generation-google-workspace-automation-1h22</guid>
      <description>&lt;p&gt;&lt;strong&gt;A Comparative Study of Agentic Frameworks and Multi-Agent Orchestration&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;The transition from passive chatbots to autonomous execution environments was cemented at Google Cloud Next '26 with the introduction of the Gemini Enterprise Agent Platform. This paper evaluates four cutting-edge AI agent methodologies for Google Workspace automation, developed by leading developers Martin Hawksey, Bruce Mcpherson, and Kanshi Tanaike. We deconstruct their structural approaches—CLI skill chaining, advanced emulation sandboxing, dynamic code generation, and A2A remote delegation—demonstrating how these community-driven innovations anticipated native Next '26 features like the official Agent Skills repository and Model Context Protocol (MCP) support. Building upon these foundations, we propose two novel frameworks: the Federated Context-Aware Routing Architecture (Federated CARA) for zero-trust, multi-cloud task routing, and the Self-Optimizing Tool Caching Network (SOTCN) to eliminate Tool Space Interference using dynamic semantic caching. This comparative synthesis maps existing and proposed models against Google's new enterprise standards, offering a scalable roadmap for secure, highly dynamic multi-agent orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Historically, automating tasks within Google Workspace relied heavily on static, hardcoded macros defined via Google Apps Script (GAS) and rigidly scheduled cloud triggers. While highly effective for predictable workflows, this paradigm lacked the adaptability required for complex, context-dependent enterprise operations. The advent of Large Language Models (LLMs) catalyzed the transition toward the "Agentic Enterprise," wherein AI entities act as autonomous orchestration layers capable of interacting dynamically with vast API ecosystems. At Google Cloud Next '26, this shift was cemented through the introduction of &lt;em&gt;Workspace Intelligence&lt;/em&gt;, a semantic unifying layer that allows agents to autonomously execute multi-step tasks across Gmail, Docs, Sheets, and Drive without manual context provisioning.&lt;/p&gt;

&lt;p&gt;However, bridging LLM reasoning engines with Google Workspace APIs introduces profound architectural challenges, notably regarding execution latency, security perimeters, and Tool Space Interference (TSI). TSI—officially recognized by Google as "context bloat"—is a phenomenon where an agent's reasoning accuracy degrades, and token costs skyrocket, when its context window is overloaded with a massive library of predefined functions.&lt;/p&gt;

&lt;p&gt;This paper analyzes four prominent, recently published developer methodologies that sought to solve these challenges before they were addressed natively. By evaluating these frameworks against Next '26 announcements like the &lt;em&gt;GKE Agent Sandbox&lt;/em&gt;, native &lt;em&gt;Agent Registry&lt;/em&gt;, and the &lt;em&gt;Agent Gateway&lt;/em&gt;, we identify key strengths and limitations. Subsequently, we propose advanced hybrid frameworks—Federated CARA and SOTCN—tailored to leverage Google's new native security and orchestration layers for the next generation of highly secure, scalable enterprise deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Architectural Analysis of Existing Methodologies
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9j75q1kzurknjt6jjzf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9j75q1kzurknjt6jjzf.jpg" alt="Architectural Analysis of Existing 4 Methodologies" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The automation of Google Workspace using AI agents is an emerging field characterized by highly divergent architectural philosophies. The following sections provide an exhaustive deconstruction of four primary paradigms.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Skill-Based Prototyping Using CLI Integrations (Martin Hawksey)
&lt;/h3&gt;

&lt;p&gt;Ref: &lt;a href="https://www.linkedin.com/pulse/exploring-workspace-intelligence-skills-using-cli-gemini-hawksey-kqihe/" rel="noopener noreferrer"&gt;Exploring Workspace Intelligence Skills using the Workspace CLI and Gemini CLIs with Apps Script API Executables&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ref: &lt;a href="https://github.com/mhawksey/gws-web-to-doc/" rel="noopener noreferrer"&gt;gws-web-to-doc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin Hawksey’s project (&lt;code&gt;gws-web-to-doc&lt;/code&gt;) demonstrates how discrete, legacy command-line tools can be wrapped into modular AI skills. By combining local Node.js execution with the Google Workspace CLI (&lt;code&gt;gws&lt;/code&gt;) and remote GAS deployment, Hawksey creates a linear execution pipeline. For instance, in his "Web-to-Doc" workflow, local scripts extract web content, the CLI natively converts the Markdown to a Google Document, and an API Executable GAS script (&lt;code&gt;resizer.gs&lt;/code&gt;) handles precise image formatting.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Strength:&lt;/strong&gt; Excellently bridges legacy scripting and modern LLM orchestration, making isolated tasks readily accessible to AI.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.2 Local Natural Language Execution via Emulation (Bruce Mcpherson)
&lt;/h3&gt;

&lt;p&gt;Ref: &lt;a href="https://ramblings.mcpher.com/gas-fakes-agent/" rel="noopener noreferrer"&gt;gas-fakes agent: local natural language requests against workspace resources&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ref: &lt;a href="https://github.com/brucemcpherson/gas-fakes/tree/main/gf_agent" rel="noopener noreferrer"&gt;gf_agent - Google Apps Script Local Automation Agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bruce Mcpherson addresses iteration latency and cloud security through his &lt;code&gt;gas-fakes-agent&lt;/code&gt; framework. By utilizing the &lt;code&gt;gas-fakes&lt;/code&gt; emulation layer, Mcpherson enables an LLM to dynamically generate GAS syntax and execute it within a Node.js environment that can be run locally or containerized in any cloud platform. Because it uses Google APIs under the hood, standard network and API latencies still apply, and the files manipulated remain cloud-based just like native GAS. The framework is highly versatile—it can be deployed across multiple cloud platforms (aligning with the modern emphasis on sovereign clouds) and can operate and mix multiple backends, such as Microsoft OneDrive and Office files. Crucially, the &lt;code&gt;gf_agent&lt;/code&gt; approach is made possible by continuous learning from over 10,000 tests that &lt;code&gt;gas-fakes&lt;/code&gt; uses to ensure its consistency with live Apps Script. This foundation allows the agent to self-augment through continuous cyclical feedback as new methods are introduced.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Strength:&lt;/strong&gt; Advanced execution sandboxing and robust permission management. Its security paradigm emphasizes secure sandboxing rather than authentication per se. While it utilizes the same authentication mechanisms available to any OAuth2 protected app—such as Application Default Credentials (ADC), Domain Wide Delegation (DWD), and keyless workload identity federation—it uniquely takes care of the permission complications associated with them. Furthermore, its native compatibility with the Model Context Protocol (MCP) sets a standardized foundation for AI-to-tool interfacing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.3 Dynamic Tool Creation to Combat Tool Space Interference (Kanshi Tanaike)
&lt;/h3&gt;

&lt;p&gt;Ref: &lt;a href="https://medium.com/google-cloud/empowering-autonomous-ai-agents-through-dynamic-tool-creation-550683f255a4" rel="noopener noreferrer"&gt;Empowering Autonomous AI Agents through Dynamic Tool Creation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ref: &lt;a href="https://github.com/tanaikech/autonomous-google-workspace-agent" rel="noopener noreferrer"&gt;autonomous-google-workspace-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To solve the cognitive bottleneck of Tool Space Interference (TSI) in enterprise LLMs, Kanshi Tanaike’s &lt;code&gt;autonomous-google-workspace-agent&lt;/code&gt; introduces a fully autonomous, self-healing multi-agent architecture. When faced with an edge case lacking a predefined solution, a Senior Orchestrator coordinates five sub-agents (Environment Checker, Script Writer, Script Executor, Script Uploader, and Summary Agent) to write, sandbox-test (using &lt;code&gt;gas-fakes&lt;/code&gt;), deploy (via &lt;code&gt;clasp&lt;/code&gt;), and execute custom tools in real-time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Strength:&lt;/strong&gt; Fundamentally eliminates TSI by relying on dynamic tool generation rather than a saturated context window. It serves as a secure "kill switch" against reasoning drift by isolating untrusted code generation within a sandbox before cloud deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.4 Remote Subagent Integration via the A2A Protocol (Kanshi Tanaike)
&lt;/h3&gt;

&lt;p&gt;Ref: &lt;a href="https://medium.com/google-cloud/integrating-remote-subagents-built-by-google-apps-script-with-gemini-cli-0ee6b54a658d" rel="noopener noreferrer"&gt;Integrating Remote Subagents Built by Google Apps Script with Gemini CLI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ref: &lt;a href="https://github.com/tanaikech/gemini-cli-gas-a2a-subagents" rel="noopener noreferrer"&gt;gemini-cli-gas-a2a-subagents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a subsequent approach (&lt;code&gt;gemini-cli-gas-a2a-subagents&lt;/code&gt;), Tanaike solves TSI through remote delegation rather than dynamic generation. A primary LLM offloads a massive repository of over 160 established Workspace skills to a specialized Google Apps Script Web App using the Agent-to-Agent (A2A) protocol. Tanaike elegantly bypasses strict Google Cloud cross-domain (CORS) authentication hurdles by predefining "agent cards" locally within the Gemini CLI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Strength:&lt;/strong&gt; Preserves the primary agent's reasoning stability without the compute overhead of writing new code, allowing frictionless access to massive legacy macro libraries.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Proposed Novel Methodologies for Advanced Workspace Automation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhyw4r5r4mxhksvww64u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhyw4r5r4mxhksvww64u.jpg" alt="Proposed Novel 2 Methodologies for Advanced Workspace Automation" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the strengths and limitations of the analyzed frameworks, and aligning with the latest native paradigms introduced at Google Cloud Next '26, we propose two novel architectural approaches intended for advanced enterprise use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Federated Context-Aware Routing Architecture (Federated CARA)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; A zero-trust hybrid orchestration network that dynamically routes AI tasks based on a real-time assessment of data sensitivity, computational demand, and cross-platform interoperability requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt; A central LLM routing agent acts as a security triage engine. If a natural language prompt involves sensitive organizational data or requires orchestration across mixed backends (e.g., interacting simultaneously with Google Workspace and Microsoft OneDrive), the orchestrator routes the execution to an &lt;strong&gt;advanced sandboxed emulation layer&lt;/strong&gt;. Synthesizing Mcpherson's &lt;code&gt;gas-fakes&lt;/code&gt; approach, this layer utilizes keyless workload identity federation and Application Default Credentials (ADC) to securely manage complex permissions. It natively supports multi-cloud deployments—including sovereign clouds—ensuring that while external APIs are accessed under the hood, the execution environment itself remains strictly controlled and continuously validated through self-augmenting feedback loops. Conversely, if the task requires heavy bulk processing over established legacy macros, the execution is delegated to a &lt;strong&gt;remote A2A subagent&lt;/strong&gt; (synthesizing Tanaike's Web App protocol). A toy routing sketch follows this list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enterprise Value:&lt;/strong&gt; Maximizes cross-platform versatility and infrastructure flexibility while enforcing strict compliance and data governance perimeters. By focusing on advanced execution sandboxing rather than relying solely on traditional authentication barriers, it securely bridges distinct enterprise ecosystems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
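
&lt;p&gt;As a purely illustrative reduction of the triage step (the markers, labels, and backend names below are invented for the example, not part of any published framework):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy Federated CARA-style triage: route to sandboxed emulation when data
# is sensitive or backends are mixed; otherwise delegate to a remote A2A
# subagent for bulk legacy macro work. All rules here are illustrative.
SENSITIVE_MARKERS = {"salary", "medical", "confidential"}

def route_task(prompt, backends):
    sensitive = any(m in prompt.lower() for m in SENSITIVE_MARKERS)
    mixed_backends = len(backends) &gt; 1  # e.g. Workspace plus OneDrive
    if sensitive or mixed_backends:
        return "sandboxed-emulation-layer"
    return "remote-a2a-subagent"

print(route_task("Summarize confidential salary data", {"workspace"}))
# -&gt; sandboxed-emulation-layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;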

&lt;h3&gt;
  
  
  3.2 Self-Optimizing Tool Caching Network (SOTCN)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; An advanced solution to Tool Space Interference (TSI) that bridges the gap between dynamic code creation, static remote delegation, and newly standardized enterprise skill registries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt; Instead of writing code from scratch (which incurs API latency) or loading 160+ tools simultaneously (which inevitably degrades LLM reasoning), the SOTCN utilizes a massive "cold storage" repository of pre-validated scripts. Crucially, this architecture is designed to integrate seamlessly with Google's newly announced official &lt;strong&gt;Agent Skills repository&lt;/strong&gt;. By treating both officially published Agent Skills and custom-built Workspace scripts as modular assets, SOTCN acts as a dynamic curator. When a user issues a command, an ultra-fast semantic vector search identifies the top 3–5 most relevant skills. A local "Injection Agent" then dynamically injects &lt;em&gt;only&lt;/em&gt; those specific functions into the primary LLM's active Model Context Protocol (MCP) context window for the duration of that specific session. A toy selection sketch follows this list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enterprise Value:&lt;/strong&gt; Eliminates TSI entirely by maintaining a pristine, minimal active context window. By leveraging both community-driven repositories and official Google Agent Skills, SOTCN avoids the latency and debugging failure rates of real-time code generation while effectively future-proofing enterprise deployments against evolving native ecosystem standards.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
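
&lt;p&gt;To make the selection mechanism concrete, here is a toy sketch. Everything in it is illustrative: the skill registry is invented, and simple word overlap stands in for a real embedding-based vector search.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy SOTCN-style curation: score "cold storage" skills against the request
# and surface only the top-k for injection into the active context window.
SKILLS = {
    "create_doc": "create a new google document from markdown",
    "resize_images": "resize and format images inside a document",
    "send_mail": "send an email via gmail",
    "list_files": "list files in a drive folder",
}

def top_k_skills(request, k=3):
    req = set(request.lower().split())
    scores = {name: len(req &amp; set(desc.split())) for name, desc in SKILLS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] &gt; 0][:k]

# Only these skill definitions would be injected for this session.
print(top_k_skills("resize the images in my google document"))
# -&gt; ['resize_images', 'create_doc', 'list_files']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;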

&lt;h2&gt;
  
  
  4. Comparative Analysis
&lt;/h2&gt;

&lt;p&gt;The table below synthesizes the structural and functional nuances of the four original methodologies alongside our two newly proposed frameworks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Methodology&lt;/th&gt;
&lt;th&gt;Origin&lt;/th&gt;
&lt;th&gt;Core Mechanism&lt;/th&gt;
&lt;th&gt;Primary Strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skill-based CLI prototyping&lt;/td&gt;
&lt;td&gt;Hawksey&lt;/td&gt;
&lt;td&gt;Wraps CLI tools and API Executable GAS scripts into modular skills&lt;/td&gt;
&lt;td&gt;Bridges legacy scripting and LLM orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local emulation (&lt;code&gt;gas-fakes&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Mcpherson&lt;/td&gt;
&lt;td&gt;LLM-generated GAS executed in a local or containerized Node.js emulation layer&lt;/td&gt;
&lt;td&gt;Advanced sandboxing, permission management, and MCP compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic tool creation&lt;/td&gt;
&lt;td&gt;Tanaike&lt;/td&gt;
&lt;td&gt;Five sub-agents write, sandbox-test, deploy, and execute custom tools in real time&lt;/td&gt;
&lt;td&gt;Eliminates TSI through on-demand tool generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote A2A delegation&lt;/td&gt;
&lt;td&gt;Tanaike&lt;/td&gt;
&lt;td&gt;Offloads 160+ established skills to a GAS Web App via the A2A protocol&lt;/td&gt;
&lt;td&gt;Preserves reasoning stability without code-generation overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Federated CARA&lt;/td&gt;
&lt;td&gt;Proposed&lt;/td&gt;
&lt;td&gt;Zero-trust routing between sandboxed emulation and remote A2A subagents&lt;/td&gt;
&lt;td&gt;Cross-platform versatility under strict compliance perimeters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOTCN&lt;/td&gt;
&lt;td&gt;Proposed&lt;/td&gt;
&lt;td&gt;Semantic search injects only the top 3–5 relevant skills into the MCP context&lt;/td&gt;
&lt;td&gt;Eliminates TSI without generative-code latency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  5. Discussion and Conclusion
&lt;/h2&gt;

&lt;p&gt;The evolution of Google Workspace automation reflects a broader industry shift toward "Agentic" design. Hawksey’s foundational work establishes how legacy scripts can be seamlessly bridged to LLMs, preempting Google's recent Next '26 announcement of the official Agent Skills repository. Mcpherson's integration of the &lt;code&gt;gas-fakes&lt;/code&gt; emulation layer combined with the Model Context Protocol (MCP) redefines enterprise security and interoperability. Rather than simply isolating execution, Mcpherson demonstrates how advanced sandboxing, managed via mechanisms like keyless workload identity federation, enables multi-cloud versatility (supporting sovereign clouds) and cross-platform backend integration (e.g., Microsoft OneDrive). Furthermore, powered by continuous learning from over 10,000 tests, this approach proves that agents can reliably self-augment through cyclical feedback. Tanaike pushes the envelope of scalability; his dual approaches to Tool Space Interference (TSI) demonstrate that AI systems must either possess the autonomy to self-generate capabilities or the communicative protocols (A2A) to delegate them.&lt;/p&gt;

&lt;p&gt;The proposed frameworks, Federated CARA and SOTCN, represent the logical next steps in this evolution, deeply aligning with the latest native Google Cloud capabilities. In particular, the SOTCN methodology provides a powerful extension to Google's newly introduced Agent Skills paradigm by offering a semantic caching mechanism that programmatically eliminates context bloat while standardizing tool invocation. By optimizing for latency, context window preservation, multi-cloud versatility, and strict data compliance, these advanced methodologies outline a robust, future-proof roadmap for deploying enterprise-grade, autonomous AI coworkers that are as reliable as they are dynamic.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evolution of Automation:&lt;/strong&gt; Workspace automation has officially shifted from rigid Google Apps Script macros to the "Agentic Era," powered by the new Gemini Enterprise Agent Platform, which natively supports the building, scaling, and governance of autonomous agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill Chaining (Hawksey):&lt;/strong&gt; Utilizes the Workspace CLI to combine local Node.js execution with remote GAS deployment. This foundational approach preempted Google's newly announced &lt;em&gt;Agent Skills&lt;/em&gt; repository, which formalizes skills as compact, agent-first documentation to mitigate context bloat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Sandboxing (Mcpherson):&lt;/strong&gt; Employs the &lt;code&gt;gas-fakes&lt;/code&gt; emulation layer and MCP to securely execute LLM-generated code locally or containerized in any cloud platform. This focus on secure isolation and standardized tooling is now mirrored at the enterprise level by Google's native &lt;em&gt;MCP&lt;/em&gt; support and the highly secure &lt;em&gt;GKE Agent Sandbox&lt;/em&gt; utilizing gVisor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Generation (Tanaike):&lt;/strong&gt; Utilizes a sophisticated 5-agent orchestrator to write, sandbox-test, and cloud-deploy custom tools in real-time, effectively solving Tool Space Interference (TSI). This mirrors the capabilities of Google's newly announced &lt;em&gt;Long-running agents&lt;/em&gt;, which autonomously execute multi-step workflows inside secure cloud sandboxes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote Delegation (Tanaike):&lt;/strong&gt; Leverages the A2A protocol to offload massive libraries of over 160 predefined skills to remote GAS Web Apps. This community-built solution directly anticipated Google's native &lt;em&gt;A2A (Agent-to-Agent) Orchestration&lt;/em&gt; and &lt;em&gt;Agent Registry&lt;/em&gt;, which now allow enterprise agents to natively discover and delegate tasks to one another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proposed Method 1 (Federated CARA):&lt;/strong&gt; A zero-trust architecture that routes AI tasks to either containerized emulation or remote A2A subagents based on the data's security classification. This framework is perfectly positioned to integrate with the newly announced &lt;em&gt;Agent Gateway&lt;/em&gt; and &lt;em&gt;Agent Identity&lt;/em&gt;, natively enforcing IAM access control policies and guarding against data exfiltration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proposed Method 2 (SOTCN):&lt;/strong&gt; A semantic caching network that stores predefined tools in "cold storage" and dynamically injects only the most relevant functions into the agent's active context window. This architecture enhances Google's new &lt;em&gt;Agent Skills&lt;/em&gt; paradigm by programmatically eliminating context bloat without the latency of generative code writing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  7. Explanatory Video
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/27bZ1aXwfFQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googleworkspace</category>
      <category>googleappsscript</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Multi-Agent A2A with the Agent Development Kit(ADK), AWS Lambda, and Gemini CLI</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Sat, 25 Apr 2026 22:16:28 +0000</pubDate>
      <link>https://forem.com/gde/multi-agent-a2a-with-the-agent-development-kitadk-aws-lambda-and-gemini-cli-25ok</link>
      <guid>https://forem.com/gde/multi-agent-a2a-with-the-agent-development-kitadk-aws-lambda-and-gemini-cli-25ok</guid>
      <description>&lt;p&gt;Leveraging the Google Agent Development Kit (ADK) and the underlying Gemini LLM to build Multi-Agent Applications with A2A protocol support using the Python programming language deployed to AWS Lambda.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgdtt0hyrg9041coa25d.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgdtt0hyrg9041coa25d.jpeg" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Aren’t There a Billion Python ADK Demos?
&lt;/h4&gt;

&lt;p&gt;Yes, there are.&lt;/p&gt;

&lt;p&gt;Python has traditionally been the main coding language for ML and AI tools. The goal of this article is to provide a test bed for building, debugging, and deploying multi-agent applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  Say It Ain’t So
&lt;/h4&gt;

&lt;p&gt;So what is different about this lab compared to all the others out there?&lt;/p&gt;

&lt;p&gt;This is one of the first deep dives into a multi-agent application leveraging the advanced tooling of Gemini CLI. The starting point for the demo was an existing Codelab, which was updated and re-engineered with Gemini CLI.&lt;/p&gt;

&lt;p&gt;The original Codelab is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codelabs.developers.google.com/codelabs/production-ready-ai-roadshow/1-building-a-multi-agent-system/building-a-multi-agent-system#0" rel="noopener noreferrer"&gt;Building a Multi-Agent System | Google Codelabs&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Python Version Management
&lt;/h4&gt;

&lt;p&gt;One downside of Python's wide deployment is managing language versions across platforms and keeping a supported version installed.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;pyenv&lt;/strong&gt; tool enables deploying consistent versions of Python:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pyenv/pyenv" rel="noopener noreferrer"&gt;GitHub - pyenv/pyenv: Simple Python version management&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As of this writing, the mainstream Python version is 3.13. To validate your current Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;python --version
Python 3.13.13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  AWS Lambda
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.serverless.com/aws-lambda" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; is a serverless, event-driven compute service that enables users to run code without provisioning or managing servers. With Lambda, developers can focus solely on their code (functions), while AWS handles all underlying infrastructure management, including capacity provisioning, automatic scaling, and operating system maintenance.&lt;/p&gt;

&lt;p&gt;Full details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/pm/lambda" rel="noopener noreferrer"&gt;Serverless Computing Service - Free AWS Lambda - AWS&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Gemini CLI
&lt;/h4&gt;

&lt;p&gt;If it is not pre-installed, you can install the Gemini CLI to interact with the source files and provide real-time assistance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npm install -g @google/gemini-cli
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Testing the Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google Account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;▝▜▄ Gemini CLI v0.33.1
    ▝▜▄
   ▗▟▀ Logged in with Google /auth
  ▝▀ Gemini Code Assist Standard /upgrade no sandbox (see /docs) /model Auto (Gemini 3) | 239.8 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Node Version Management
&lt;/h4&gt;

&lt;p&gt;Gemini CLI needs a consistent, up-to-date version of Node.js. The &lt;strong&gt;nvm&lt;/strong&gt; tool can be used to get a standard Node environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nvm-sh/nvm" rel="noopener noreferrer"&gt;GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Development Kit
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://www.google.com/search?q=Google+Agent+Development+Kit&amp;amp;rlz=1CAIWTJ_enUS1114&amp;amp;oq=what+is+the+adk+google&amp;amp;gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIICAEQABgWGB4yCAgCEAAYFhgeMggIAxAAGBYYHjIICAQQABgWGB4yCAgFEAAYFhgeMggIBhAAGBYYHjIKCAcQABgKGBYYHjINCAgQABiGAxiABBiKBTIKCAkQABiABBiiBNIBCDMxODlqMGo3qAIAsAIA&amp;amp;sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;mstk=AUtExfB5Oo7ZHHcDEHu7aqZiPBA2l1c-QGh5dB7xkkDPIiYcn8O1Imt2IHNR7bzA6JnyDCSDCUGpGWTeBW14namlN_QqzJLLI5-px1BE9jfSxwli6njPDPERjm5pRqNP3uC6HhUKiRcTJ1T8x5LHQrCkVxylw7QWg0N8B4dQDIcWpnVX9Gc&amp;amp;csui=3&amp;amp;ved=2ahUKEwjYu-G8p-uSAxXrv4kEHUbpLo0QgK4QegQIARAB" rel="noopener noreferrer"&gt;Google Agent Development Kit&lt;/a&gt; (ADK) is an open-source, Python-based framework designed to streamline the creation, deployment, and orchestration of sophisticated, multi-agent AI systems. It treats agent development like software engineering, offering modularity, state management, and built-in tools (like Google Search) to build autonomous agents.&lt;/p&gt;

&lt;p&gt;The ADK can be installed from here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Skills
&lt;/h4&gt;

&lt;p&gt;Gemini CLI can be customized to work with ADK agents. Both an Agent Development MCP server and specific Agent Skills are available.&lt;/p&gt;

&lt;p&gt;More details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://adk.dev/tutorials/coding-with-ai/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To get the Agent Skills in Gemini CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/skills list
&lt;span class="go"&gt;Available Agent Skills:
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and the ADK documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/mcp list
&lt;span class="go"&gt;Configured MCP servers:
🟢 adk-docs-mcp (from adk-docs-ext) - Ready (2 tools)
  Tools:
  - mcp_adk-docs-mcp_fetch_docs
  - mcp_adk-docs-mcp_list_doc_sources
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Where do I start?
&lt;/h4&gt;

&lt;p&gt;The strategy for starting multi-agent development is an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.&lt;/p&gt;

&lt;p&gt;Then, the ADK multi-agent system is built, debugged, and tested locally. Finally, the entire solution is deployed to AWS Lambda.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup the Basic Environment
&lt;/h4&gt;

&lt;p&gt;At this point you should have a working Python environment and a working Gemini CLI installation. All of the relevant code examples and documentation are available on GitHub.&lt;/p&gt;

&lt;p&gt;The next step is to clone the GitHub repository to your local environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;cd ~
git clone https://github.com/xbill9/gemini-cli-aws
cd gemini-cli-aws/multi-lambda
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;strong&gt;init2.sh&lt;/strong&gt; from the cloned directory.&lt;/p&gt;

&lt;p&gt;The script will attempt to determine your shell environment and set the correct variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;source init2.sh
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your session times out or you need to re-authenticate, you can run the &lt;strong&gt;set_env.sh&lt;/strong&gt; script to reset your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;source set_env.sh
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variables like PROJECT_ID need to be set up for use in the various build scripts, so the &lt;strong&gt;set_env&lt;/strong&gt; script can be used to reset the environment if you time out.&lt;/p&gt;

&lt;p&gt;Log in to AWS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;aws login --remote
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally install the packages and dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="err"&gt;make&lt;/span&gt; &lt;span class="err"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Verify The ADK Installation
&lt;/h4&gt;

&lt;p&gt;To verify the setup, run the ADK CLI locally with the researcher agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda/agents$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;adk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;run&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;researcher&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;/home/xbill/.pyenv/versions/&lt;/span&gt;&lt;span class="mf"&gt;3.13&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="err"&gt;/lib/python&lt;/span&gt;&lt;span class="mf"&gt;3.13&lt;/span&gt;&lt;span class="err"&gt;/site-packages/authlib/_joserfc_helpers.py:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AuthlibDeprecationWarning:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;authlib.jose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;deprecated,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;please&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;joserfc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;instead.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;It&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;will&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;compatible&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;before&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;authlib.jose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ECKey&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;/home/xbill/.pyenv/versions/&lt;/span&gt;&lt;span class="mf"&gt;3.13&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="err"&gt;/lib/python&lt;/span&gt;&lt;span class="mf"&gt;3.13&lt;/span&gt;&lt;span class="err"&gt;/site-packages/google/adk/features/_feature_decorator.py:&lt;/span&gt;&lt;span class="mi"&gt;72&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;UserWarning:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;EXPERIMENTAL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;feature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;FeatureName.PLUGGABLE_AUTH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;enabled.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;check_feature_enabled()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Log&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;setup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;complete:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/tmp/agents_log/agent.&lt;/span&gt;&lt;span class="mi"&gt;20260422&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="mi"&gt;134822&lt;/span&gt;&lt;span class="err"&gt;.log&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;To&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;access&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;latest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;log:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tail&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;-F&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/tmp/agents_log/agent.latest.log&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,011"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Logging initialized for researcher"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"logging_config.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;54&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"researcher"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"log_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,013"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"researcher.agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Initialized researcher agent with model: gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,015"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google_adk.google.adk.cli.utils.envs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Loaded .env file for researcher at /home/xbill/gemini-cli-aws/multi-lambda/.env"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"envs.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;83&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,016"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google_adk.google.adk.cli.utils.local_storage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Using per-agent session storage rooted at /home/xbill/gemini-cli-aws/multi-lambda/agents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local_storage.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;84&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,016"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google_adk.google.adk.cli.utils.local_storage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Using file artifact service at /home/xbill/gemini-cli-aws/multi-lambda/agents/researcher/.adk/artifacts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local_storage.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;110&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,017"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google_adk.google.adk.cli.utils.service_factory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Using in-memory memory service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"service_factory.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;266&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"asctime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-22 13:48:23,047"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google_adk.google.adk.cli.utils.local_storage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"levelname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local session service at /home/xbill/gemini-cli-aws/multi-lambda/agents/researcher/.adk/session.db"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"filename"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local_storage.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lineno"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Running&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;researcher,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;exit.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;user&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Test The ADK Web Interface
&lt;/h4&gt;

&lt;p&gt;This tests the ADK agent interactions with a browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda/agents$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk web &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0
&lt;span class="go"&gt;/home/xbill/.local/lib/python3.13/site-packages/google/adk/features/_feature_decorator.py:72: UserWarning: [EXPERIMENTAL] feature FeatureName.PLUGGABLE_AUTH is enabled.
  check_feature_enabled()
2026-04-12 16:43:14,152 - INFO - service_factory.py:266 - Using in-memory memory service
2026-04-12 16:43:14,153 - INFO - local_storage.py:84 - Using per-agent session storage rooted at /home/xbill/gemini-cli-aws/multi-eks/agents
2026-04-12 16:43:14,153 - INFO - local_storage.py:110 - Using file artifact service at /home/xbill/gemini-cli-aws/multi-eks/agents/.adk/artifacts
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/fast_api.py:198: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
INFO: Started server process [32675]
INFO: Waiting for application startup.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use the web interface, either on the local interface &lt;strong&gt;127.0.0.1&lt;/strong&gt; or the catch-all address &lt;strong&gt;0.0.0.0&lt;/strong&gt;, depending on your environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdsixkis3hdhngrjbooa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdsixkis3hdhngrjbooa.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Special note for Google Cloud Shell deployments: add a CORS &lt;strong&gt;allow_origins&lt;/strong&gt; configuration exemption to allow the ADK agent to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;adk web --host 0.0.0.0 --allow_origins 'regex:.*'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Multi Agent Design
&lt;/h4&gt;

&lt;p&gt;The multi-agent deployment consists of 5 agents (a skeletal composition sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Researcher&lt;/li&gt;
&lt;li&gt;Judge&lt;/li&gt;
&lt;li&gt;Orchestrator&lt;/li&gt;
&lt;li&gt;Content Builder&lt;/li&gt;
&lt;li&gt;Course Builder&lt;/li&gt;
&lt;/ul&gt;
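
&lt;p&gt;Conceptually, ADK lets a parent agent delegate to children declared as &lt;code&gt;sub_agents&lt;/code&gt;. The following is a skeletal sketch only: the instructions are placeholders, the Course Builder is omitted, and the deployed lab runs the agents as separate A2A services rather than the single in-process tree shown here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from google.adk.agents import Agent

# Skeletal sketch: placeholder instructions, no tools wired in.
researcher = Agent(name="researcher", model="gemini-2.5-flash",
                   instruction="Gather research on the given topic.")
judge = Agent(name="judge", model="gemini-2.5-flash",
              instruction="Evaluate the research and pass or fail it.")
content_builder = Agent(name="content_builder", model="gemini-2.5-flash",
                        instruction="Turn validated research into a course module.")

orchestrator = Agent(
    name="orchestrator",
    model="gemini-2.5-flash",
    instruction="Coordinate the research-judge loop, then build content.",
    sub_agents=[researcher, judge, content_builder],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;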

&lt;p&gt;For a detailed analysis of the multi-agent architecture, this article provides the background information:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://xbill999.medium.com/multi-agent-a2a-with-the-agent-development-kit-adk-cloud-run-and-gemini-cli-52f8be838ad6" rel="noopener noreferrer"&gt;Multi-Agent A2A with the Agent Development Kit(ADK), Cloud Run, and Gemini CLI&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Running/Testing/Debugging Locally
&lt;/h4&gt;

&lt;p&gt;The main Makefile has been extended with extensive targets for managing the agents on the local development environment.&lt;/p&gt;

&lt;p&gt;The key targets include:&lt;/p&gt;

&lt;p&gt;First, check for locally running agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make status
&lt;span class="go"&gt;Checking status of locally running agents and servers...
--- Network Status ---
No services listening on expected ports (8000-8004, 5173).
--- Process Status ---
No matching processes found.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then all the agents can be started together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✦ The frontend has been built and all local services (Researcher, Judge, Content Builder, Orchestrator, App Backend, and Frontend dev server) have
  been started in the background.

   - Frontend: http://localhost:5173
   - Backend: http://localhost:8000

  Logs for each service are available in the project root: researcher.log, judge.log, content_builder.log, orchestrator.log, backend.log, and
  frontend.log.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect to the front end locally:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9eui8i77oqdiftnerk4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9eui8i77oqdiftnerk4r.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the entire agent system will run in the local environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklkygh1ukveelo7tr9uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklkygh1ukveelo7tr9uw.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Local Logging / Debugging
&lt;/h4&gt;

&lt;p&gt;Gemini CLI has full access to the local agent logs for debugging and troubleshooting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;review the &lt;span class="nb"&gt;local &lt;/span&gt;logs &lt;span class="k"&gt;for &lt;/span&gt;the multi-agent run
&lt;span class="go"&gt;✦ I will check run_local.sh and the project structure to determine where logs are stored or how they are generated.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will verify the local operation of the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✦ The local logs confirm a successful multi-agent run for the topic "the history of the internet." Here is a summary of the execution flow:

* Orchestrator (orchestrator.log): Successfully managed the pipeline. It captured the topic, coordinated the Research-Judge loop, and escalated to
     content generation after the first iteration.
   * Researcher (researcher.log): Gathered approximately 9.2KB of research data using the google_search tool. It completed its task with a 200 OK
     response.
   * Judge (judge.log): Evaluated the findings and gave a "pass" status. The judge noted that the research was "comprehensive, accurate, and
     well-structured," making it an "excellent foundation for educational content."
   * Content Builder (content_builder.log): Successfully transformed the validated research into a structured Markdown course module and finished
     execution.
   * Backend &amp;amp; Frontend (backend.log, frontend.log): The FastAPI backend (port 8000) and Vite frontend (port 5173) are running and communicating with
     the orchestrator as expected.

Notable Observation:
   * A minor warning appeared in researcher.log: Researcher callback: 'response' not found or has no parts. However, this did not impact the
     successful completion of the research or the overall pipeline.

All services are currently active and logged at the DEBUG level as specified in run_local.sh.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Deploying to Amazon Lambda
&lt;/h4&gt;

&lt;p&gt;The project-level Makefile has targets for managing the agent deployment to serverless endpoints. You can submit the build for Lambda deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make deploy
&lt;span class="go"&gt;chmod +x lambda/deploy-lambda.sh
./lambda/deploy-lambda.sh
Ensuring IAM role McpLambdaExecutionRole exists...
Logging in to Amazon ECR...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
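
&lt;p&gt;Under the hood, the script builds one Lambda per agent from a shared container image and exposes each one through a Function URL. Here is a hedged boto3 sketch of that per-agent step; the image URI and account ID are placeholders, and the real script also handles ECR login and inter-agent wiring:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of the per-agent step that deploy-lambda.sh automates.
# The ECR image URI and account ID are placeholders.
import boto3

lam = boto3.client("lambda", region_name="us-east-1")
lam.create_function(
    FunctionName="course-creator-researcher",
    PackageType="Image",
    Code={"ImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/course-creator:latest"},
    Role="arn:aws:iam::123456789012:role/McpLambdaExecutionRole",
)

# Expose the agent over HTTPS. The code review below notes the demo uses
# AuthType NONE; production should prefer IAM.
url_config = lam.create_function_url_config(
    FunctionName="course-creator-researcher", AuthType="NONE"
)
print(url_config["FunctionUrl"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;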



&lt;p&gt;Once the containers are deployed, you can get the endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make endpoint
&lt;span class="go"&gt;https://wqv5reqmno6skv3xsqb64kgrsm0hletn.lambda-url.us-east-1.on.aws/

&lt;/span&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make status
&lt;span class="go"&gt;Course Creator Lambda Status:
-----------------------------------------------------
---------------------------------------------
| GetFunction |
+---------------------------------+---------+
| Name | Status |
+---------------------------------+---------+
| course-creator-course-builder | Active |
+---------------------------------+---------+
URL: https://wqv5reqmno6skv3xsqb64kgrsm0hletn.lambda-url.us-east-1.on.aws/
-----------------------------------------------------
-------------------------------------------
| GetFunction |
+-------------------------------+---------+
| Name | Status |
+-------------------------------+---------+
| course-creator-orchestrator | Active |
+-------------------------------+---------+
URL: https://q5bciiujjktr6wris6tple6fra0yyrqc.lambda-url.us-east-1.on.aws/
-----------------------------------------------------
-----------------------------------------
| GetFunction |
+----------------------------+----------+
| Name | Status |
+----------------------------+----------+
| course-creator-researcher | Active |
+----------------------------+----------+
URL: https://gfhdoxhiiznflcz2cdhc65z2eq0cwimd.lambda-url.us-east-1.on.aws/
-----------------------------------------------------
------------------------------------
| GetFunction |
+-----------------------+----------+
| Name | Status |
+-----------------------+----------+
| course-creator-judge | Active |
+-----------------------+----------+
URL: https://kaen6rupkl5ph5kde2g6h7wgr40sirch.lambda-url.us-east-1.on.aws/
-----------------------------------------------------
----------------------------------------------
| GetFunction |
+----------------------------------+---------+
| Name | Status |
+----------------------------------+---------+
| course-creator-content-builder | Active |
+----------------------------------+---------+
URL: https://k5wt4o6vrdao3w4zjiabszdeue0kauxp.lambda-url.us-east-1.on.aws/
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The service will be visible in the AWS console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbs68oej1hzolyuxu3ltz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbs68oej1hzolyuxu3ltz.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the entire system can be tested:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/multi-lambda$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make e2e-test-lambda
&lt;span class="go"&gt;Fetching Lambda endpoint...
make[1]: Entering directory '/home/xbill/gemini-cli-aws/multi-lambda'
Running end-to-end test against https://wqv5reqmno6skv3xsqb64kgrsm0hletn.lambda-url.us-east-1.on.aws/...
Temporary JSON file content: {"message": "Create a short course about the history of the internet", "user_id": "e2e_test_user"}
Executing: curl -s -X POST https://wqv5reqmno6skv3xsqb64kgrsm0hletn.lambda-url.us-east-1.on.aws/api/chat_stream -H "Content-Type: application/json" -d @/tmp/tmp.vmxl0Dsf88 --no-buffer
{"type": "progress", "text": "\ud83d\ude80 Connected to backend, starting research..."}
{"type": "progress", "text": "\ud83d\ude80 Starting the course creation pipeline..."}
{"type": "progress", "text": "\ud83d\udd0d Research is starting..."}
{"type": "progress", "text": "\ud83d\udd0d Researcher is gathering information..."}
{"type": "progress", "text": "\u2696\ufe0f Judge is evaluating findings..."}
{"type": "progress", "text": "\u2696\ufe0f Judge is evaluating findings..."}
{"type": "progress", "text": "\u270d\ufe0f Building the final course content..."}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
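
&lt;p&gt;If you prefer Python to curl, an equivalent client for the same streaming endpoint might look like this. The endpoint path and payload come from the test output above; the line-by-line streaming handling is an assumption based on that output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Python equivalent of the curl-based e2e test shown above.
import json

import requests

URL = ("https://wqv5reqmno6skv3xsqb64kgrsm0hletn"
       ".lambda-url.us-east-1.on.aws/api/chat_stream")
payload = {"message": "Create a short course about the history of the internet",
           "user_id": "e2e_test_user"}

# The backend streams one JSON progress event per line.
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            event = json.loads(line)
            print(event["type"], event.get("text", ""))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;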



&lt;h4&gt;
  
  
  Running the Web Interface
&lt;/h4&gt;

&lt;p&gt;Start a connection to the deployed app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://wqv5reqmno6skv3xsqb64kgrsm0hletn.lambda-url.us-east-1.on.aws/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect to the app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8x3ewby1onl8r03adirm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8x3ewby1onl8r03adirm.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then use the online course generator:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduba7v2df0ovz2kxqfz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduba7v2df0ovz2kxqfz7.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Final Gemini CLI Code Review
&lt;/h4&gt;

&lt;p&gt;As a final step, Gemini CLI was used for a full code review of the project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✦ The multi-lambda project is a well-engineered, distributed multi-agent system that effectively leverages the
  Google Agent Development Kit (ADK) and the A2A protocol within an AWS Lambda environment. The architecture is
  modular, resilient, and optimized for streaming AI interactions.

  Key Strengths

   * Robust Orchestration: The use of SequentialAgent and LoopAgent in agents/orchestrator/agent.py creates a
     sophisticated Research-Judge pipeline. Custom components like StateCapturer and EscalationChecker provide
     fine-grained control over the agent loop and state persistence.
   * Infrastructure-Aware A2A: The middleware in shared/a2a_utils.py dynamically rewrites agent card URLs based on
     x-forwarded-* headers, a critical feature for services running behind proxies or as Lambda Function URLs.
   * Intelligent Agent Callbacks: Each sub-agent (researcher, judge, content_builder) utilizes
     before_agent_callback to sanitize inputs and recover topics/findings from session history. This makes the
     agents highly resilient to the "noise" of accumulated conversation history.
   * Unified Deployment: The project employs a single Docker image (using the aws-lambda-adapter) for all
     services. The deploy-lambda.sh script automates the complex task of deploying five distinct services,
     managing their Function URLs, and wiring them together via environment variables.
   * Modern Python Stack: The use of Python 3.13 and uvicorn ensures high performance, while the gateway's
     merge_strings logic gracefully handles the complexities of streaming deduplication.

  Observations &amp;amp; Recommendations

   * Session Persistence: Currently, the system appears to use ADK's default in-memory session storage. In a
     Lambda environment, sessions may be lost across cold starts or scaling events. For production use,
     integrating a persistent session store (e.g., DynamoDB or Redis) via the session_service_uri option is
     recommended.
   * Security: Inter-agent communication currently bypasses IAM authentication (using NONE auth type) and relies
     on the authenticated_httpx.py bypass logic. While acceptable for a demo, production environments should
     leverage AWS IAM or Google Identity Tokens for service-to-service authorization.
   * State Management: The orchestrator's StateCapturer effectively "syncs" findings from the event stream into
     the session state. This is a clever approach to distributed state management, ensuring the Orchestrator
     remains the source of truth even when sub-agents are remote.

  Overall, this is a sophisticated implementation that successfully bridges Google's AI agent framework with AWS's
  serverless infrastructure.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
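
&lt;p&gt;The session-persistence recommendation is worth making concrete. Here is a minimal sketch, assuming ADK's DatabaseSessionService and a placeholder connection string (the project currently relies on the in-memory default):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: swap ADK's default in-memory sessions for a persistent store so that
# cold starts and scaling events do not drop session state. The db_url is a
# placeholder, not this project's configuration.
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import DatabaseSessionService

root_agent = Agent(name="orchestrator", model="gemini-2.0-flash")

session_service = DatabaseSessionService(
    db_url="postgresql://user:pass@db-host/agent_sessions")
runner = Runner(agent=root_agent, app_name="course-creator",
                session_service=session_service)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;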



&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;The Agent Development Kit (ADK) was used to build a multi-agent system with A2A support, using the Gemini Flash model. This application was tested locally with Gemini CLI and then deployed to AWS Lambda. Finally, Gemini CLI was used for a complete project code review.&lt;/p&gt;

</description>
      <category>googleadk</category>
      <category>lambda</category>
      <category>aws</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Instructions. Skills. Tools. How Google Embedded Skills Into Every Layer of Its Agent Stack</title>
      <dc:creator>Sonika Janagill</dc:creator>
      <pubDate>Sat, 25 Apr 2026 18:30:55 +0000</pubDate>
      <link>https://forem.com/gde/instructions-skills-tools-how-google-embedded-skills-into-every-layer-of-its-agent-stack-5415</link>
      <guid>https://forem.com/gde/instructions-skills-tools-how-google-embedded-skills-into-every-layer-of-its-agent-stack-5415</guid>
      <description>&lt;h2&gt;
  
  
  Agent Skills Adoption
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Prompt Bloat has a name and a fix. Skills are now load-bearing across Google’s agent stack: from on-device Gemma 4 to enterprise Gemini, from coding assistants to the official Cloud repository.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkogb5p0p6obphfmbq3l6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkogb5p0p6obphfmbq3l6.png" alt="Instructions. Skills. Tools." width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It&lt;/strong&gt; usually starts with good intentions.&lt;/p&gt;

&lt;p&gt;A team builds an agent. It works, mostly, until it misses a naming convention or ignores an approval workflow. So, you add a paragraph to the system prompt. Then another to handle an edge case. Then three more for stakeholder rules.&lt;/p&gt;

&lt;p&gt;Six months in, the prompt is a 4,000-word monolith. Nobody knows what is still relevant, but everyone is afraid to touch it. The agent is now slower and less reliable than when it had 200 words of instructions. Every “fix” risks a regression.&lt;/p&gt;

&lt;p&gt;This is the reality of &lt;strong&gt;Prompt Bloat&lt;/strong&gt;: the silent technical debt of enterprise AI.&lt;/p&gt;

&lt;p&gt;This has been the enterprise agent bottleneck for two years. I recently spoke with a practitioner managing 100+ production skills; they described a marketing auditor that loaded 15,000 tokens of instructions on every invocation. It left almost no context window for the actual content being audited. The agent “worked,” but it was drowning in its own instructions. The output was mediocre because the reasoning tax was too high.&lt;/p&gt;

&lt;p&gt;At Google Cloud Next ’26, Google productized the solution: &lt;strong&gt;Skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core thesis is that Skills are the “settled” abstraction for agentic workflows. They occupy the vital middle ground:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Better than Prompts:&lt;/strong&gt; Because they are reusable and persistent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lighter than Fine-tuning:&lt;/strong&gt; Because they iterate at the speed of business logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smarter than RAG:&lt;/strong&gt; Because they are active expertise, not just passive retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Richer than Tools:&lt;/strong&gt; Because they encode “how” and “why,” not just “do.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skills are small, named, dynamically loaded units of expertise. With Google shipping them across three distinct surfaces, the industry debate over what to call this pattern is over. The real question begins: who is responsible for governing yours?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pattern: How Google Embeds Open Abstractions
&lt;/h3&gt;

&lt;p&gt;Google’s shipping strategy follows a consistent &lt;strong&gt;“Adoption Flywheel”&lt;/strong&gt;: observe the abstractions the developer community is independently building, adopt the open standard, and embed it as a first-class primitive across the stack.&lt;/p&gt;

&lt;p&gt;Recognising this pattern tells you exactly where to invest your time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP.&lt;/strong&gt; Anthropic released the Model Context Protocol as a lightweight standard for connecting agents to external tools and data sources. Google’s response was not to build a competing standard. Within months, managed MCP servers were shipping for Cloud Run, BigQuery, AlloyDB, Cloud SQL, and the full Workspace suite. Google adopted the standard and built infrastructure around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A2A.&lt;/strong&gt; Google co-authored the Agent-to-Agent protocol for cross-agent communication, then handed governance to the Linux Foundation’s Agentic AI Foundation rather than keeping it proprietary. It now has 150 organisations in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills.&lt;/strong&gt; The ecosystem independently discovered that agents need loadable expertise. Google productized it, kept the open &lt;code&gt;agentskills.io&lt;/code&gt; name, and moved it from a “sidebar feature” to “load-bearing” infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The practical implication:&lt;/strong&gt; When Google adopts an open abstraction, the format stabilises, but the complexity shifts. You can stop worrying about the file format and start worrying about the governance. Invest in the abstraction, not the vendor-specific implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikingdoamvv1o28qpswo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikingdoamvv1o28qpswo.png" alt="Google mainlines open ecosystem abstractions into load-bearing infrastructure" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Surfaces Where Skills Have Now Shipped
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/whats-new-in-gemini-enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt;: Skills as a First-Class Product Feature
&lt;/h4&gt;

&lt;p&gt;The announcement of Skills inside Gemini Enterprise marks a shift from “Linear Context Loading” to “Dynamic Skill Dispatching”.&lt;/p&gt;

&lt;p&gt;The technical cost of large system prompts is the “Lost in the Middle” phenomenon. When irrelevant instructions saturate the context window, reasoning degrades. The model spends so much of its “cognitive overhead” parsing the prompt that it has little capacity left for the actual task.&lt;/p&gt;

&lt;p&gt;Skills solve this via &lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview#how-skills-work" rel="noopener noreferrer"&gt;&lt;strong&gt;Progressive Disclosure&lt;/strong&gt;&lt;/a&gt; in three stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Discovery:&lt;/strong&gt; The agent knows the skill exists via a minimal metadata footprint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Activation:&lt;/strong&gt; The full instructions load only when the task requires that specific expertise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution:&lt;/strong&gt; The agent follows the structured Markdown and templates to complete the work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xwydiut27eqbqay4y4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xwydiut27eqbqay4y4b.png" alt="Agent Skills: Progressive Disclosure in Three Stages" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By preserving the &lt;strong&gt;reasoning budget&lt;/strong&gt; for the task rather than the instructions, you get the breadth of a deeply specialised agent without the context tax on every invocation.&lt;/p&gt;
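
&lt;p&gt;To make the stages concrete, here is a minimal sketch of a skill file, following the SKILL.md conventions from &lt;code&gt;agentskills.io&lt;/code&gt;. The frontmatter is the lightweight metadata the agent sees at Discovery; the Markdown body loads only at Activation. The skill itself is invented for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: marketing-auditor
description: Audits marketing copy against brand voice and escalation rules.
  Use when reviewing client-facing content before publication.
---

# Marketing Auditor

## Workflow
1. Check the copy against the brand-voice checklist below.
2. Flag any claim that requires legal review.
3. Escalate to a human reviewer when confidence is low.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;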

&lt;p&gt;For enterprise teams, Skills are not a standalone feature; they are part of a coherent operating model. They sit alongside &lt;strong&gt;Agent Designer&lt;/strong&gt;, secure execution sandboxes, and a central &lt;strong&gt;Inbox&lt;/strong&gt; for monitoring activity. This is Google providing the infrastructure to manage agents at an organisational scale, rather than just building better chatbots.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwah03renkvgkb2dtk6p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwah03renkvgkb2dtk6p.png" alt="Gemini Enterprise Skills for reusable actions" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. &lt;a href="https://developers.googleblog.com/agents-cli-in-agent-platform-create-to-production-in-one-cli/" rel="noopener noreferrer"&gt;Agents CLI&lt;/a&gt;: Skills for Your Coding Assistant
&lt;/h4&gt;

&lt;p&gt;The second surface is where the engineering actually happens: the terminal and the coding assistant. Polong Lin, Google’s Staff DevRel Manager for ADK, has positioned the Agents CLI as the bridge between a cool demo and a production-ready AI workforce. It is pre-GA and available now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Preferred: uvx handles an ephemeral environment&lt;/span&gt;
uvx google-agents-cli setup 

&lt;span class="c"&gt;# Alternative: install specific skills &lt;/span&gt;
npx skills add google/agents-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Agents CLI turns assistants like Claude Code or Gemini CLI into ADK specialists. At launch, seven “Workflow Skills” ship out of the box to handle the end-to-end development lifecycle:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpyqjqw3udq0t3ds71l6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpyqjqw3udq0t3ds71l6.png" alt="Agents CLI Skills" width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What this means in practice: when you invoke &lt;code&gt;google-agents-cli-scaffold&lt;/code&gt; inside Claude Code, your coding assistant loads a skill that carries Google's conventions for ADK project structure, component naming, and integration patterns. It does not need to guess or hallucinate ADK-specific idioms. The expertise is encoded in the skill, and it works immediately.&lt;/p&gt;

&lt;p&gt;What takes longer is discipline: knowing when to write a custom skill versus when to extend a system prompt, and agreeing on that line across your team.&lt;/p&gt;

&lt;p&gt;The real breakthrough, however, is the Official Agent Skills Repository: &lt;a href="https://github.com/google/skills" rel="noopener noreferrer"&gt;github.com/google/skills&lt;/a&gt;. Thirteen skills at launch, covering the most-used Google Cloud products and architectural concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Product skills:&lt;/strong&gt; AlloyDB, BigQuery, Cloud Run, Cloud SQL, Firebase, Gemini API, GKE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Well-Architected Pillar skills:&lt;/strong&gt; Security, Reliability, Cost Optimisation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recipe skills:&lt;/strong&gt; Authentication, Onboarding, Network Observability
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills &lt;span class="nb"&gt;install &lt;/span&gt;github.com/google/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are agent-first documentation: compact, grounded expertise written for agents to consume, not humans to read. Accurate terminal commands. No hallucinated API calls. No outdated SDK syntax. The Well-Architected Pillar skills are particularly notable: they encode Google’s architectural judgement as loadable expertise, not a 200-page PDF that nobody reads.&lt;/p&gt;

&lt;p&gt;The third surface is the most unexpected, and the most revealing about where this is heading.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Google AI Edge Gallery: Skills On-Device with Gemma 4
&lt;/h4&gt;

&lt;p&gt;Google AI Edge Gallery, available on iOS and Android, allows you to build and experiment with AI experiences that run entirely on-device. At Next ’26, Google announced Agent Skills in the Gallery, making it one of the first applications to run multi-step, autonomous agentic workflows entirely on-device. Powered by Gemma 4, Agent Skills augment the knowledge base, enabling Gemma 4 to access information beyond its initial training data.&lt;/p&gt;

&lt;p&gt;The Gemma 4 edge variants (E2B and E4B) run under 1.5GB of RAM on mid-range to flagship devices. The LiteRT-LM runtime processes 4,000 tokens across two Agent Skills in under three seconds. The model decides autonomously which of its available tools to invoke, in which sequence, and composes the response entirely on-device.&lt;/p&gt;

&lt;p&gt;The critical detail here is the format. The skill powering the Gallery is not a proprietary Google file; it is the SKILL.md format from &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This creates a massive architectural implication for the enterprise. You can build a custom skill on a phone, test it offline, and deploy the exact same file to a cloud-hosted Gemini 3.1 instance on Vertex AI. The Skill has become the portable container for cognition: &lt;strong&gt;“Docker for Prompts.”&lt;/strong&gt; No other stack offers that path right now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6dey3u44np8f97h3bqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6dey3u44np8f97h3bqg.png" alt="Skills: Docker for Prompts" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Convergence: This Is Not Coincidence
&lt;/h3&gt;

&lt;p&gt;Three surfaces. Three implementations of the same abstraction. And the underlying format is converging on something that started at Anthropic. When you see the same abstraction ship across a web app, a CLI tool, and a mobile runtime simultaneously, it is no longer a “feature.” It is a protocol.&lt;/p&gt;

&lt;p&gt;The Day 2 developer keynote demo built a planning agent using ADK, MCP servers, and Agent Runtime, and described what the agent needed in three words: &lt;strong&gt;Instructions, Skills, and Tools&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.cloud.google.com/agent-registry/overview" rel="noopener noreferrer"&gt;Agent Registry&lt;/a&gt; reinforces this. Agent Registry maintains a central library of approved tools, indexing every internal agent, tool, and skill. That is governance infrastructure, not just a catalogue. When skills are indexed by Agent Registry, the “which skill was loaded?” accountability question I raised earlier has a concrete answer at the platform level.&lt;/p&gt;

&lt;p&gt;It also helps to see where Skills sit relative to the other layers of the 2026 agent stack:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwkao8i0hqdmx84n2mta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwkao8i0hqdmx84n2mta.png" alt="Where Skills sit relative to the other layers of the 2026 agent stack" width="800" height="366"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Skills and other layers of the agent stack&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each layer solves a different problem. The mistake most enterprise teams make is trying to solve the Skills (logic and process) problem with more RAG (more data). Google’s implementation across these three surfaces forces a much-needed discipline: keep your tools mechanical, your data accessible, and your expertise modular.&lt;/p&gt;

&lt;p&gt;This is what protocol convergence looks like before the formal standard exists. The ecosystem finds the right shape. Then the spec follows. MCP went through this in 2024. A2A went through this in 2025. Skills are going through it now.&lt;/p&gt;

&lt;p&gt;The practical takeaway: invest in the abstraction regardless of which vendor surface you build on first. The format will stabilise. The Skills catalogue you build this year will not be obsolete when the spec lands.&lt;/p&gt;

&lt;p&gt;I wrote about the governance side of this challenge before Google named it, in &lt;a href="https://sonikajanagill.com/articles/enterprise-agent-skills-governance/" rel="noopener noreferrer"&gt;“The Skills Explosion Is Here. Enterprise Governance Isn’t.”&lt;/a&gt; The moment I described there, where a developer drops a GitHub link to 100+ community skills and forty reaction emojis appear in Slack, arrives faster when three surfaces of Google’s stack ship Skills simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Enterprise Reality
&lt;/h3&gt;

&lt;p&gt;For the past year, our core challenge hasn’t been selecting models or frameworks. It has been: &lt;strong&gt;How do we make individual experimentation compatible with organisational standards?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tension is genuine. A developer working on a client campaign in Berlin has domain context that a platform team in London cannot anticipate. If skills are locked down centrally, that contextual expertise cannot reach the agent. If skills are entirely uncontrolled, you cannot audit what your agents are doing or ensure quality across client deliverables.&lt;/p&gt;

&lt;p&gt;Google’s architecture addresses this through a Layered Composition Model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Organisation Level:&lt;/strong&gt; Global standards, brand voice, and compliance rules (managed via Gemini Enterprise).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project Level:&lt;/strong&gt; Client-specific conventions and workflow patterns (managed via Agent Registry).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal Level:&lt;/strong&gt; Individual experimentation and localised hacks (managed via Agents CLI).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stack allows these layers to compose, but it doesn’t yet solve the governance challenge sitting above the architecture. We still have to answer: Which skills are deprecated? Who owns the versioning when the underlying model changes? How do we evaluate a skill’s reliability before it reaches a production agent?&lt;/p&gt;

&lt;p&gt;The infrastructure is here. Now, the governance tooling must catch up to the adoption rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Open Questions for the Post-Launch Reality
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Skills vs. MCP tools: when is each right?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tools are mechanical; Skills are cognitive.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tools are stateless and specific: “Call this API, return this schema.”&lt;/li&gt;
&lt;li&gt;Skills carry instructions, conventions, and internal logic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Heuristic:&lt;/strong&gt; If it’s a single function call, it’s a tool. If it requires the agent to reason about sequencing, error handling, or escalation, it’s a skill. In agentic commerce, an API call to update a product attribute is a tool. Knowing when an attribute is missing, how to verify its quality, and when to escalate to a human is a &lt;strong&gt;skill&lt;/strong&gt;.&lt;/p&gt;
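
&lt;p&gt;In code terms, the tool half of that commerce example is just a schema-bound function; the judgement about when to call it, how to verify the result, and when to stop lives in the skill. A sketch, with the function name and fields invented for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# The "tool": mechanical, stateless, schema-bound. One call, one result.
# The function name and fields are invented for illustration.
def update_product_attribute(product_id, attribute, value):
    """Call the catalogue API to set one attribute and return its response."""
    raise NotImplementedError("wire this to your catalogue API")

# The "skill" is not code at all: it is loadable prose expertise that tells the
# agent when an attribute is missing, how to verify the result, and when to
# escalate to a human instead of retrying the tool.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;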

&lt;p&gt;&lt;strong&gt;How do you version a skill when the underlying model changes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A skill written for Gemini 2.0 may behave differently under Gemini 3.1. The instructions are identical; the model’s interpretation is not. This is the least-solved governance problem in the ecosystem. Treat model upgrades as potential regressions. Use the &lt;code&gt;google-agents-cli-eval&lt;/code&gt; skill to run benchmarks against your catalogue before promoting a new model to production. I expect &lt;strong&gt;"Pinned Skills"&lt;/strong&gt; (expertise locked to a validated model version) to become a standard enterprise requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who owns the skill library in your organisation?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tempting answer is “the platform team,” but that doesn’t scale. Ownership should follow the domain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Foundational Skills:&lt;/strong&gt; (Formatting, code patterns) belong to the Platform Team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Skills:&lt;/strong&gt; (Jira conventions, onboarding) belong to Domain Owners.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal Skills:&lt;/strong&gt; Belong to the Individual until they are contributed upstream.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ownership is accountability. When an agent fails, “Which skill was loaded?” needs a traceable answer. Agent Registry (announced at Next ’26) provides the platform-level index, but you must build owner attribution into the skill definition itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The “Hidden” Problem: Skill Collision.&lt;/strong&gt; As skill catalogues grow past 50 skills, descriptions will inevitably overlap. The agent’s router will pick the wrong one, leading to subtle, high-stakes errors. Forward-looking teams are already building Skill Leaderboards to track success rates across model iterations and catch these collisions before they reach the client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4w32pi0srbanyvsn7s0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4w32pi0srbanyvsn7s0l.png" alt="Skills: Collision Governance" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What to Do Next!
&lt;/h3&gt;

&lt;p&gt;The Skills abstraction is now shipped, named, and available across three Google surfaces. The infrastructure question is largely settled. What remains is the governance question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your team has agents in production:&lt;/strong&gt; Audit the knowledge your agents currently load. If expertise is buried in 4,000-word system prompts with no clear ownership, use Skills to decompose that monolith into maintainable, versioned units. Move from “Prompt Engineering” to “Skill Architecture.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your team is building new agents now:&lt;/strong&gt; Start with the Agents CLI. Use &lt;code&gt;uvx google-agents-cli setup&lt;/code&gt; to bootstrap your first ADK agent and explore the bundled workflow skills. Then install product-specific expertise from the official Agent Skills repository with &lt;code&gt;npx skills install github.com/google/skills&lt;/code&gt;. Learn the pattern with these "training wheels" before you are tasked with maintaining a production fleet of 40+ custom skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you are thinking about the enterprise governance layer:&lt;/strong&gt; Review my earlier analysis of &lt;a href="https://sonikajanagill.com/articles/enterprise-agent-skills-governance/" rel="noopener noreferrer"&gt;the security and governance challenge&lt;/a&gt;. It covers the vulnerability data from January 2026 (one in four public skills contains at least one vulnerability), the three-tier classification model for external skills (Green/Amber/Red), and the progressive disclosure pattern that prevents context from drowning your agents.&lt;/p&gt;

&lt;p&gt;Google’s launch of Agent Registry makes these challenges visible, but it doesn’t solve them for you. The registry provides the index, but your team must provide the policy.&lt;/p&gt;

&lt;p&gt;The governance conversation starts now. Your skills catalogue, and the rigour with which you govern it, will define the quality floor of every agent your team ships.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sonika Janagill is a Google Developer Expert in Cloud AI &amp;amp; Google Cloud, Lead Backend Engineer at VML, and Data/MLOps Engineer at WPP Media. She writes about agentic systems, MLOps, and enterprise AI at&lt;/em&gt; &lt;a href="https://sonikajanagill.com" rel="noopener noreferrer"&gt;&lt;em&gt;sonikajanagill.com&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  References and further reading
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/level-up-your-agents-announcing-googles-official-skills-repository?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Level Up Your Agents: Announcing Google’s Official Skills Repository&lt;/a&gt; — Google Cloud Blog, Megan O’Keefe, 22 April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/google/skills" rel="noopener noreferrer"&gt;github.com/google/skills&lt;/a&gt; — Official Google Agent Skills repository (13 skills at launch)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.googleblog.com/developers-guide-to-building-adk-agents-with-skills/?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Developers guide to building ADK agents with skills&lt;/a&gt; — Google Developers Blog, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/google/agents-cli" rel="noopener noreferrer"&gt;Agents CLI&lt;/a&gt; — ADK workflow skills for coding assistants&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sonikajanagill.com/articles/enterprise-agent-skills-governance/" rel="noopener noreferrer"&gt;The Skills Explosion Is Here. Enterprise Governance Isn’t.&lt;/a&gt; — Sonika Janagill, March 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.googleblog.com/bring-state-of-the-art-agentic-skills-to-the-edge-with-gemma-4/?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google AI Edge Gallery — Gemma 4 Agent Skills&lt;/a&gt; — Google Developers Blog, April 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://sonikajanagill.com/articles/google-agent-skills-stack/" rel="noopener noreferrer"&gt;&lt;em&gt;https://sonikajanagill.com&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>googleadk</category>
      <category>googleagentplatform</category>
      <category>enterpriseaistrategy</category>
      <category>geminienterprise</category>
    </item>
    <item>
      <title>Build and Deploy to Google Cloud with Antigravity: The Era of Agent-First Development</title>
      <dc:creator>Gbemisola Esho</dc:creator>
      <pubDate>Fri, 24 Apr 2026 15:26:04 +0000</pubDate>
      <link>https://forem.com/gde/build-and-deploy-to-google-cloud-with-antigravity-the-era-of-agent-first-development-36d0</link>
      <guid>https://forem.com/gde/build-and-deploy-to-google-cloud-with-antigravity-the-era-of-agent-first-development-36d0</guid>
      <description>&lt;p&gt;The landscape of software development is undergoing a seismic shift from simple chat interfaces to autonomous agents capable of planning, executing, and refining complex workflows. Leading this charge is Google Antigravity, an agentic development platform that evolves the traditional IDE into a mission control center for an agent-first era.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oqif620bdel0on3pi1y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oqif620bdel0on3pi1y.png" alt=" " width="800" height="738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unlike standard coding assistants that merely autocomplete lines, Antigravity functions as an autonomous actor that can design, build, and deploy entire systems with minimal human intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mission: An Event-Driven Document Pipeline
&lt;/h2&gt;

&lt;p&gt;To see Antigravity in action, we can look at the creation of a serverless, event-driven document processing pipeline on Google Cloud. The architecture involves:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ya922h86bouf0w3waow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ya922h86bouf0w3waow.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion:&lt;/strong&gt; Files uploaded to a Google Cloud Storage (GCS) bucket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; Uploads firing a Pub/Sub message.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processor:&lt;/strong&gt; A Cloud Run service (Python/Flask) that extracts metadata and processes files using Gemini on Vertex AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; Streaming the results (tags, word counts, filenames) into BigQuery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf73c49pa0jlpyvhwv9p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf73c49pa0jlpyvhwv9p.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Development in Antigravity doesn't start with code; it starts with a Mission. In the Agent Manager, developers use the Playground to provide high-level prompts. Antigravity excels at planning complex systems before a single line is written.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48r086hmbt0vex52ji3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48r086hmbt0vex52ji3j.png" alt=" " width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A critical feature is the Review Policy. By setting artifacts to "Asks for Review," you ensure the agent presents its logic for approval before execution, fostering trust and maintaining human-in-the-loop control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm51a37bv0v5op19m64aj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm51a37bv0v5op19m64aj.png" alt=" " width="800" height="713"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Autonomous Code &amp;amp; Infrastructure Generation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnub2tfs4y92pfit1emc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnub2tfs4y92pfit1emc6.png" alt=" " width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the plan is approved, Antigravity generates all necessary artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code:&lt;/strong&gt; A setup.sh script to enable APIs (Cloud Run, Pub/Sub, BigQuery) and provision resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Code:&lt;/strong&gt; A Python-based main.py, a Dockerfile, and a requirements.txt (a sketch of what that processor might look like appears below).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; The agent handles building the container image and deploying the Cloud Run service automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgo5s31bdx91uq71yn7hr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgo5s31bdx91uq71yn7hr.png" alt=" " width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3: Verification via Artifacts, Not Logs
&lt;/h2&gt;

&lt;p&gt;The most tedious part of delegation is verification. Antigravity solves this by moving away from raw logs to Artifacts: tangible deliverables like task lists, implementation plans, and Walkthroughs.&lt;br&gt;
The agent proactively verifies the deployment by uploading a test file to GCS and running SQL queries in BigQuery to ensure the data was processed correctly. You can review these results in the Walkthrough artifact, which summarizes every change and result at a glance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucysu9oejq019k65wx0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucysu9oejq019k65wx0r.png" alt=" " width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To verify the application really works, the agent creates a test artifact (test.txt) and asks to upload it to the Google Cloud Storage bucket. Click Accept to go ahead.&lt;br&gt;
If you want to run further tests on your own, you can try uploading a sample file to the Cloud Storage bucket:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gcloud storage cp *.txt gs://doc-ingestion-{project-id}&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Extend the Application
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add a Frontend:&lt;/strong&gt; Generate a Streamlit or Flask web app to visualize BigQuery data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate Real AI:&lt;/strong&gt; Swap "simulated" logic for real Gemini-powered document classification and translation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhance Security:&lt;/strong&gt; Move sensitive configurations to Secret Manager or implement Dead Letter Queues (DLQ) for robust error handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguesl09mda96djzle54v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguesl09mda96djzle54v.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google Antigravity represents a shift toward a higher, task-oriented level of engineering. By combining an AI-powered editor with a dedicated agent workspace, it allows developers to focus on the "what" while the agent handles the "how," turning abstract ideas into live, verified cloud applications in minutes.&lt;br&gt;
For step-by-step instructions, visit the &lt;a href="https://codelabs.developers.google.com/build-and-deploy-gcp-with-antigravity?hl=en&amp;amp;continue=https%3A%2F%2Fcodelabs.developers.google.com%2Fcloudaittt2026" rel="noopener noreferrer"&gt;Build and Deploy to Google Cloud with Antigravity&lt;/a&gt; codelab.&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>agentic</category>
      <category>gcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Empowering Autonomous AI Agents through Dynamic Tool Creation</title>
      <dc:creator>Tanaike</dc:creator>
      <pubDate>Fri, 24 Apr 2026 07:36:01 +0000</pubDate>
      <link>https://forem.com/gde/empowering-autonomous-ai-agents-through-dynamic-tool-creation-3pfm</link>
      <guid>https://forem.com/gde/empowering-autonomous-ai-agents-through-dynamic-tool-creation-3pfm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecwfzv59swalkgbrjjs9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecwfzv59swalkgbrjjs9.jpg" alt="Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;Welcome to the Agentic Enterprise era. This article explores a paradigm shift in generative AI workflows by introducing an autonomous agent capable of dynamically creating, testing, and executing original tools. Utilizing Google Apps Script, Node.js emulation, and multi-agent orchestration, this architecture overcomes traditional limitations, enabling highly adaptable task execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;At Google Cloud Next 2026, a clear mandate was delivered: the era of generative AI as a passive assistant is over. We have entered the age of the Agentic Enterprise, where AI has transitioned from a software tool you deploy into an autonomous coworker you onboard. This identity shift is staggering. With models processing over 16 billion tokens per minute via direct API use, we are witnessing an industrial-scale migration toward autonomous workflows. In this new ecosystem, Workspace Intelligence eliminates tab-hopping, allowing users to seamlessly query across Google Drive, Gmail, and third-party platforms. Data has transformed from a reactive archive into a "System of Action," driven by Deep Research Agents that bridge structured and unstructured data to prevent hallucinations.&lt;/p&gt;

&lt;p&gt;However, this massive scale introduces new vulnerabilities and operational bottlenecks. As organizations adopt these highly autonomous coworkers to execute a wide variety of tasks, a vast multitude of tools is required. Yet, as the number of available tools in a Model Context Protocol (MCP) server increases, Large Language Models (LLMs) face a critical issue: inference accuracy and tool selection reliability significantly degrade. Researchers refer to this phenomenon as "Tool Space Interference" (TSI). &lt;a href="https://www.microsoft.com/en-us/research/blog/tool-space-interference-in-the-mcp-era-designing-for-agent-compatibility-at-scale/" rel="noopener noreferrer"&gt;Ref&lt;/a&gt; When an LLM's context window becomes saturated with excessive tool definitions, semantic overlaps and irrelevant metadata hinder decision-making. Current technical guidelines recommend a "soft limit" of approximately 20 functions to maintain high accuracy. Exceeding this threshold often leads to increased hallucinations and failures in executing complex instructions.&lt;/p&gt;

&lt;p&gt;To address this, I previously published the article "Nexus-MCP: A Unified Gateway for Scalable and Deterministic MCP Server Aggregation." &lt;a href="https://medium.com/google-cloud/nexus-mcp-a-unified-gateway-for-scalable-and-deterministic-mcp-server-aggregation-3211f0adc603" rel="noopener noreferrer"&gt;Ref&lt;/a&gt; In that piece, I introduced Nexus-MCP, a concept resolving Tool Space Interference by aggregating multiple MCP servers into a single deterministic gateway. Yet, even with this optimized architecture, a hard limit is eventually reached when attempting to execute an increasingly diverse array of edge-case tasks.&lt;/p&gt;

&lt;p&gt;Furthermore, relying strictly on static tools presents a severe security risk. The 2026 summit highlighted the emergence of machine-speed, "Living off the Land" AI attacks, where internal AI CLI tools are hijacked by rogue processes. Treating AI simply as software leaves systems vulnerable; they must be managed as identities with strict minimum privileges, cryptographic IDs, and robust "Kill Switches" to prevent reasoning drift from turning agents into digital predators.&lt;/p&gt;

&lt;p&gt;This article tackles both the TSI limitation and the security imperative directly by building an agent that dynamically creates, safely tests, and executes original scripts to process varied tasks on the fly. When an AI encounters an edge-case task lacking a pre-existing tool, a true paradigm shift occurs if the AI can automatically generate the exact tools it needs in real-time. By dynamically generating code and executing it in a secure, least-privilege sandbox, we protect the enterprise from unverified execution paths while expanding the spectrum of successfully executable tasks exponentially. I have previously explored this groundbreaking concept in the following articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/google-cloud/dynamic-tool-creation-for-google-workspace-automation-with-gemini-cli-f9618166aaed" rel="noopener noreferrer"&gt;Dynamic Tool Creation for Google Workspace Automation with Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/google-cloud/a-new-era-for-google-apps-script-unlocking-the-future-of-google-workspace-automation-with-natural-a9cecf87b4c6" rel="noopener noreferrer"&gt;A New Era for Google Apps Script: Unlocking the Future of Google Workspace Automation with Natural Language&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those articles, I demonstrated how tools could be dynamically created, safely executed in a sandbox, and thoroughly evaluated. &lt;a href="https://github.com/brucemcpherson/gas-fakes" rel="noopener noreferrer"&gt;Ref&lt;/a&gt; The execution results were retrieved using &lt;code&gt;gas-fakes&lt;/code&gt;, a powerful emulation layer that allows Google Apps Script (GAS) projects to run natively on Node.js, effectively acting as an MCP server. Since then, &lt;code&gt;gas-fakes&lt;/code&gt; has been extended with a robust command-line interface, the &lt;code&gt;gas-fakes CLI&lt;/code&gt;. Simultaneously, deploying specialized AI agents has become an industry standard; the Gemini CLI, for instance, can now seamlessly orchestrate such agents as subagents to handle complex, multi-step workflows.&lt;/p&gt;

&lt;p&gt;Recently, I also published a comprehensive guide, "Orchestrating Agents via ADK for TypeScript and Gemini CLI." &lt;a href="https://medium.com/google-cloud/orchestrating-agents-via-adk-for-typescript-and-gemini-cli-8629a86b0500" rel="noopener noreferrer"&gt;Ref&lt;/a&gt; In it, I detailed practical scaffolding patterns, sophisticated multi-agent coordination strategies, and seamless integration techniques for deploying remote subagents within the Gemini CLI ecosystem. The architectural concepts established in that article serve as the foundational blueprint for the agent system developed here.&lt;/p&gt;

&lt;p&gt;Specifically, I have engineered a robust AI agent equipped with a multi-agent framework designed to handle the entire lifecycle of a tool: writing the code, executing the script, validating the output, and summarizing the final results. I rigorously tested this framework as a subagent orchestrated by the Gemini CLI. For the tool's programming language, I selected Google Apps Script. While generative AI is often equipped with code interpreters capable of executing Python, I chose Google Apps Script because it is a low-code language featuring native integration with Google Workspace APIs, enabling seamless cloud automation. &lt;a href="https://workspace.google.com/intl/en/products/apps-script?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Ref&lt;/a&gt; For the execution platform, I utilized &lt;code&gt;gas-fakes&lt;/code&gt; to ensure rapid, secure, and local runtime capabilities. &lt;a href="https://github.com/brucemcpherson/gas-fakes" rel="noopener noreferrer"&gt;Ref&lt;/a&gt; Looking ahead, I speculate that AI will eventually generate and execute tasks using entirely novel programming languages—languages highly optimized for AI comprehension, even if difficult for humans to read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technological Foundations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Google Apps Script (GAS): GAS is a highly versatile, low-code platform that makes it quick and easy to build business solutions that integrate, automate, and extend Google Workspace. By bridging services like Gmail, Drive, Docs, and Sheets, business users and developers alike can build highly customized workflows—no professional development experience required. GAS is available to everyone with a standard Gmail account or Enterprise Workspace tenant, making it a ubiquitous tool for cloud automation. &lt;a href="https://workspace.google.com/intl/en/products/apps-script?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Ref&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/brucemcpherson/gas-fakes" rel="noopener noreferrer"&gt;gas-fakes&lt;/a&gt;: A powerful emulation layer that lets you run Apps Script projects on Node.js as if they were native. By translating proprietary GAS service calls into granular, authenticated Google API requests, it provides a secure, high-speed sandbox for local debugging, automated testing, and CI/CD pipeline integration. This eliminates the traditional latency of cloud-based GAS deployments and opens the door for AI agents to write, test, and execute Apps Script entirely within a localized Node environment.&lt;/li&gt;
&lt;/ul&gt;
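
&lt;p&gt;To make this concrete, here is a minimal sketch of the emulation layer in action. It assumes the package has been installed and authorized as described in the setup section below, and that importing it registers the emulated Apps Script services (such as &lt;code&gt;DriveApp&lt;/code&gt;) as globals in the Node.js runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// A minimal sketch, not taken from the repository: importing gas-fakes is
// assumed to install the emulated Apps Script services as globals.
import "@mcpher/gas-fakes";

// This is plain Apps Script; the same line would run unchanged
// inside the real Apps Script runtime.
console.log(DriveApp.getRootFolder().getName());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
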

&lt;h2&gt;
  
  
  Project Setup and Prerequisites
&lt;/h2&gt;

&lt;p&gt;You can view the complete repository of sample scripts at &lt;a href="https://github.com/tanaikech/autonomous-google-workspace-agent" rel="noopener noreferrer"&gt;https://github.com/tanaikech/autonomous-google-workspace-agent&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To follow along with this guide, ensure your environment meets the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js is installed and configured on your system.&lt;/li&gt;
&lt;li&gt;Gemini CLI is installed and accessible via your terminal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install &lt;code&gt;autonomous-google-workspace-agent&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;To retrieve and initialize the scripts, execute the following commands in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tanaikech/autonomous-google-workspace-agent
&lt;span class="nb"&gt;cd &lt;/span&gt;autonomous-google-workspace-agent
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use this agent, you must configure your Gemini API key as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_API_KEY_HERE&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install gas-fakes
&lt;/h3&gt;

&lt;p&gt;Detailed installation instructions can also be found at &lt;a href="https://github.com/brucemcpherson/gas-fakes/blob/main/gas-fakes-cli.md" rel="noopener noreferrer"&gt;https://github.com/brucemcpherson/gas-fakes/blob/main/gas-fakes-cli.md&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @mcpher/gas-fakes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are using a consumer account (standard Gmail), use the Application Default Credentials (ADC) authorization. Run the following command in the &lt;code&gt;autonomous-google-workspace-agent&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gas-fakes init &lt;span class="nt"&gt;--auth-type&lt;/span&gt; adc
gas-fakes auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the &lt;code&gt;.env&lt;/code&gt; file created by &lt;code&gt;gas-fakes init --auth-type adc&lt;/code&gt;. Required scopes might need to be manually added to &lt;code&gt;EXTRA_SCOPES&lt;/code&gt; if you encounter permission errors.&lt;/p&gt;
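
&lt;p&gt;As an illustration only (the exact format follows the comments in your generated &lt;code&gt;.env&lt;/code&gt; file; the comma-separated value below is an assumption), such an entry might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# hypothetical example: add any extra OAuth scopes your scripts need
EXTRA_SCOPES="https://www.googleapis.com/auth/spreadsheets,https://www.googleapis.com/auth/calendar"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
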

&lt;p&gt;If you are using a Google Workspace enterprise account, you can use either ADC or Domain Wide Delegation (DWD); DWD is the default type. To use ADC explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gas-fakes init &lt;span class="nt"&gt;--auth-type&lt;/span&gt; adc
gas-fakes auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or, to use the default DWD type, simply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gas-fakes init
gas-fakes auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install clasp
&lt;/h3&gt;

&lt;p&gt;If you do not need to upload the generated Google Apps Script to Google Drive, you can skip this step. Official documentation is available at &lt;a href="https://github.com/google/CLASP" rel="noopener noreferrer"&gt;https://github.com/google/CLASP&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/clasp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
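
&lt;p&gt;After installation, authorize clasp with your Google account:&lt;br&gt;
&lt;/p&gt;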





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clasp login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Multi-Agent Architecture
&lt;/h2&gt;

&lt;p&gt;You can review the complete architecture in &lt;a href="https://github.com/tanaikech/autonomous-google-workspace-agent" rel="noopener noreferrer"&gt;my repository&lt;/a&gt;. This section details the agent structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;src/agent.ts&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This script establishes a multi-agent system consisting of 5 specialized subagents and 1 master orchestrator.&lt;/p&gt;

&lt;p&gt;The comprehensive agent instructions are defined as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;ENVIRONMENT_CHECKER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`You are 'environment_checker'. Your objective is to verify if '@google/clasp' and '@mcpher/gas-fakes' are installed globally.
Use the 'check_cli_installation' tool to perform this check.
Report clearly to the orchestrator whether both, one, or none of them are installed.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;SCRIPT_WRITER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`You are 'script_writer', an expert Google Apps Script (GAS) developer.
Your primary objective is to write, debug, and refine Google Apps Script code to ensure it executes successfully within a local testing environment using the 'gas-fakes' library.

### Tool Usage Guidelines
1. Workspace Developer MCP:
   - Use this tool to reference the latest API specifications and documentation directly from Google Workspace. Ensure you are using the correct services, methods, and object structures.

2. Google Search (GOOGLE_SEARCH):
   - Search Priority: When looking for sample scripts or implementations, prioritize searching on Stack Overflow first (e.g., by appending "site:stackoverflow.com" to your query).
   - Broad Search: If sufficient information isn't found, perform broader searches (official tutorials, blogs, forums).
   - Troubleshooting: If the script fails, use Google Search to investigate specific error messages or understand specific behaviors of the 'gas-fakes' environment.

### Code Generation &amp;amp; Output Rules
1. Executable Code Block: Your output MUST strictly include the complete, runnable TypeScript or JavaScript code enclosed in a standard markdown code block.
2. 'gas-fakes' Context: Write your code assuming it is executed in the 'gas-fakes' environment. Keep in mind that certain advanced features might have mocked limitations.
3. Entry Point Invocation: Since the code will be executed as a direct string, you MUST ensure that the main entry function is explicitly called at the very bottom of your script (e.g., &lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;function main() { /* logic */ } main();&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;).

### Error Handling &amp;amp; Iteration
- When a failure is reported from the previous step, carefully analyze the provided stderr/stdout execution logs.
- Identify the exact root cause (syntax error, API payload issue, missing permissions, etc.).
- Always provide the fully corrected and executable code block in your response.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;SCRIPT_EXECUTOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="na"&gt;schemaStr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`You are 'script_executor', an expert testing agent responsible for verifying Google Apps Script (GAS) code.
Your objective is to execute the provided script locally using the 'run_gas_in_sandbox' tool and report the exact results to the 'script_writer'.

### Tool Usage Guidelines ('run_gas_in_sandbox')
1. Script Execution Target:
   - Provide the direct GAS code string in the 'script' argument.
   - IMPORTANT: Ensure the entry function is called at the end of the string.

2. Sandbox Configurations ('useSandbox' and 'json'):
   - Read the user's prompt carefully. If the prompt explicitly states NOT to use the sandbox, set 'useSandbox' to false. Otherwise, you MUST set 'useSandbox' to true (default behavior).
   - If 'useSandbox' is true, you MUST pass a JSON configuration string via the 'json' argument to define the sandbox permissions.
   - Construct the 'json' argument strictly according to this JSON schema:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schemaStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
   - Include all necessary method names in 'whitelistServices' to avoid permission errors.
   - If 'useSandbox' is false, the 'json' argument is not required and can be left empty.

### Evaluation &amp;amp; Output Rules
1. Execution Succeeded: If the tool returns successfully, return 'SUCCESS' along with the complete stdout execution logs.
2. Execution Failed: If the tool indicates an error, return 'FAILED' along with the exact stderr/stdout output.
3. Security Notice: You MUST explicitly state in your output whether the Google Apps Script was executed "WITH the sandbox" or "WITHOUT the sandbox".`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;SCRIPT_UPLOADER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`You are 'script_uploader', an expert at managing Google Apps Script projects on Google Drive using the 'clasp' CLI via MCP.
Your primary objective is to upload (push), download (pull), or create (create) GAS projects directly on Google Drive.

### Usage Guidelines &amp;amp; Strict File Operation Rules:
- The orchestrator will invoke you only if '@google/clasp' is confirmed to be installed.
- **Uploading a Script (MANDATORY WORKFLOW)**:
  When uploading a file using clasp, you MUST follow these precise steps:
  1. Create the project via clasp (if a new directory is needed).
  2. **Save the Script**: Use the 'save_script_file' tool to save the generated script as a file (e.g., .js or .gs) inside the directory created by clasp.
  3. **Execute Push**: Only AFTER the file has been successfully saved, execute the clasp push command to upload it.
- Ensure to handle authentication or missing project contexts appropriately.
- Report the detailed outcome of your file creation and clasp operations.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;SUMMARY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Summarize the final deliverables in the following format:
1. Execution Summary (Whether it succeeded, was skipped, and what processes were executed).
2. Final Script Code (Clean code block, ready to be copied and used).
3. Execution Results / Data. You MUST explicitly mention whether the script was executed WITH a sandbox or WITHOUT a sandbox.
4. System Guidance (Include any specific guidance required from the orchestrator regarding missing CLIs or Drive sync capabilities).`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;ORCHESTRATOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`You are a Senior Multi-Agent Orchestrator and the leader coordinating multiple sub-agents. Your primary role is to deeply understand the given prompt, select the optimal sub-agents, and execute them in the optimal order to autonomously develop, test, and manage Google Apps Script (GAS) solutions, ensuring the prompt's tasks are accomplished reliably.

### Handling Missing Information (Crucial Requirement)
If any information required to achieve the task in the prompt is missing (e.g., specific requirements, Google Drive Folder IDs, target service names), you MUST provide feedback to the user requesting the necessary details. Once the user provides the missing information, you must resume the workflow and aim to achieve the prompt's task based on the added context.

### Available Sub-Agents &amp;amp; Expertise:
- "environment_checker" (agent0): Checks if '@google/clasp' and '@mcpher/gas-fakes' are installed.
- "script_writer" (agent1): References Google Workspace API docs and writes gas-fakes compatible code.
- "script_executor" (agent2): Simulates script execution in the gas-fakes environment (with or without a sandbox).
- "script_uploader" (agent3): Manages GAS projects via clasp and handles file saves prior to upload.
- "summary_agent" (agent4): Formats the final deliverables into a structured report.

### Operational Protocols:
1. **Selection &amp;amp; Purpose**: Clearly identify which agent(s) you are using and why. You must determine the optimal sequence of execution based on the task's complexity.
2. **Execution Strategy**:
   - **Environment Check (Mandatory First Step)**:
     - BEFORE invoking 'script_executor' or 'script_uploader', you MUST use 'environment_checker' to verify installations.
     - **If '@mcpher/gas-fakes' is NOT installed**: Skip execution, return ONLY the generated script, and instruct the user to install it via &lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;npm -g install @mcpher/gas-fakes&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;.
     - **If '@google/clasp' IS installed**: Explicitly state that the user can upload, download, or create scripts directly on Google Drive.
     - **Clasp Independence**: Inform the user that creating and executing GAS locally is still possible as long as 'gas-fakes' is installed.

   - **Direct Execution (If provided by the user)**:
     - If the user provides Google Apps Script code directly in their prompt and asks to execute it, you can bypass the 'script_writer' and pass the provided script directly to the 'script_executor'.

   - **Iterative Workflow (If gas-fakes is installed)**:
     1. Ask 'script_writer' to generate code.
     2. Pass the code to 'script_executor' for simulation.
     3. If 'FAILED', pass the details back to 'script_writer' for regeneration.
     4. **Constraint**: The cycle has a MAXIMUM limit of 5 retries.

   - **Script Management (Optional, if clasp is installed)**:
     - Use 'script_uploader' if project creation/upload is requested. Ensure you communicate that files will be generated before pushing.

   - **Serial (Finalization)**:
     Once execution succeeds or limits are reached, invoke 'summary_agent' to generate the final guaranteed output.

3. **Reporting (Strict Requirement)**: You MUST start your response with an "Execution Log".

### Mandatory Output Format (in English):
---
## Execution Log
- **Agents Involved**:[List names of agents used]
- **Execution Strategy**:[Iterative / Serial / Direct Execution / Awaiting User Input]
- **Purpose &amp;amp; Logic**:[Briefly explain the coordination, environment check results, retry cycles, or reason for requesting missing information]

## Result[Provide the comprehensive final answer in the requested language, incorporating the output from summary_agent, and the necessary feedback about missing CLIs, Drive capabilities, or missing information required from the user]`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;environment_checker&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;environmentCheckerAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;environment_checker&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Checks if the required CLI tools (@google/clasp and @mcpher/gas-fakes) are installed globally.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ENVIRONMENT_CHECKER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;checkCliInstallationTool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This agent verifies whether the necessary dependencies are properly installed in the host environment.&lt;/p&gt;
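
&lt;p&gt;For intuition, a hypothetical sketch of such a check is shown below. The repository's actual &lt;code&gt;check_cli_installation&lt;/code&gt; tool may be implemented differently; this version simply shells out to &lt;code&gt;npm ls -g&lt;/code&gt;, which exits non-zero when a package is not globally installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { execSync } from "node:child_process";

// Hypothetical sketch: reports whether a package appears in the global
// npm tree. The repository's real tool may differ.
function isGloballyInstalled(pkg: string): boolean {
  try {
    execSync(`npm ls -g ${pkg} --depth=0`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

console.log(isGloballyInstalled("@mcpher/gas-fakes"));
console.log(isGloballyInstalled("@google/clasp"));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
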

&lt;h4&gt;
  
  
  &lt;code&gt;script_writer&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scriptWriterAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script_writer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;References Google Workspace API specifications via MCP and generates code for the gas-fakes environment. Analyzes and fixes errors.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SCRIPT_WRITER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MCPToolset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;StreamableHTTPConnectionParams&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://workspace-developer.goog/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;GOOGLE_SEARCH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;generateContentConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;toolConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;includeServerSideToolInvocations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tools are dynamically generated using Google Apps Script, ensuring tight integration with Google Workspace. To enable &lt;code&gt;GOOGLE_SEARCH&lt;/code&gt; alongside the custom MCP tools, &lt;code&gt;includeServerSideToolInvocations: true&lt;/code&gt; must be specified.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;script_executor&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scriptExecutorAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script_executor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Simulates script execution securely in the gas-fakes sandbox environment and handles error reporting.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SCRIPT_EXECUTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;SANDBOX_PERMISSION_SCHEMA&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;runGasInSandboxTool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated Google Apps Script is run by the &lt;code&gt;gas-fakes&lt;/code&gt; CLI within a strict sandbox, which protects the wider enterprise environment from unauthorized access. You can also choose whether to execute a script with or without the sandbox; in some cases, the sandbox may not be required.&lt;/p&gt;
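
&lt;p&gt;For illustration, the &lt;code&gt;json&lt;/code&gt; argument passed to &lt;code&gt;run_gas_in_sandbox&lt;/code&gt; might look like the sketch below. Only the &lt;code&gt;whitelistServices&lt;/code&gt; key is taken from the instructions above; the overall shape is defined by &lt;code&gt;SANDBOX_PERMISSION_SCHEMA&lt;/code&gt; in the repository, so treat this as a hypothetical example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "whitelistServices": [
    "SpreadsheetApp.create",
    "Spreadsheet.getSheets",
    "Sheet.getRange",
    "Range.setValue"
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
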

&lt;h4&gt;
  
  
  &lt;code&gt;script_uploader&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scriptUploaderAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;script_uploader&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Uploads, downloads, or creates Google Apps Script projects on Google Drive using clasp. Used only when @google/clasp is installed.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SCRIPT_UPLOADER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MCPToolset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;StdioConnectionParams&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;serverParams&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;clasp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="nx"&gt;saveScriptFileTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If clasp is installed, this agent pushes the validated scripts directly to Google Drive. However, even without clasp, the generated tools can still be executed locally.&lt;/p&gt;
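
&lt;p&gt;For reference, the manual clasp workflow this agent automates looks roughly like the following sketch (the project title is a placeholder, and the generated file name may differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clasp create --type standalone --title "My Generated Project"
# save the generated script into the created directory, e.g. as Code.js
clasp push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
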

&lt;h4&gt;
  
  
  &lt;code&gt;summary_agent&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summaryAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary_agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Formats the final deliverables into a structured report.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SUMMARY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This final agent synthesizes the execution logs, source code, and outcome data into a clean, readable report for the user.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;autonomous-google-workspace-agent&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;autonomousGoogleWorkspaceAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;autonomous-google-workspace-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Senior Orchestrator managing GAS creation, environment check, execution, clasp integration, and up to 5 retries.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AGENT_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ORCHESTRATOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;subAgents&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;
    &lt;span class="nx"&gt;environmentCheckerAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;scriptWriterAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;scriptExecutorAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;scriptUploaderAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;summaryAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the central agent that intelligently delegates tasks and manages the lifecycle of the entire dynamic generation process.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;src/a2aserver.ts&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This script launches &lt;code&gt;autonomous-google-workspace-agent&lt;/code&gt; as an A2A server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
&lt;span class="cm"&gt;/**
 * A2A server
 */&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toA2a&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@google/adk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;autonomousGoogleWorkspaceAgent&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;targetAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./agent.ts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;startServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// For A2A&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;toA2a&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;targetAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;basePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Server started on http://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Try: http://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/.well-known/agent-card.json`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;startServer&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Launching the Agent System
&lt;/h2&gt;

&lt;p&gt;This framework can function both as a standalone web server and as a subagent linked to the Gemini CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Launch the Web server:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npm run web

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; adk-full-samples@1.0.0 web
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; npx adk web src/agent.ts


+-----------------------------------------------------------------------------+
| ADK API Server started                                                      |
|                                                                             |
| For &lt;span class="nb"&gt;local &lt;/span&gt;testing, access at http://localhost:8000.                         |
+-----------------------------------------------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now interact with the web interface by navigating to &lt;code&gt;http://localhost:8000&lt;/code&gt; in your browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Launch the A2A server:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run a2a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upon execution, you will see a confirmation output in your terminal indicating the server has started successfully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ npm run a2a

&amp;gt; adk-full-samples@1.0.0 a2a
&amp;gt; npx tsx src/a2aserver.ts

Server started on http://localhost:8000
Try: http://localhost:8000/.well-known/agent-card.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To configure this A2A server as a subagent for the Gemini CLI, create or update &lt;code&gt;.gemini/agents/autonomous-google-workspace-agent.md&lt;/code&gt; with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
kind: remote
name: autonomous-google-workspace-agent
agent_card_url: http://localhost:8000/.well-known/agent-card.json
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can inspect the agent card specifications by opening the provided URL (&lt;code&gt;http://localhost:8000/.well-known/agent-card.json&lt;/code&gt;) in your browser.&lt;/p&gt;
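
&lt;p&gt;For a quick check from the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8000/.well-known/agent-card.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
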

&lt;h2&gt;
  
  
  Testing and Use Cases
&lt;/h2&gt;

&lt;p&gt;Once correctly configured, launch the Gemini CLI. You can now delegate complex natural language instructions to your &lt;code&gt;autonomous-google-workspace-agent&lt;/code&gt;. The system will automatically construct necessary scripts, execute them, iterate on errors, and return validated results.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/7Ki-nA1Z1c0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Dynamic Exchange Rate Retrieval via GOOGLEFINANCE
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a new Google Spreadsheet by putting a formula `=GOOGLEFINANCE("CURRENCY:USDJPY")` in cell "A1" of the first sheet. Then, get and show the value of cell "A1". (Note: `gas-fakes` has no `getActiveSheet()` method. In this case, use `getSheets()[0]`.)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system successfully generated the script, ran it safely within the sandbox, and fetched the financial data directly from Google Sheets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mxjivm538i4s5mme4a0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mxjivm538i4s5mme4a0.jpg" alt="fig2a" width="800" height="623"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Automated Spreadsheet Initialization with Dummy Data
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In this case, the script is not required to be executed in a sandbox. Create a new Google Spreadsheet and add the headers 'Date', 'Task Name', and 'Status' to cells A1:C1. Then, populate the next 3 rows with dummy task data. Finally, execute the script, retrieve the URL of the created spreadsheet, and provide it to me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Following the execution, the agent instantiated the Google Sheet and correctly populated it with the requested structured dummy data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpifn3rvnvnbn156ba20u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpifn3rvnvnbn156ba20u.jpg" alt="fig2b" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Dynamic Calendar Event Scheduling with Autonomous Error Correction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Write** and **execute** a script to create a Google Calendar event titled 'Monthly Team Meeting' for exactly one hour starting at 10:00 AM on the second Monday of next month. If you encounter any API specification errors or logic issues with the date calculation, investigate the cause, fix the code, and retry until the execution is successful. The calendar ID is '{your calendar ID}'.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator dynamically calculated the date logic, interfaced with the Calendar API, handled temporary validation failures gracefully via the &lt;code&gt;script_writer&lt;/code&gt;, and successfully scheduled the event.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. GAS Development Workflow: Document Highlighter and Google Drive Deployment
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a GAS custom function that highlights a specific keyword (e.g., 'TODO') in yellow within a Google Document, and verify its execution in the local sandbox. Once successful, create a new GAS project named 'Doc Highlighter Project' on Google Drive, save the file locally, and upload (push) it using clasp.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script successfully manipulated the Google Document and utilized the &lt;code&gt;script_uploader&lt;/code&gt; agent to push the project securely to Google Drive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx126nx4xa3nl9xm1yomt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx126nx4xa3nl9xm1yomt.jpg" alt="fig2c" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Automated Weekly Drive Report: Filtered File Aggregation and Gmail Synchronization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In this case, the script is not required to be executed in a sandbox. Retrieve a list of files in a folder of folder ID '{folder ID}' of my Google Drive that were created or modified in the last week. Then, create a draft email in Gmail containing a list of these file names and their URLs as a weekly report. Set the recipient to 'tanaike@hotmail.com' and the subject to 'Weekly Drive Files Report'.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result was an accurately compiled draft email directly saved to Gmail, complete with proper formatting and links.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Weekly Drive Files Report

The following files were created or modified in the last 7 days:

- Name: sample slide 1
  URL: https://docs.google.com/presentation/d/{file ID 1}/edit?usp=drivesdk

- Name: sample spreadsheet 1
  URL: https://docs.google.com/spreadsheets/d/{file ID 2}/edit?usp=drivesdk

- Name: sample document 1
  URL: https://docs.google.com/document/d/{file ID 3}/edit?usp=drivesdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deployment Considerations
&lt;/h2&gt;

&lt;p&gt;While this article demonstrates running the A2A server locally for testing, deploying this multi-agent architecture to fully managed serverless environments—such as Google Cloud Run—will vastly increase its operational capacity. Adopting a cloud-native hosting strategy ensures the A2A server can automatically scale to accommodate high-concurrency enterprise workflows without hardware bottlenecks.&lt;/p&gt;
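
&lt;p&gt;As a minimal sketch (the service name and region are placeholders, and the server would need to read its port from the &lt;code&gt;PORT&lt;/code&gt; environment variable rather than the hard-coded 8000), a source-based Cloud Run deployment could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy autonomous-gas-agent --source . --region us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
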

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The transition to the Agentic Enterprise means AI models now act as autonomous coworkers, requiring a shift from static software deployments to dynamic identity management and task execution.&lt;/li&gt;
&lt;li&gt;Tool Space Interference (TSI) degrades inference accuracy when LLMs are overloaded with predefined tools, a limitation effectively bypassed by enabling agents to dynamically write and execute their own scripts in real-time.&lt;/li&gt;
&lt;li&gt;Utilizing Google Apps Script alongside the gas-fakes emulation layer provides AI agents with a secure, high-speed, and sandboxed local runtime to safely develop and test enterprise workflows.&lt;/li&gt;
&lt;li&gt;A robust multi-agent architecture seamlessly orchestrated via the Gemini CLI can independently handle the entire tool lifecycle: code generation, secure sandbox testing, execution validation, and Google Drive deployment.&lt;/li&gt;
&lt;li&gt;Expanding this framework into serverless cloud environments not only addresses the massive computational scale of modern AI but also fortifies security by isolating dynamic tool execution from core organizational infrastructure.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>agents</category>
      <category>googleappsscript</category>
    </item>
    <item>
      <title>A Stitch at a time</title>
      <dc:creator>Gbemisola Esho</dc:creator>
      <pubDate>Fri, 24 Apr 2026 06:52:18 +0000</pubDate>
      <link>https://forem.com/gde/a-stitch-at-a-time-j20</link>
      <guid>https://forem.com/gde/a-stitch-at-a-time-j20</guid>
      <description>&lt;p&gt;Every UI idea starts the same way. A sentence. A sketch. A screenshot of something you saw and liked.&lt;br&gt;
Then the process starts and the idea starts dying.&lt;br&gt;
Wireframes go to designers. Designs go to developers. Somewhere in that handoff, the original intent gets trimmed, approximated, and eventually shipped as something slightly less than what you imagined.&lt;br&gt;
Google Stitch is built to kill that gap.&lt;br&gt;
Launched at Google I/O 2025 under Google Labs, Stitch takes a plain-English prompt, a rough sketch, or a URL and returns a high-fidelity UI design with exportable frontend code. The whole thing takes under two minutes. No Figma file. No design-to-developer handoff. No lost intent.&lt;br&gt;
This is what that workflow looks like, what it gets right, and where it still has limits.&lt;br&gt;
"Stitch does one thing exceptionally well: it gets you out of the blank canvas problem fast."&lt;/p&gt;

&lt;h2&gt;
  
  
  What Stitch actually generates
&lt;/h2&gt;

&lt;p&gt;Most AI design tools give you a starting point. A rough layout. Something to react to.&lt;br&gt;
Stitch gives you more than that, but less than you might expect. Understanding exactly what comes out of it is the difference between using it well and being disappointed by it.&lt;br&gt;
The walkthrough below shows exactly what a single Stitch prompt produces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before you open Stitch: write a better prompt
&lt;/h2&gt;

&lt;p&gt;Stitch is only as good as what you give it. A weak prompt returns a generic layout. A specific one returns something you can actually build on. Spend three minutes here before you spend ninety seconds waiting for a generation.&lt;br&gt;
&lt;strong&gt;A strong Stitch prompt answers four questions&lt;/strong&gt;:&lt;br&gt;
What kind of product is this? Name the category: dashboard, mobile app, landing page, onboarding flow, news article template. Stitch needs to know what structural pattern to reach for.&lt;/p&gt;

&lt;p&gt;Who is it for? An internal tool looks different from a consumer product. A newsroom CMS looks different from a public-facing news app. Say it explicitly.&lt;br&gt;
What are the key screens or sections? Do not make Stitch guess. List the components you need — sidebar, data table, search bar, article card, navigation menu.&lt;/p&gt;

&lt;p&gt;What is the visual tone? Minimal, bold, dark mode, data-heavy, editorial, clean. One or two words is enough.&lt;br&gt;
Here is the difference in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weak prompt&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;"Design a news app."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strong prompt&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;"Design a mobile news reader app for a digital-first African newsroom. Dark theme. Home screen with a hero story, category tabs, and a scrollable article feed. Article view with a full-width image, headline, author byline, share button, and related stories section. Clean, editorial feel — inspired by BBC News and The Guardian app."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Same tool. Same model. Completely different output&lt;/strong&gt;.&lt;br&gt;
Write your prompt before you open the browser tab. Treat it like a brief, not a search query.&lt;/p&gt;

&lt;p&gt;Now you are ready. Here is exactly what to do.&lt;/p&gt;

&lt;p&gt;Step 1: Go to stitch.withgoogle.com&lt;br&gt;
You need a Google account. No waitlist, no setup. Sign in and you land directly on the prompt canvas.&lt;/p&gt;

&lt;p&gt;Step 2: Choose your mode&lt;br&gt;
You will see two options before you type anything.&lt;br&gt;
Standard mode uses Gemini 2.5 Flash. Use this for exploration — testing ideas, comparing layouts, moving fast. You get 350 generations per month.&lt;br&gt;
Experimental mode uses Gemini 2.5 Pro. Use this when you have found your direction and want a higher-quality result worth exporting. You get 50 generations per month.&lt;br&gt;
Start with Standard. Switch to Experimental only when you have a prompt you are confident in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pbcbay8vhvr2e5c5w9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pbcbay8vhvr2e5c5w9t.png" alt=" " width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Paste your prompt and generate&lt;/strong&gt;&lt;br&gt;
Stitch will show you a generation-time estimate, usually around 90 seconds for Standard mode. It is accurate. Do not refresh.&lt;br&gt;
When it comes back, you will see two or three layout variants side by side.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y33vkpq0idmmbzc9ra7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y33vkpq0idmmbzc9ra7.png" alt=" " width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Compare variants (do not skip this)&lt;/strong&gt;&lt;br&gt;
Resist the urge to click the first one that looks good. Spend 60 seconds comparing structural decisions. Does one use a bottom nav bar while another uses a sidebar? Does one prioritise the hero image while another leads with text? These are architectural choices that matter later.&lt;br&gt;
Pick the variant whose structure fits your product logic, not just the one with the nicest colours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g6hosmwvmb6yu6misrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g6hosmwvmb6yu6misrh.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsoj89li91tz8xeq6h6i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsoj89li91tz8xeq6h6i.png" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Refine with follow-up prompts&lt;/strong&gt;&lt;br&gt;
Your first generation is a starting point. The chat panel on the right accepts follow-up instructions. Be specific.&lt;br&gt;
Instead of: "Make it look better"&lt;br&gt;
Write: "Switch the background to dark grey. Move the category tabs below the hero image. Make the headline font larger and bolder. Apply WCAG 2.1 contrast standards throughout."&lt;br&gt;
Each follow-up takes another 60–90 seconds. Give it two or three rounds before moving to export.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p7vf5xvbfgo2e5zccpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p7vf5xvbfgo2e5zccpp.png" alt=" " width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exporting your design: which path to take
&lt;/h2&gt;

&lt;p&gt;When your design is ready, you have three exits. Pick the one that matches your next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export to Figma: if you have a design team.&lt;/strong&gt;&lt;br&gt;
Click "Copy to Figma" and paste directly into an open Figma file. Your design arrives as editable components. From there your team can apply your real design system, adjust spacing, and hand off through the normal design workflow.&lt;br&gt;
Use this path when: you are handing the design to someone else, you have an existing component library, or the project needs collaborative review before development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export to Google AI Studio: if you want working code fast.&lt;/strong&gt;&lt;br&gt;
This is the most powerful path for solo builders. Send your design to AI Studio and Gemini converts it into a functional web application — with routing, interactivity, and logic — without you writing a line of code.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Exporting to Google AI Studio:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8vha4uokog65i28ljk2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8vha4uokog65i28ljk2.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;br&gt;
Use this path when: you are a founder, developer, or journalist building a prototype you want to test quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export as HTML and Tailwind CSS: if you are building it yourself.&lt;/strong&gt;&lt;br&gt;
Click "View Code" and download the frontend code directly. It is clean, readable, and structured logically. Drop it into your repository and build from it.&lt;br&gt;
The one limitation: you cannot change the output stack. Stitch always exports HTML and Tailwind CSS. If your project runs on React, SwiftUI, or a custom component library, treat this code as a visual blueprint rather than production-ready output.&lt;br&gt;
Use this path when: you are a web developer who wants a head start on a project and can translate or adapt the code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr3jd24ny9hxo83t9hpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr3jd24ny9hxo83t9hpy.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Stitch fits and where it does not
&lt;/h2&gt;

&lt;p&gt;Stitch is not a Figma replacement. It does not have Figma's precision, collaboration features, or component library depth. If your team runs a mature design system, Stitch is an upstream ideation tool, not a replacement for your existing workflow.&lt;br&gt;
What Stitch is genuinely good at is getting from zero to something real, fast. The blank-canvas problem (that paralysing moment at the start of any design project) largely disappears. You have a structured layout, a colour system, and exportable code in under two minutes. For journalists, founders, and solo developers who need to move fast without a design team, that is a meaningful shift.&lt;br&gt;
There are three honest limitations. Layout consistency drifts across screens: components that look visually identical are not always built identically under the hood. Complex flows with more than two or three screens require significant follow-up prompting. And the generation caps are real: 50 Experimental-mode generations per month sounds generous until you are deep in a prototype and burning through revisions.&lt;br&gt;
&lt;strong&gt;Use Stitch for&lt;/strong&gt;: early-stage ideation, rapid prototyping, client presentations, fellowship projects, hackathons, and any situation where speed matters more than perfection.&lt;br&gt;
&lt;strong&gt;Do not use Stitch for&lt;/strong&gt;: production-ready design systems, large multi-screen applications, or any project where design consistency across dozens of components is non-negotiable.&lt;/p&gt;

&lt;p&gt;The tools that change how you work are rarely the ones that replace everything you know. They are the ones that remove the friction at the point where most ideas die.&lt;br&gt;
For UI design, that point has always been the start: the blank canvas, the first decision, the gap between what you can imagine and what you can actually put on a screen.&lt;br&gt;
Stitch does not build your product for you. It just makes sure the idea survives long enough to become one.&lt;/p&gt;

</description>
      <category>ui</category>
      <category>design</category>
      <category>ai</category>
    </item>
    <item>
      <title>From Generative to Agentic: My Key Takeaways from Google Cloud Next ‘26</title>
      <dc:creator>Ibtissem Hattab</dc:creator>
      <pubDate>Thu, 23 Apr 2026 15:45:10 +0000</pubDate>
      <link>https://forem.com/gde/from-generative-to-agentic-my-key-takeaways-from-google-cloud-next-26-4574</link>
      <guid>https://forem.com/gde/from-generative-to-agentic-my-key-takeaways-from-google-cloud-next-26-4574</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro0e7s1i2od9glr7ws1w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro0e7s1i2od9glr7ws1w.jpg" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The era of “chatting with AI” has officially evolved into the era of “AI doing the work.” This year at Google Cloud Next ’26, the theme was unmistakable: &lt;strong&gt;The Agentic Enterprise&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As a DevOps Engineer, I didn’t just see new product announcements; I saw a fundamental shift in how we will design, deploy, and orchestrate cloud-native applications.&lt;/p&gt;

&lt;p&gt;For the MENAT tech community and beyond, these tools represent a massive leap in accessibility and power.&lt;/p&gt;

&lt;p&gt;Here is my technical breakdown of the most significant shifts announced at Next ‘26.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Infrastructure Powering the Agentic Era
&lt;/h2&gt;

&lt;p&gt;For those of us managing heavy LLM workloads and heterogeneous clusters, the &lt;strong&gt;AI Hypercomputer&lt;/strong&gt; updates are the cornerstone. Google is vertically optimizing the stack from the silicon up to the orchestrator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8th Generation TPUs (TPU 8t &amp;amp; 8i):&lt;/strong&gt; The introduction of specialized chips for training (8t) and cost-effective, near-zero latency inference (8i) is a game-changer for platform engineering.&lt;br&gt;
&lt;strong&gt;Virgo Networking &amp;amp; Managed Lustre:&lt;/strong&gt; Scaling to hundreds of thousands of accelerators requires massive throughput. With 10 TB/s throughput now possible, the bottlenecks in distributed training are being dismantled.&lt;br&gt;
&lt;strong&gt;GKE &amp;amp; Agent Sandboxes:&lt;/strong&gt; For DevOps teams, the ability to deploy 300 secure sandboxes per second per cluster with sub-second “cold starts” is the level of responsiveness required for autonomous agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Gemini Enterprise: The Orchestration Layer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjwfzsazp85g41pai4pn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjwfzsazp85g41pai4pn.png" alt=" " width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The transition from Vertex AI to the &lt;strong&gt;Gemini Enterprise Agent Platform&lt;/strong&gt; simplifies the “Build, Scale, Govern, and Optimize” lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Studio &amp;amp; ADK:&lt;/strong&gt; The new graph-based framework for agent-to-agent orchestration allows for deterministic logic, which is essential for compliance-heavy industries.&lt;br&gt;
&lt;strong&gt;Model Context Protocol (MCP):&lt;/strong&gt; This is perhaps the most exciting for developers. By exposing Google Cloud services as MCP servers, agents can now troubleshoot infrastructure using decades of Google’s own telemetry.&lt;br&gt;
&lt;strong&gt;Long-Running Agents:&lt;/strong&gt; We are moving away from temporary sessions toward agents with persistent &lt;strong&gt;Memory Banks&lt;/strong&gt; that can autonomously execute complex, multi-step business processes.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Solving Context Bloat with “Agent Skills”
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zmzltelf6x58u7dsind.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zmzltelf6x58u7dsind.png" alt=" " width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As models improve, we are increasingly using agentic AI to build with products like Firebase, BigQuery, and GKE. But how do we ensure the model has accurate, real-time info without causing “&lt;strong&gt;context bloat&lt;/strong&gt;”?&lt;/p&gt;

&lt;p&gt;Heavily using &lt;strong&gt;MCP&lt;/strong&gt; (Model Context Protocol) servers can sometimes rack up token costs and confuse the model by loading too much data. To solve this, Google announced &lt;strong&gt;Agent Skills&lt;/strong&gt;: a simple, open format for giving agents condensed expertise. Think of a skill as compact, agent-first documentation that loads &lt;strong&gt;only as needed&lt;/strong&gt;.&lt;/p&gt;
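
&lt;p&gt;The contrast with always-on MCP context is easy to sketch. Here is a toy illustration in Python; the skill index, its docs, and the loader are all invented for this example, and the real format lives in the repository linked just below:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy contrast between eager context (everything shipped to the model
# every turn) and skill-style lazy loading (condensed docs pulled in
# only when the task needs them). All entries here are invented.
SKILL_INDEX = {
    "bigquery": "Condensed guidance for cost-aware BigQuery SQL...",
    "cloud-run": "Condensed guidance for Cloud Run service config...",
    "gke": "Condensed guidance for debugging GKE workloads...",
}

def eager_context():
    # MCP-heavy setups can end up loading everything, every turn.
    return "\n\n".join(SKILL_INDEX.values())

def lazy_context(task: str):
    # Skill-style: load only the expertise the task actually names.
    return "\n\n".join(doc for name, doc in SKILL_INDEX.items() if name in task)

print(len(eager_context()), "chars eager")
print(len(lazy_context("optimise a bigquery job")), "chars lazy")
&lt;/code&gt;&lt;/pre&gt;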

&lt;p&gt;On Day 1 of Next ’26, Google launched the official &lt;strong&gt;Agent Skills repository&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/google/skills" rel="noopener noreferrer"&gt;👉 https://github.com/google/skills&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
Starting with thirteen key skills:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product Depth:&lt;/strong&gt; AlloyDB, BigQuery, Cloud Run, Cloud SQL, Firebase, Gemini API, and GKE.&lt;br&gt;
&lt;strong&gt;The “Well-Architected” Pillars:&lt;/strong&gt; Security, Reliability, and Cost Optimization.&lt;br&gt;
&lt;strong&gt;Operational Recipes:&lt;/strong&gt; Onboarding, Authentication, and Network Observability.&lt;/p&gt;

&lt;p&gt;You can install these to your agents of choice (like Antigravity or the Gemini CLI) using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx skills install github.com/google/skills&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The Agentic Data Cloud: Systems of Action
&lt;/h2&gt;

&lt;p&gt;We are moving from “Systems of Intelligence” (reactive archives) to “Systems of Action” (proactive agents).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-Cloud Lakehouse:&lt;/strong&gt; The standardization on Apache Iceberg and zero-copy access to AWS and Azure data means we can finally build a borderless foundation for AI without the friction of vendor lock-in.&lt;br&gt;
&lt;strong&gt;Knowledge Catalog:&lt;/strong&gt; This creates a dynamic context graph of an entire business, grounding agents in trusted semantics so they actually understand the data they are processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Agentic Defense: Security at AI Speed
&lt;/h2&gt;

&lt;p&gt;As we feed more proprietary data into these models, security cannot be an afterthought. The shift toward an “Agentic Enterprise” requires security that moves at the speed of the agents themselves. Google’s new &lt;strong&gt;Agentic Defense&lt;/strong&gt; framework integrates threat intelligence directly into the AI lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Threat Hunting &amp;amp; Detection Engineering Agents:&lt;/strong&gt; We are seeing the automation of manual security crafts. These agents can proactively hunt for novel attack patterns and generate persistent detection rules in moments rather than weeks, transforming the SOC (Security Operations Center).&lt;br&gt;
&lt;strong&gt;Dark Web Intelligence:&lt;/strong&gt; Utilizing the latest Gemini models, this system builds a nuanced profile of an organization to analyze millions of external events, identifying threats that specifically target an enterprise’s unique AI assets.&lt;br&gt;
&lt;strong&gt;Fraud Defense:&lt;/strong&gt; The evolution of reCAPTCHA into a comprehensive platform for distinguishing between bots, humans, and agents is a critical step in maintaining trust in digital commerce.&lt;/p&gt;

&lt;p&gt;For DevOps and Security teams, this means moving from a reactive “ticket-based” security model to a proactive, autonomous defense layer that lives within the same GKE clusters as our production workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Perspective: What This Means for Us
&lt;/h2&gt;

&lt;p&gt;Seeing nearly 75% of Google Cloud customers already using AI products is a testament to how fast this field is moving. We are no longer in the “experimental” phase; the Agentic Enterprise is officially in production at a global scale.&lt;/p&gt;

&lt;p&gt;For me, the most inspiring part is the &lt;strong&gt;democratizing power&lt;/strong&gt; of these tools. Whether it’s NASA using agents for flight readiness or a midsize business conversationally exploring data, the barrier to entry for high-tier technology is vanishing.&lt;/p&gt;

&lt;p&gt;As engineers, our role is shifting from building the plumbing to architecting the vision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The question is no longer:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“What can AI say?” but “What will your Agentic Enterprise build?”&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Connect
&lt;/h2&gt;

&lt;p&gt;What announcement from Next ’26 are you most excited to implement in your stack?&lt;/p&gt;

&lt;p&gt;Let’s discuss in the comments!&lt;/p&gt;

&lt;p&gt;#GoogleCloudNext #GenAI #DevOps #Kubernetes #AgenticEnterprise #GDE #CloudComputing&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>googlecloudnext</category>
      <category>agentskills</category>
      <category>agentplatform</category>
    </item>
    <item>
      <title>Google Cloud’s Agent Ops Stack: Why Deployment Is No Longer the Hard Part</title>
      <dc:creator>Sonika Janagill</dc:creator>
      <pubDate>Wed, 22 Apr 2026 22:58:00 +0000</pubDate>
      <link>https://forem.com/gde/google-clouds-agent-ops-stack-why-deployment-is-no-longer-the-hard-part-g3k</link>
      <guid>https://forem.com/gde/google-clouds-agent-ops-stack-why-deployment-is-no-longer-the-hard-part-g3k</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8dabdky21iay6131wpi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8dabdky21iay6131wpi.png" alt="Google AgentOps" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Gemini Enterprise Agent Platform slide that opened &lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Google Cloud Next '26&lt;/a&gt; has four layers: &lt;strong&gt;Build, Scale, Govern, Optimise.&lt;/strong&gt; Look at what is missing: Deploy.&lt;/p&gt;

&lt;p&gt;That omission is not an oversight. It is the point. Deploy has not disappeared. In the platform's lifecycle it is handled as an automated background step via &lt;a href="https://developers.googleblog.com/agents-cli-in-agent-platform-create-to-production-in-one-cli?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent CLI&lt;/a&gt; and &lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/build/runtime?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Runtime&lt;/a&gt;, part of Build and Scale. Google has made it a standardised process precisely so it stops being the primary engineering challenge. The hard questions are now upstream and downstream of it.&lt;/p&gt;

&lt;p&gt;A year ago, the conversation in every enterprise AI session was "how do we run an agent?" Today, Thomas Kurian opened the Next '26 keynote by declaring the agentic enterprise "real — and deployed at a scale the world has never before seen," and announcing a platform designed to answer an entirely different question: how do we govern a fleet of thousands of them?&lt;/p&gt;

&lt;p&gt;That shift, from deployment to governance, from experiment to operations, is what it actually means for agents to become first-class citizens on Google Cloud. It is a change in the platform's fundamental assumptions. We are leaving the era of the Request/Response cycle and entering the era of the Long-Lived Agentic Session. Infrastructure built for humans processing HTTP requests is being rebuilt for agents processing week-long workflows, with identity, memory, security, and observability treated as primitives rather than afterthoughts.&lt;/p&gt;

&lt;p&gt;Here is what that looks like in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  From Vertex AI to an agent operations platform
&lt;/h2&gt;

&lt;p&gt;The Gemini Enterprise Agent Platform is not a rebrand of Vertex AI. It is the evolution of it, and the distinction matters. Vertex AI gave engineers a trusted surface to build and tune models. The Agent Platform gives engineering teams a surface to manage agents as operational entities.&lt;/p&gt;

&lt;p&gt;For the past two years, the industry has been consumed by the Dev Stack for agents: which LLM to use, how to write the perfect prompt, which RAG framework to pick. Google's announcement effectively says: the Dev Stack is largely solved. Let's talk about the Ops Stack.&lt;/p&gt;

&lt;p&gt;The four pillars — Build, Scale, Govern, Optimise — are worth reading in the order Google chose, because that order tells you where the work is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh34dm1a25f3kj19af4xy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh34dm1a25f3kj19af4xy.jpg" alt="The four pillars of the Gemini Enterprise Agent Platform" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build&lt;/strong&gt; covers what most developers already expect: a graph-based Agent Development Kit (ADK) supporting Python, TypeScript, Java, and Go; a low-code &lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/agent-studio/overview?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Studio&lt;/a&gt;; &lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/build/agent-garden?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Agent Garden&lt;/a&gt; templates; and &lt;a href="https://docs.cloud.google.com/architecture/agentic-ai-bidirectional-multimodal-streaming?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;multimodal streaming&lt;/a&gt;. Google reports that over six trillion tokens are processed monthly through ADK alone. The model backbone for this platform is the Gemini 3 family: Gemini 3 Pro for complex workflow orchestration, Gemini 3 Flash for the high-frequency, lower-latency tasks that agent loops demand. The tooling here is mature. The interesting announcements are in the next three layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scale&lt;/strong&gt; is where the runtime gets serious. Agent Runtime now delivers sub-second cold starts. Long-running agents can maintain state for up to seven days. Agent Sandbox provides hardened execution environments for model-generated code and computer-use tasks. The key addition is &lt;strong&gt;Memory Bank with Memory Profiles&lt;/strong&gt;: agents can now retain long-term, high-accuracy context across sessions, mapped to internal CRM and database records via Custom Session IDs. Stateful agents are not an edge case anymore; they are the runtime's default assumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Govern&lt;/strong&gt; is the layer that signals the platform shift most clearly. Three new capabilities: &lt;strong&gt;Agent Identity&lt;/strong&gt;, &lt;strong&gt;Agent Registry&lt;/strong&gt;, and &lt;strong&gt;Agent Gateway&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of Agent Identity, Agent Registry, and Agent Gateway together as Active Directory for the AI era: the system that manages who your non-human workforce is, what it can access, and what it did. Agent Identity gives every agent a unique cryptographic ID with an auditable trail mapped to authorisation policies. If an agent takes an action, you know which agent, under which policy, at what time. This is not prompt engineering; it is IAM for non-human principals.&lt;/p&gt;

&lt;p&gt;Agent Registry is a central catalogue of every agent and approved tool across your organisation — the equivalent of a container registry, but for agents. Whether the agent was built internally on ADK or sourced from the partner marketplace (Atlassian, Box, Salesforce, ServiceNow, Workday all launched agents at Next), it has one identity and one index.&lt;/p&gt;

&lt;p&gt;Agent Gateway is described by Kurian as "air traffic control for your agent ecosystem." It routes all agent traffic, speaks both MCP and A2A natively, and applies &lt;strong&gt;Model Armor&lt;/strong&gt; inline: prompt injection scanning and tool poisoning detection happen at the network layer before any agent action executes. Critically, it also surfaces &lt;strong&gt;Agent Anomaly Detection&lt;/strong&gt;, monitoring for tool misuse, unauthorised data access, and reasoning drift in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofu9hatf8xksig5uvhcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofu9hatf8xksig5uvhcp.png" alt="The Govern layer: Agent Identity, Registry, and Gateway" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimise&lt;/strong&gt; closes the loop with Agent Simulation (generate thousands of synthetic interactions to surface edge cases before your users do), Agent Evaluation (multi-turn autoraters scoring live traffic), and OTel-compliant Agent Observability: automatic tracing, Agent Topology visualisation (a live map of how your agents interact with one another and with tools), and turn-key dashboards that surface the full reasoning chain behind every agent decision. If an agent chose the wrong tool or misread a user's intent, you can see exactly which step in the chain caused it — not just that something went wrong, but why. These are the SRE tools for agent fleets.&lt;/p&gt;

&lt;p&gt;Taken together, this is not a developer stack. It is an ops stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five platform changes that make agents genuinely first-class
&lt;/h2&gt;

&lt;p&gt;It is easy to claim that agents are "first-class." The evidence is in whether the platform treats them as principals with rights and identities, not just processes with permissions.&lt;/p&gt;

&lt;p&gt;On that test, five concrete things changed today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw7u5a6meoyrpxgiz92m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw7u5a6meoyrpxgiz92m.png" alt="Five platform changes that make agents first-class citizens" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, agents now have cryptographic identities.&lt;/strong&gt; Agent Identity means IAM, audit, and compliance can treat an agent as a principal rather than an extension of a human user. When an agent in your supply chain pipeline calls a Spanner instance or reads from BigQuery, that action is traceable to a specific agent with a specific policy scope. That is a meaningful governance primitive, not a feature flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, they route through a dedicated control plane.&lt;/strong&gt; Agent Gateway is effectively an API gateway for agent traffic. Architecturally, this mirrors what happened when enterprises standardised on API gateways a decade ago: a chokepoint that enforces policy, provides observability, and decouples caller from callee. The fact that it speaks MCP and A2A natively means the gateway understands agent semantics, not just HTTP verbs. A Google Cloud engineering post published this month makes the underlying technical case: in agentic protocols, policy attributes live inside message bodies rather than headers, so any governance layer that does not parse MCP and A2A payloads is operating blind. &lt;a href="https://cloud.google.com/blog/products/networking/the-case-for-envoy-networking-in-the-agentic-ai-era?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Envoy&lt;/a&gt;, the proxy underpinning Agent Gateway, is built precisely for this.&lt;/p&gt;
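
&lt;p&gt;To see why, look at the shape of an MCP tool call. Here is a sketch of a typical &lt;code&gt;tools/call&lt;/code&gt; request, with an invented tool name, plus a toy version of the body-aware check a gateway has to perform:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A representative MCP tools/call request, shown as a Python dict.
# Everything a policy engine cares about (which tool, with which
# arguments) sits inside the JSON-RPC body, not in HTTP headers, so a
# header-only gateway cannot enforce tool-level policy. The tool name
# and arguments below are invented for illustration.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "bigquery.run_query",
        "arguments": {"sql": "SELECT status FROM orders LIMIT 10"},
    },
}

def tool_call_allowed(request: dict, approved_tools: set):
    """Toy body-aware check of the kind an agent gateway must make."""
    if request.get("method") != "tools/call":
        return True  # not a tool call; nothing to gate in this toy
    return request["params"]["name"] in approved_tools

print(tool_call_allowed(mcp_request, {"bigquery.run_query"}))  # True
&lt;/code&gt;&lt;/pre&gt;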

&lt;p&gt;&lt;strong&gt;Third, they have persistent managed memory.&lt;/strong&gt; Memory Bank and Memory Profiles are now managed infrastructure, not application state you build yourself. The Gurunavi case study at Next described eliminating manual searches entirely by having agents recall past preferences across sessions. Payhawk's Financial Controller Agent reduced expense submission time by over 50% by remembering user-specific constraints. Stateful behaviour is no longer something you bolt on; it is something the platform provides.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fourth, they have dedicated runtime economics.&lt;/strong&gt; Sub-second cold starts and 300 sandboxes per second on GKE reflect a runtime optimised for agent workload patterns: bursty, parallel, potentially long-running, and needing isolation. The TPU 8i chip (Zebrafish), announced separately today, goes further: designed explicitly for the low-latency, chain-of-thought MoE inference that agent reasoning demands, with roughly 80% better performance-per-dollar than Ironwood on that workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fifth, they have a dedicated observability and evaluation stack.&lt;/strong&gt; OTel-compliant traces, simulation, and live autorater evaluation give engineers the same observability primitives for agents that SRE tooling gave them for services. You can now run a stress test against your agent fleet before deploying to production, score live traffic, and trace a failed reasoning chain end-to-end. That is the maturity signal.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means if you are building today
&lt;/h2&gt;

&lt;p&gt;The most immediate implication: the boundary between model development and agent operations has moved. A year ago, you deployed a model and called it via an API. Today, you deploy an agent with an identity, a memory profile, a registered set of approved tools, and a gateway policy. The deployment step is the beginning of the operational lifecycle, not the end of the development one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The deployment step is the beginning of the operational lifecycle, not the end of the development one.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The architectural mental model shift is significant: stop thinking about agents as wrappers around LLM APIs and start thinking about them as microservices — discrete, composable, independently deployable, and governed by the same infrastructure controls as the rest of your stack. ADK is the framework that makes that model practical.&lt;/p&gt;
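
&lt;p&gt;As a rough sketch of that microservice shape, here is what a single agent can look like in ADK's Python flavour. Treat every name here as illustrative (the model identifier, the tool, the instruction); it is a shape, not a reference implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A minimal ADK-style agent: one discrete, independently deployable
# unit with a declared model, instruction, and approved tool set.
# Sketch only; check the ADK docs for the current API surface.
from google.adk.agents import Agent

def get_order_status(order_id: str):
    """Illustrative tool: look up an order in a backend system."""
    return {"order_id": order_id, "status": "shipped"}  # stubbed

order_agent = Agent(
    name="order_agent",
    model="gemini-3-flash",  # illustrative model identifier
    description="Answers questions about customer orders.",
    instruction=(
        "Use get_order_status for any order lookup. "
        "Never invent order data."
    ),
    tools=[get_order_status],
)
&lt;/code&gt;&lt;/pre&gt;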

&lt;p&gt;For engineering leads, the Agent Registry changes the conversation about shadow AI. If every agent your organisation uses — internal or sourced from a partner marketplace — needs to be registered and assigned an identity, you have a forcing function for agent governance that does not depend on policy documentation or developer discipline. The infrastructure enforces it.&lt;/p&gt;

&lt;p&gt;For platform teams, Agent Gateway as an MCP-and-A2A-aware control plane means you can start enforcing tool-level access control at the network layer. Restricting which tools a customer-facing commerce agent can call is now an infrastructure configuration, not a prompt constraint.&lt;/p&gt;




&lt;h2&gt;
  
  
  The commerce signal
&lt;/h2&gt;

&lt;p&gt;One customer story from the keynote is worth isolating for what it signals about the direction.&lt;/p&gt;

&lt;p&gt;Macy's unveiled "Ask Macy's," a Gemini-powered shopping agent built in &lt;strong&gt;four weeks&lt;/strong&gt; using Gemini Enterprise for Customer Experience. Reliance demonstrated an agent planning a birthday party, processing millions of product images in minutes via Gemini catalogue enrichment. PayPal's Principal Engineer specifically called out Memory Bank and AP2 (Agent Payments Protocol) as the foundation enabling trusted, agentic commerce experiences on their platform.&lt;/p&gt;

&lt;p&gt;The pattern across all three is the same: agents handling not just product discovery but multi-step, stateful, transactional workflows. An agent that can remember what you bought last month, understand your current budget, recommend products, and initiate a UCP checkout — that requires identity, memory, a governed tool set, and a payment layer that can verify authorisation cryptographically.&lt;/p&gt;

&lt;p&gt;The Gemini Enterprise Agent Platform, announced today, provides the first three. AP2, which Google announced earlier this year and reaffirmed today via the PayPal integration, provides the fourth.&lt;/p&gt;

&lt;p&gt;Commerce is not just a use case for this platform. It is the stress test. If agents can handle a stateful, multi-party, financially consequential transaction with full auditability, they can handle most enterprise workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;If you are building agents on Google Cloud today, the practical advice is simple: register them in Agent Registry, assign them identities, route them through Agent Gateway, and instrument them with OTel traces. The platform now supports that workflow end-to-end. The question is not whether to govern your agents. At this point, the infrastructure assumes you will.&lt;/p&gt;
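
&lt;p&gt;Of those four steps, tracing is the one you can start on with standard tooling today. A minimal sketch against the OpenTelemetry Python API, with span and attribute names of my own invention rather than any Google-defined convention:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Wrap each agent turn in an OpenTelemetry span so the reasoning
# chain is traceable end to end. Attribute names are illustrative.
from opentelemetry import trace

tracer = trace.get_tracer("agent.fleet")

def run_turn(agent_id: str, user_input: str):
    with tracer.start_as_current_span("agent.turn") as span:
        span.set_attribute("agent.id", agent_id)
        span.set_attribute("agent.input.chars", len(user_input))
        reply = "..."  # call your agent runtime here
        span.set_attribute("agent.reply.chars", len(reply))
        return reply
&lt;/code&gt;&lt;/pre&gt;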

&lt;p&gt;Tomorrow's developer keynote may add further detail on tool-level governance and Cloud Run specifics for long-running agent workloads. I will update as confirmed.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26" rel="noopener noreferrer"&gt;Google Next 2026 keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Gemini Enterprise Agent Platform product blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Video overview: &lt;a href="https://www.youtube.com/live/j8qW5poBkEU?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;What is Gemini Enterprise Agent Platform?&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/networking/the-case-for-envoy-networking-in-the-agentic-ai-era?utm_campaign=deveco_gdemembers&amp;amp;utm_source=deveco" rel="noopener noreferrer"&gt;Envoy as agentic AI networking foundation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
      <category>vertexai</category>
    </item>
  </channel>
</rss>
