<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shir Meir Lador</title>
    <description>The latest articles on Forem by Shir Meir Lador (@shirmeirlador).</description>
    <link>https://forem.com/shirmeirlador</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3596246%2F7fba9a43-cbe3-4af2-adff-1871187ffbf8.jpeg</url>
      <title>Forem: Shir Meir Lador</title>
      <link>https://forem.com/shirmeirlador</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shirmeirlador"/>
    <language>en</language>
    <item>
      <title>Agent Factory Recap: Supercharging Agents on GKE with Agent Sandbox and Pod Snapshots</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Tue, 07 Apr 2026 13:04:00 +0000</pubDate>
      <link>https://forem.com/googleai/agent-factory-recap-supercharging-agents-on-gke-with-agent-sandbox-and-pod-snapshots-3a5e</link>
      <guid>https://forem.com/googleai/agent-factory-recap-supercharging-agents-on-gke-with-agent-sandbox-and-pod-snapshots-3a5e</guid>
      <description>&lt;p&gt;In the latest episode of the &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs" rel="noopener noreferrer"&gt;Agent Factory&lt;/a&gt;, Mofi Rahman and I had the pleasure of hosting, Brandon Royal, the PM working on agentic workloads on GKE. We dove deep into the critical questions around the nuances of choosing the right agent runtime, the power of GKE for agents, and the essential security measures needed for intelligent agents to run code.&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GKE for Agents?
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=109s" rel="noopener noreferrer"&gt;01:49&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We kicked off our discussion by tackling a fundamental question: why choose GKE as your agent runtime when serverless options like Cloud Run or fully managed solutions like Agent Engine exist?&lt;/p&gt;

&lt;p&gt;Brandon explained that the decision often boils down to control versus convenience. While serverless options are perfectly adequate for basic agents, the flexibility and governance capabilities of Kubernetes and GKE become indispensable in high-scale scenarios involving hundreds or thousands of agents. GKE truly shines when you need granular control over your agent deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl08gkxy41hseuy3fljpu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl08gkxy41hseuy3fljpu.png" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ADK on GKE
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=418s" rel="noopener noreferrer"&gt;06:58&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've discussed the &lt;a href="https://www.youtube.com/watch?v=aLYrV61rJG4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=17" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; in previous episodes, and Mofi highlighted how seamlessly it integrates with GKE, demoing an agent he built. ADK provides the framework for building the agent's logic, traces, and tools, while GKE provides the robust hosting environment. You can containerize your ADK agent, push it to Google Artifact Registry, and deploy it to GKE in minutes, transforming a local prototype into a globally accessible service.&lt;/p&gt;
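&lt;p&gt;As a rough sketch of that flow (the project, repository, image, and port names below are placeholders, not from the episode), the container-to-GKE path looks like this:&lt;/p&gt;

```shell
# Sketch only: project, region, repo, and image names are placeholders.
# 1. Containerize the ADK agent (assumes a Dockerfile in the project root).
docker build -t us-central1-docker.pkg.dev/my-project/agents/adk-agent:v1 .

# 2. Push the image to Google Artifact Registry.
docker push us-central1-docker.pkg.dev/my-project/agents/adk-agent:v1

# 3. Deploy to a GKE cluster and expose it as a service.
kubectl create deployment adk-agent \
  --image=us-central1-docker.pkg.dev/my-project/agents/adk-agent:v1
kubectl expose deployment adk-agent --port=80 --target-port=8080
```

&lt;p&gt;Treat these commands as orientation rather than a copy-paste recipe; the ADK-on-GKE tutorial covers the full setup.&lt;/p&gt;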

&lt;h2&gt;
  
  
  The Sandbox problem
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=920s" rel="noopener noreferrer"&gt;15:20&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As agents become more sophisticated and capable of writing and executing code, a critical security concern emerges: the risk of untrusted, LLM-generated code. Brandon emphasized that while code execution is vital for high-performance agents and deterministic behavior, it also introduces significant risks in multi-tenant systems. This led us to the concept of a "sandbox."&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Sandbox?
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1158s" rel="noopener noreferrer"&gt;19:18&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those less familiar with security engineering, Brandon clarified that a sandbox provides kernel and network isolation. Mofi further elaborated, explaining that agents often need to execute scripts (e.g., Python for data analysis). Without a sandbox, a hallucinating or prompt-injected model could potentially delete databases or steal secrets if allowed to run code directly on the main server. A sandbox creates a safe, isolated environment where such code can run without harming other systems.&lt;/p&gt;
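&lt;p&gt;To make that risk concrete, here is a tiny self-contained Python sketch (an illustration, not from the episode) of why executing model output directly in the main server process is dangerous:&lt;/p&gt;

```python
# Why unsandboxed execution is risky: exec'd code runs with the full
# privileges of the host process. This "agent-generated" snippet is
# harmless, but it could just as easily read secrets or delete files.
untrusted_snippet = (
    "import os\n"
    "leaked = os.environ.get('API_KEY', 'nothing set')"
)

scope = {}
exec(untrusted_snippet, scope)  # runs with the server's own permissions

# A sandbox denies the code this kind of access in the first place,
# instead of trusting the model not to misbehave.
print(scope["leaked"])
```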

&lt;h2&gt;
  
  
  Agent Sandbox on GKE Demo
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1225s" rel="noopener noreferrer"&gt;20:25&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, how do we build this "high fence" on Kubernetes? Brandon introduced the Agent Sandbox on Kubernetes, which leverages technologies like gVisor, an application kernel sandbox. When an agent needs to execute code, GKE dynamically provisions a completely isolated pod. This pod operates with its own kernel, network, and file system, effectively trapping any malicious code within the gVisor bubble. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexw6cndzjl0w1ybb8mz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexw6cndzjl0w1ybb8mz1.png" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mofi walked us through a compelling demo of the Agent Sandbox in action. We observed an ADK agent being given a task requiring code execution. When the agent started executing code, GKE dynamically provisioned a new pod, visibly labeled as "sandbox-executor," demonstrating the real-time isolation. Brandon highlighted that this pod is configured with strict network policies, further enhancing security.&lt;/p&gt;
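&lt;p&gt;Conceptually, the kind of pod the demo showed can be pictured with a plain Kubernetes manifest. The names, image, and exact fields below are illustrative assumptions, not the spec Agent Sandbox actually generates; the key ideas are the gVisor runtime class and a deny-all egress policy:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-executor
  labels:
    role: sandbox
spec:
  runtimeClassName: gvisor    # run under gVisor's application kernel
  containers:
  - name: executor
    image: python:3.12-slim   # illustrative image
    command: ["python", "/workspace/task.py"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-deny-egress
spec:
  podSelector:
    matchLabels:
      role: sandbox
  policyTypes:
  - Egress   # no egress rules listed, so all outbound traffic is blocked
```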

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feauxfwh9kazbqc32u7kz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feauxfwh9kazbqc32u7kz.png" width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future: Pod Snapshots
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1779s" rel="noopener noreferrer"&gt;29:39&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While the Agent Sandbox offers incredible security, the latency of spinning up a new pod for every task is a concern. Mofi demoed the game-changing solution: Pod Snapshots. This technology saves the state of a running sandbox and near-instantly restores it when an agent needs it. Brandon noted that this reduces startup times from minutes to seconds, revolutionizing real-time agentic workflows on GKE.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cfc4k9zczexdby59o0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cfc4k9zczexdby59o0z.png" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It's incredible to see how GKE isn't just hosting agents; it's actively protecting them and making them faster. &lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;Ready to put these concepts into practice? Dive into the full episode to see the demos in action and explore how GKE can supercharge your agentic workloads.&lt;/p&gt;

&lt;p&gt;Learn how to &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/agentic-adk-vertex?utm_campaign=CDR_0x036db2a4_default&amp;amp;utm_medium=external&amp;amp;utm_source=youtube" rel="noopener noreferrer"&gt;deploy an ADK agent to Google Kubernetes Engine&lt;/a&gt; and how to let your agent run code safely using the &lt;a href="http://docs.cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox" rel="noopener noreferrer"&gt;GKE Agent Sandbox&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Shir Meir Lador → &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mofi Rahman → &lt;a href="https://www.linkedin.com/in/moficodes" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brandon Royal → &lt;a href="https://www.linkedin.com/in/brandonroyal/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Agent Factory Recap: Reinforcement Learning and Fine-Tuning on TPUs</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Tue, 31 Mar 2026 18:56:42 +0000</pubDate>
      <link>https://forem.com/googleai/agent-factory-recap-reinforcement-learning-and-fine-tuning-on-tpus-1o6j</link>
      <guid>https://forem.com/googleai/agent-factory-recap-reinforcement-learning-and-fine-tuning-on-tpus-1o6j</guid>
      <description>&lt;p&gt;In our agent factory holiday special, Don McCasland and I were joined by Kyle Meggs, Senior Product Manager on the TPU Training Team at Google, to dive deep into the world of model fine tuning. We focused specifically on reinforcement learning (RL), and how Google's own infrastructure of TPUs are designed to power these massive workloads at scale.&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Consider Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=2&amp;amp;t=193s" rel="noopener noreferrer"&gt;3:13&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We started with a fundamental question: with foundation models like Gemini so powerful out of the box, and prompt-based customization often good enough, when should you consider fine-tuning?&lt;/p&gt;

&lt;p&gt;Fine-tuning your own model is relevant when you need high specialization for unique datasets where a generalist model might not excel (such as in the medical domain), or when you have strict privacy restrictions that require hosting your own models trained on your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Lifecycle: Pre-training and Post-training (SFT and RL)
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=232s" rel="noopener noreferrer"&gt;3:52&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Kyle used a great analogy inspired by Andrej Karpathy to break down the stages of training. He described pre-training as "knowledge acquisition," similar to reading a chemistry textbook to learn how things work. Post-training is further split into Supervised Fine-Tuning (SFT), which is analogous to reading already-solved practice problems within the textbook chapter, and Reinforcement Learning (RL), which is like solving new practice problems without help and then checking your answers in the back of the book to measure yourself against an optimal approach and correct answers. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc192k921af4wed7698x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc192k921af4wed7698x.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Reinforcement Learning (RL) is Essential
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=350s" rel="noopener noreferrer"&gt;5:50&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We explored why RL is currently so important for building modern LLMs. Kyle explained that unlike SFT, which is about imitation, RL is about grading actions to drive "alignment." It’s crucial for teaching a model safety (penalizing what not to do), enabling the model to use tools like search and interact with the physical world through trial and error, and for performing verifiable tasks like math or coding by rewarding the entire chain of thought that leads to a correct answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Industry Pulse: Why 2025 is the year of RL
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=513s" rel="noopener noreferrer"&gt;8:33&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In this segment, we looked at the rapidly evolving landscape of RL. Kyle noted that it is fair to call 2025 the "year of RL," highlighting the massive increase in investment and launches across the industry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;January:&lt;/strong&gt; DeepSeek-R1 launched, making a huge splash with open-source GRPO.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summer:&lt;/strong&gt; xAI launched Grok 4, reportedly running a 200k GPU cluster for RL at "pre-training scale."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;October:&lt;/strong&gt; A slew of new tooling launches across Google, Meta, and TML.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;November:&lt;/strong&gt; Gemini 3 launched as a premier thinking model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recent:&lt;/strong&gt; Google launched MaxText 2.0 for fine-tuning on TPUs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ud8v71oa92vgbu4iz5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ud8v71oa92vgbu4iz5.png" alt="alt text" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hurdles of Implementing RL
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=646s" rel="noopener noreferrer"&gt;10:46&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following the industry trends, we discussed why RL is so difficult to implement. Kyle explained that RL combines the complexities of both training and inference into a single process. He outlined three primary challenges: managing infrastructure at the right balance and scale to avoid bottlenecks; choosing the right code, models, algorithms (like GRPO vs. DPO), and data; and finally, the difficulty of integrating disparate components for training, inference, orchestration, and weight synchronization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjca0lpcpo23s95mzv876.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjca0lpcpo23s95mzv876.png" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To address these dimensions of complexity, Google offers MaxText, a vertically integrated solution for performing RL in a highly scalable and performant fashion. MaxText provides highly optimized models, the latest post-training algorithms, high-performance inference via vLLM, and powerful scalability and flexibility via Pathways.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rch212bej2n6eck8lq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rch212bej2n6eck8lq8.png" alt="alt text" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In contrast to DIY approaches where users assemble their own stack of disparate components from many different providers, Google’s approach offers a single integrated stack of co-designed components, from &lt;strong&gt;silicon&lt;/strong&gt; to &lt;strong&gt;software&lt;/strong&gt; to &lt;strong&gt;solutions&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctihvw4xt9q6ajs1dfdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctihvw4xt9q6ajs1dfdp.png" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Factory Floor
&lt;/h2&gt;

&lt;p&gt;The Factory Floor is our segment for getting hands-on. Here, we moved from high-level concepts to practical code with a live demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why TPUs Shine for RL
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=772s" rel="noopener noreferrer"&gt;12:52&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Before diving into the demo, Kyle explained why TPUs are uniquely suited for complex AI workloads like RL. Unlike other hardware, TPUs were designed system-first. A TPU Pod can connect up to 9,216 chips over low-latency interconnects, allowing for massive scale without relying on standard data center networks. This is a huge advantage for overcoming RL bottlenecks like weight synchronization. Furthermore, because they are purpose-built for AI, they offer superior price-performance and thermal efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitkt61wg3qhq2oobmryd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitkt61wg3qhq2oobmryd.png" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo: Reinforcement Learning (GRPO) with TPU
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=953s" rel="noopener noreferrer"&gt;15:53&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Don led a hands-on demonstration showing what RL looks like in action using Google's infrastructure. The demo showcased:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Using &lt;strong&gt;MaxText 2.0&lt;/strong&gt; as an integrated solution for the workload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leveraging models from MaxText and algorithms from Tunix.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handling inference using vLLM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Utilizing &lt;strong&gt;Pathways&lt;/strong&gt; for orchestration and scaling to run GRPO (Group Relative Policy Optimization).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
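&lt;p&gt;To make GRPO concrete, here is a minimal pure-Python sketch of its core idea, the group-relative advantage. This is an assumption-level illustration, not the MaxText or Tunix API:&lt;/p&gt;

```python
# GRPO samples a group of completions per prompt, then scores each one
# against its own group's reward statistics instead of using a learned
# value network as the baseline.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the group's mean and std deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, graded by a verifiable reward
# (e.g. 1.0 if the final answer is correct, else 0.0). Correct answers
# get positive advantages; incorrect ones get negative advantages.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```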

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4tqmo8zv62i6oufqj8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4tqmo8zv62i6oufqj8q.png" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This holiday special was a great deep dive into the cutting edge of model fine tuning. While foundational models are getting better every day, the future of highly specialized, capable agents relies on mastering post-training techniques like RL, and having the right vertically integrated infrastructure, like TPUs, to run them efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;We hope this episode gave you valuable tools and perspectives to think about fine-tuning your own specialized agents. Be sure to check out the resources below to explore MaxText 2.0 and start experimenting with TPUs for your workloads. We'll see you next year for a revamped season of The Agent Factory!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Post-Training Docs &lt;a href="https://maxtext.readthedocs.io/en/latest/tutorials/post_training_index.html" rel="noopener noreferrer"&gt;https://maxtext.readthedocs.io/en/latest/tutorials/post_training_index.html&lt;/a&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Google Cloud TPU (Ironwood) Documentation: &lt;a href="https://docs.cloud.google.com/tpu/docs/tpu7x" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/tpu/docs/tpu7x&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Google Cloud open source code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MaxText - &lt;a href="https://github.com/AI-Hypercomputer/maxtext" rel="noopener noreferrer"&gt;https://github.com/AI-Hypercomputer/maxtext&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GPU recipes - &lt;a href="https://github.com/AI-Hypercomputer/gpu-recipes" rel="noopener noreferrer"&gt;https://github.com/AI-Hypercomputer/gpu-recipes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;TPU recipes - &lt;a href="https://github.com/AI-Hypercomputer/tpu-recipes" rel="noopener noreferrer"&gt;https://github.com/AI-Hypercomputer/tpu-recipes&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Andrej Karpathy - Chemistry Analogy: &lt;a href="https://youtu.be/7xTGNNLPyMI?si=Bubrqz_dPpvuqc1M&amp;amp;t=8069" rel="noopener noreferrer"&gt;Deep Dive into LLMs like ChatGPT&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Paper: "Small Language Models are the Future of Agentic AI" (Nvidia): &lt;a href="https://arxiv.org/abs/2506.02153" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2506.02153&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Fine-tuning blog: &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/a-step-by-step-guide-to-fine-tuning-medgemma-for-breast-tumor-classification?e=48754805" rel="noopener noreferrer"&gt;https://cloud.google.com/blog/topics/developers-practitioners/a-step-by-step-guide-to-fine-tuning-medgemma-for-breast-tumor-classification?e=48754805&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Shir Meir Lador → &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Don McCasland → &lt;a href="https://www.linkedin.com/in/donald-mccasland/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kyle Meggs → &lt;a href="https://www.linkedin.com/in/kyle-meggs/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>gemini</category>
    </item>
    <item>
      <title>My First Experience Creating Antigravity Skills</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Fri, 20 Mar 2026 15:23:02 +0000</pubDate>
      <link>https://forem.com/googleai/my-first-experience-creating-antigravity-skills-524b</link>
      <guid>https://forem.com/googleai/my-first-experience-creating-antigravity-skills-524b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cvbil990snohnuztk9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cvbil990snohnuztk9w.png" width="700" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Experimenting with Agent skills for the first time, feeling empowered!&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;br&gt;
Last week, I was at an event where we taught developers how to build &lt;a href="https://goo.gle/aaiwcr-1" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt; and &lt;a href="http://goo.gle/aaiwcr-2" rel="noopener noreferrer"&gt;agents&lt;/a&gt;, and how to &lt;a href="http://goo.gle/aaiwcr-3" rel="noopener noreferrer"&gt;deploy open models&lt;/a&gt; to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b491641592&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud Run&lt;/a&gt;. After the session, one of the developers shared something that really stuck with me: he was already using our content to create specialized &lt;a href="https://antigravity.google/docs/skills" rel="noopener noreferrer"&gt;&lt;strong&gt;Skills&lt;/strong&gt;&lt;/a&gt; to share with his entire team.&lt;/p&gt;

&lt;p&gt;I got inspired and decided it was time to dive into &lt;a href="https://antigravity.google/docs/skills" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt;. During my last project, the dev-signal agent, I learned a lot about bringing agents and AI applications to production in a robust and scalable manner. I thought, &lt;em&gt;this is a great opportunity to give my favorite coding agent, Google’s &lt;a href="https://www.antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; (an “agent-first” IDE), those skills so that going forward, it will just do it for me!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk through how I built the 13 production skills in this &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal/.agent/skills" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and the patterns behind them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Agent Skills?
&lt;/h2&gt;

&lt;p&gt;As &lt;a href="https://www.linkedin.com/in/iromin/?originalSubdomain=in" rel="noopener noreferrer"&gt;Romin Irani&lt;/a&gt; explains in &lt;a href="https://medium.com/google-cloud/tutorial-getting-started-with-antigravity-skills-864041811e0d" rel="noopener noreferrer"&gt;“Getting Started with Google Antigravity Skills”&lt;/a&gt;, skills represent a shift from monolithic context loading to &lt;strong&gt;Progressive Disclosure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agents get “overwhelmed” when they are given too many tools all at once (a phenomenon known as “&lt;a href="https://www.linkedin.com/posts/smithakolan_your-ai-agent-is-not-bad-at-reasoning-activity-7422342915089178624-awR3?rcm=ACoAAAYeeDsBfJzKJQaDuSjRnUBmKV20OJV2olc" rel="noopener noreferrer"&gt;Tool Bloat&lt;/a&gt;”). To solve that, Skills allow the agent to “load” specialist knowledge only when needed. When you ask an agent to “evaluate a shadow revision,” it figures out that it needs to leverage the &lt;strong&gt;Shadow Deployer&lt;/strong&gt; skill as context for this operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workspace vs. Global Scope
&lt;/h2&gt;

&lt;p&gt;In Antigravity, you can manage these skills in two distinct ways depending on how you want to use them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Scope:&lt;/strong&gt; Located in &lt;em&gt;.agent/skills/&lt;/em&gt; within your project root. These are specific to your project and can be committed to GitHub so your entire team can benefit from the same production patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global Scope:&lt;/strong&gt; Located in &lt;em&gt;~/.gemini/antigravity/skills/&lt;/em&gt;. These are your personal utilities that stay with you across every project you work on.&lt;/li&gt;
&lt;/ul&gt;
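&lt;p&gt;Concretely, the two scopes map to two directories. The skill folder and file names below are illustrative (the &lt;em&gt;SKILL.md&lt;/em&gt; convention is an assumption here; check the Antigravity docs for the exact format):&lt;/p&gt;

```text
# Workspace scope: lives with the project, can be committed for the team.
.agent/skills/
  shadow-deployer/
    SKILL.md        # when to trigger the skill and how to apply it

# Global scope: personal utilities that follow you across projects.
~/.gemini/antigravity/skills/
  my-utility-skill/
    SKILL.md
```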

&lt;h2&gt;
  
  
  How I built the skills
&lt;/h2&gt;

&lt;p&gt;Following the principles in &lt;a href="https://www.linkedin.com/in/petruzalek/" rel="noopener noreferrer"&gt;Daniela Petruzalek&lt;/a&gt;’s &lt;a href="https://medium.com/google-cloud/building-agent-skills-with-skill-creator-855f18e785cf" rel="noopener noreferrer"&gt;“Building Agent Skills with skill-creator”,&lt;/a&gt; I took a “methodology-first” approach. I used the existing dev-signal blog series I’ve been working on and the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;codebase&lt;/a&gt; itself as core context, asking Antigravity to identify and codify the unique skills needed to &lt;strong&gt;build a production agent on Google Cloud.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For some of the more specialized areas, I provided additional context with patterns I’d like to follow, such as the agent evaluation &lt;a href="https://codelabs.developers.google.com/codelabs/production-ready-ai-roadshow/2-evaluating-multi-agent-systems/evaluating-multi-agent-systems#0" rel="noopener noreferrer"&gt;codelab&lt;/a&gt; and &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/from-vibe-checks-to-continuous-evaluation-engineering-reliable-ai-agents?utm_campaign=CDR_0x91b1edb5_default_b491641592&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt; and the agent security &lt;a href="https://codelabs.developers.google.com/codelabs/production-ready-ai-roadshow/3-securing-a-multi-agent-system/securing-a-multi-agent-system#0?utm_campaign=CDR_0x91b1edb5_default_b491641592&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;codelab&lt;/a&gt;, both written by my awesome team.&lt;/p&gt;

&lt;p&gt;These 13 skills provide Antigravity (or any developer using them) the crucial toolkit of a Google Cloud Production Engineer. I’m currently finalizing a detailed, step-by-step walkthrough of the dev-signal agent which will be published on the &lt;a href="https://cloud.google.com/blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Blog&lt;/strong&gt;&lt;/a&gt; very soon! (follow me for future updates)&lt;/p&gt;

&lt;p&gt;In the meantime, you don’t have to wait — the full &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal/.agent/skills" rel="noopener noreferrer"&gt;skills&lt;/a&gt; are available for you to explore and leverage in your own projects today.&lt;/p&gt;

&lt;p&gt;Here is the full inventory of the skills:&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ Production Agent
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;adk-memory-bank-initializer:&lt;/strong&gt; Long-term state logic with Vertex AI Memory Bank.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;agent-containerizer:&lt;/strong&gt; Mixed-runtime Dockerfiles (Python + Node.js).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cloud-run-agent-architect:&lt;/strong&gt; Least-privilege Terraform for Cloud Run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-production-secret-handler:&lt;/strong&gt; In-memory secret fetching pattern (Secret Manager).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mcp-connector-generator:&lt;/strong&gt; Standardized MCP connection logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📊 Evaluation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-eval-engine-runner:&lt;/strong&gt; Parallel inference and reasoning trace capture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-eval-metric-configurator:&lt;/strong&gt; Setup for Grounding and Tool Use rubrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-golden-dataset-builder:&lt;/strong&gt; Tools for building datasets with reference trajectories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-shadow-deployer:&lt;/strong&gt; “Dark Canary” deployment scripts with revision tagging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-tool-trajectory-evaluator:&lt;/strong&gt; Custom Python metrics for Precision and Recall.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🛡️ Security
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-model-armor-shield:&lt;/strong&gt; Intelligent firewall (Prompt Injection, RAI, Malicious URL filters).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-safety-gatekeeper:&lt;/strong&gt; Python integration pattern (safety_util.py) for sanitizing user inputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-sdp-template-factory:&lt;/strong&gt; Terraform for Sensitive Data Protection (PII/Secret redaction).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By codifying these patterns into production skills, I’ve made it possible for Antigravity to leverage them automatically in my day-to-day development. I hope you find them as helpful as I do!&lt;/p&gt;

&lt;h2&gt;
  
  
  Pro tip: self-improving skills!
&lt;/h2&gt;

&lt;p&gt;Because these skills were AI-generated, they might not work perfectly for your specific environment on the first try. But that’s actually the best part of working with an agentic IDE. If a skill doesn’t work well for you, don’t just manually fix the code; let the coding agent figure it out. Once it finds the solution, ask it to update the corresponding SKILL.md with the learned workflow. This captures the corrected workflow for the future, ensuring the agent doesn’t repeat the mistake while saving you tokens and time on the next run. Think of these as living documents that actively improve as you build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to get started?&lt;/strong&gt; Clone the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and add these skills to your Workspace or Global Scope to start building your own production-ready agents. Learn more about &lt;a href="https://antigravity.google/docs/skills" rel="noopener noreferrer"&gt;Agent skills&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt; for updates on my next blogs and videos.&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>ai</category>
      <category>googlecloud</category>
      <category>agents</category>
    </item>
    <item>
      <title>How I Turned an Ugly Spreadsheet into an AI Assisted App with Antigravity</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Wed, 18 Feb 2026 17:39:12 +0000</pubDate>
      <link>https://forem.com/googleai/how-i-turned-an-ugly-spreadsheet-into-an-ai-assisted-app-with-antigravity-3j52</link>
      <guid>https://forem.com/googleai/how-i-turned-an-ugly-spreadsheet-into-an-ai-assisted-app-with-antigravity-3j52</guid>
      <description>&lt;p&gt;&lt;strong&gt;I have a confession to make.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Up until now, I wasn’t that much into “vibe coding.” I used AI all the time for Python coding, but I never really built a whole app from scratch in a language I knew nothing about.&lt;/p&gt;

&lt;p&gt;That changed today. I encountered a really annoying problem: I had to review a massive number of talk submissions for a conference, all crammed into one giant spreadsheet. Staring at those tiny cells was literally making my eyes hurt.&lt;/p&gt;

&lt;p&gt;My initial thought was, “Hey, let’s create a really sharp UI for the submission review.” But then I thought, why stop there? Why not let AI provide me valuable inputs from social media to help me with the review itself?&lt;/p&gt;

&lt;p&gt;So, I decided to build &lt;strong&gt;TalkScout&lt;/strong&gt;. And since I wanted to test drive &lt;a href="https://antigravity.google/docs/home" rel="noopener noreferrer"&gt;Google Antigravity&lt;/a&gt; (Google’s new AI-powered coding agent), I figured this was the perfect opportunity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftvcnagk5wbmvmw2dxxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftvcnagk5wbmvmw2dxxt.png" alt="talkscout dashboard" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Talkscout Dashboard (synthetic data)&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;Here is how I went from a painful CSV to a fully deployed &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; app, without writing a single line of React code myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: The “Meta-Prompt” (Asking Gemini to Talk to Antigravity)
&lt;/h2&gt;

&lt;p&gt;I didn’t start by coding; I started by chatting. I used &lt;strong&gt;meta-prompting&lt;/strong&gt; to get started.&lt;/p&gt;

&lt;p&gt;So, what is meta-prompting, you may ask? It’s actually when you go to Gemini 3 and ask it to write the prompt for the coding agent.&lt;/p&gt;

&lt;p&gt;I explained my problem to &lt;strong&gt;Gemini 3&lt;/strong&gt; in simple words. Gemini 3 acted as my architect. It turned my “brain dump” requirements into a technical spec, defining the component structure and data model. I didn’t have to guess the right words, I just pasted that polished spec into Antigravity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Ditching the Spreadsheet for a Dashboard
&lt;/h2&gt;

&lt;p&gt;With that prompt, Antigravity built the app of my dreams. It allowed me to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload the CSV with all the conference talks.&lt;/li&gt;
&lt;li&gt;Get a dashboard showing the status of each talk.&lt;/li&gt;
&lt;li&gt;See a beautiful, high-contrast UI to review abstracts and demo plans without squinting at cells.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuv05d3jhgbptocmquud4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuv05d3jhgbptocmquud4.png" alt="TalkScout submission review page with high contrast UI" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;TalkScout submission review page with high contrast UI&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;&lt;strong&gt;The “Vibe” Fix:&lt;/strong&gt; It wasn’t all smooth sailing — I actually hit a nasty React hydration error. This can take hours to debug, especially if you’re not a frontend developer… But I simply provided the error message to Antigravity and the coding agent pinpointed the mismatch in the DOM and fixed it in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Integrating Grounded Intelligence
&lt;/h2&gt;

&lt;p&gt;I didn’t just want a UI; I wanted to overcome my own bias. How do I know if a niche topic is actually hot?&lt;/p&gt;

&lt;p&gt;I added a button to get an &lt;strong&gt;AI Assessment&lt;/strong&gt;. But I didn’t want hallucinations. I used &lt;strong&gt;Google Search Grounding&lt;/strong&gt; so the AI could search through Reddit, X (Twitter), and LinkedIn for real-world developer signals. That gave me input grounded in current developer mindshare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftck7aytx9ecgnlrifq08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftck7aytx9ecgnlrifq08.png" alt="TalkScout submission review page with AI social media analysis" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;TalkScout submission review page with AI social media analysis&lt;/small&gt;&lt;/center&gt;

&lt;h2&gt;
  
  
  Step 4: Calibrating the “Strict” Reviewer
&lt;/h2&gt;

&lt;p&gt;Initially, the AI was way too nice. It was giving high scores to anything with trendy keywords.&lt;/p&gt;

&lt;p&gt;I used what’s called &lt;strong&gt;few-shot prompting&lt;/strong&gt; to calibrate it. I gave examples of my scores vs. its scores and introduced what I call the &lt;strong&gt;“Marketing Fluff Penalty”&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a submission reads like a documentation/marketing page? Points docked.&lt;/li&gt;
&lt;li&gt;If the submission was way too short? We capped the score at a hard 2.&lt;/li&gt;
&lt;li&gt;If it includes war stories and actual learnings? Rating increased.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a few examples, it became more calibrated to my taste.&lt;/p&gt;
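&lt;p&gt;The calibration loop above boils down to packing my scoring rules and a few of my own scored examples into every request. Here is a minimal Python sketch of that few-shot prompt assembly; the rubric wording and example scores are illustrative stand-ins, not the actual TalkScout prompt.&lt;/p&gt;

```python
# A minimal sketch of the few-shot calibration idea. The rubric wording
# and the example scores below are illustrative stand-ins, not the
# actual TalkScout prompt.

SCORING_RULES = """You are a strict conference-talk reviewer. Score 1-5.
- Marketing Fluff Penalty: if it reads like a docs/marketing page, dock points.
- If the submission is way too short, cap the score at a hard 2.
- War stories and actual learnings increase the rating."""

# (my_score, abstract) pairs that show the model how I score talks.
FEW_SHOT_EXAMPLES = [
    (2, "Learn about our product's amazing new features and integrations!"),
    (5, "How a retry storm took down our payment service, and the three "
        "fixes that actually worked in production."),
]

def build_review_prompt(abstract):
    """Assemble rules, calibration examples, and the abstract to score."""
    shots = "\n".join(
        f"Abstract: {text}\nScore: {score}" for score, text in FEW_SHOT_EXAMPLES
    )
    return f"{SCORING_RULES}\n\n{shots}\n\nAbstract: {abstract}\nScore:"

prompt = build_review_prompt("A hands-on postmortem of our Kubernetes migration.")
```

&lt;p&gt;Each new submission gets appended after the examples, so the model completes the final “Score:” line using my calibration rather than its default generosity.&lt;/p&gt;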

&lt;h2&gt;
  
  
  Step 5: The Pivot to Batch Mode
&lt;/h2&gt;

&lt;p&gt;I realized it was taking me too long to ask the AI to evaluate each talk individually while I reviewed it.&lt;/p&gt;

&lt;p&gt;So, I asked Antigravity to refactor the backend for &lt;strong&gt;Batch Mode&lt;/strong&gt;. Now, TalkScout processes the entire submission pool in the background. By the time I grab a coffee, the “AI Draft” column is full of insights, allowing me to focus only on the final decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Sharing the Goodness (Deploy to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;)
&lt;/h2&gt;

&lt;p&gt;TalkScout was working great for me, but I thought, “It would be great to share this with the other reviewers.”&lt;/p&gt;

&lt;p&gt;This is where Antigravity really showed off. I simply asked it to deploy the app. It automatically recognized my Google Cloud Project ID, handled the containerization, generated the exact deployment commands, and deployed it to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One simple ask, and minutes later, I had a URL to share with the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  It Was Pretty Fun!
&lt;/h2&gt;

&lt;p&gt;It was pretty fun to actually solve a real problem I had using Antigravity and vibe coding. I built a tool that handles ingestion, provides a distraction-free rating interface, and provides valuable inputs for my reviews.&lt;/p&gt;

&lt;p&gt;I would love to hear from you all - have you recently solved a problem using vibe coding?&lt;/p&gt;

&lt;p&gt;If you haven’t already - try playing around with &lt;a href="https://antigravity.google/docs/home" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; and easily deploy your apps to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Decoding high-bandwidth memory: A practical guide to GPU memory for fine-tuning AI models</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 15 Jan 2026 15:27:00 +0000</pubDate>
      <link>https://forem.com/googleai/decoding-high-bandwidth-memory-a-practical-guide-to-gpu-memory-for-fine-tuning-ai-models-56af</link>
      <guid>https://forem.com/googleai/decoding-high-bandwidth-memory-a-practical-guide-to-gpu-memory-for-fine-tuning-ai-models-56af</guid>
      <description>&lt;p&gt;We've all been there. You've meticulously prepared your dataset and written your training script. You hit &lt;strong&gt;run&lt;/strong&gt;, and your excitement builds, only to be crushed by the infamous error: CUDA out of memory.&lt;/p&gt;

&lt;p&gt;This is one of the most common roadblocks in AI development. Your GPU's &lt;a href="https://en.wikipedia.org/wiki/High_Bandwidth_Memory" rel="noopener noreferrer"&gt;High Bandwidth Memory (HBM)&lt;/a&gt; is the high-speed memory that holds everything that's needed for computation, and running out of it is a hard stop. But how do you know how much you need?&lt;/p&gt;

&lt;p&gt;To build a clear foundation, we'll start by breaking down the HBM consumers on a single GPU and we'll present key strategies to reduce HBM consumption on a single GPU. Later, we'll explore advanced multi-GPU strategies like data and &lt;a href="https://huggingface.co/docs/transformers/v4.13.0/en/parallelism" rel="noopener noreferrer"&gt;model parallelism&lt;/a&gt; that can help relieve memory pressure and scale your training in the cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding HBM: What's using all the memory?
&lt;/h2&gt;

&lt;p&gt;When you fine-tune a model, your HBM is primarily consumed by three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.webopedia.com/technology/llm-tokens-weights-parameters/#:~:text=in%20various%20contexts.-,What%20are%20LLM%20Weights?,or%20generate%20coherent%2C%20meaningful%20responses." rel="noopener noreferrer"&gt;Model Weights&lt;/a&gt;:&lt;/strong&gt; This is the most straightforward. It's the storage space required for the model's parameters—the "brain" that it uses to make predictions. A 7-billion parameter model loaded in 16-bit precision will take up roughly 14 GB before you even process a single piece of data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://eureka.patsnap.com/article/what-is-the-optimizer-state-in-deep-learning-training" rel="noopener noreferrer"&gt;Optimizer States&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Gradient_descent" rel="noopener noreferrer"&gt;Gradients&lt;/a&gt;:&lt;/strong&gt; This is the overhead that's required for learning. To update the model's weights, the training process needs to calculate gradients (the direction of learning) and the &lt;a href="https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/#Adam_Deep_Learning_Optimizer" rel="noopener noreferrer"&gt;optimizer&lt;/a&gt; (like the popular &lt;a href="https://docs.pytorch.org/docs/stable/generated/torch.optim.AdamW.html" rel="noopener noreferrer"&gt;AdamW&lt;/a&gt;) needs to store its own data to guide the training. In full fine-tuning, this can be the largest consumer of HBM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Activation_function" rel="noopener noreferrer"&gt;Activations&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Online_machine_learning#Batch_learning" rel="noopener noreferrer"&gt;Batch Data&lt;/a&gt;:&lt;/strong&gt; This is the most dynamic part. When your data (images, text, etc.) flows through the model's layers, the intermediate calculations, or activations, are stored in HBM. The memory needed here is directly proportional to your batch size. A larger batch size means more activations are stored simultaneously, which leads to faster training but much higher memory usage.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; These calculations are theoretical minimums. Real-world frameworks add up to 30% overhead due to &lt;a href="https://arxiv.org/abs/1910.02054" rel="noopener noreferrer"&gt;temporary buffers, kernel launches, and memory fragmentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although it's impossible to get a perfect number without experimentation, you can estimate your HBM needs with this general formula:&lt;br&gt;
&lt;em&gt;&lt;center&gt;Total HBM ≈ (Model Size) + (Optimizer States) + (Gradients) + (Activations)&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
 &lt;br&gt;
&lt;strong&gt;Further reading:&lt;/strong&gt; See this excellent JAX e-book that covers &lt;a href="https://jax-ml.github.io/scaling-book/gpus/" rel="noopener noreferrer"&gt;these topics&lt;/a&gt; in great detail and even has some &lt;a href="https://jax-ml.github.io/scaling-book/gpus/#quiz-5-llm-rooflines" rel="noopener noreferrer"&gt;"try it out yourself" test questions&lt;/a&gt;.&lt;/p&gt;
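&lt;p&gt;The formula above is easy to turn into a back-of-the-envelope calculator. The sketch below assumes bfloat16 weights and an AdamW-style optimizer with two states per parameter; treat the result as a floor, since real frameworks add overhead.&lt;/p&gt;

```python
# Back-of-the-envelope calculator for the formula above (minus activations,
# which depend on batch size and sequence length). Assumes bfloat16 weights
# and an AdamW-style optimizer with two states per parameter.

def estimate_static_hbm_gb(params_billions, bytes_per_param=2,
                           optimizer_states_per_param=2):
    """Static (pre-activation) HBM needed for full fine-tuning, in GB."""
    params = params_billions * 1e9
    model_gb = params * bytes_per_param / 1e9
    gradients_gb = params * bytes_per_param / 1e9
    optimizer_gb = optimizer_states_per_param * params * bytes_per_param / 1e9
    return {
        "model_gb": model_gb,
        "gradients_gb": gradients_gb,
        "optimizer_gb": optimizer_gb,
        "total_gb": model_gb + gradients_gb + optimizer_gb,
    }

# A 7B model in bfloat16 with AdamW: 14 + 14 + 28 = 56 GB before activations.
print(estimate_static_hbm_gb(7))
```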
&lt;h2&gt;
  
  
  Example: Why full fine-tuning is so demanding
&lt;/h2&gt;

&lt;p&gt;To see why running out of memory is such a common problem, let's walk through a real-world example that I recently worked on: fine-tuning the &lt;a href="https://deepmind.google/models/gemma/medgemma/" rel="noopener noreferrer"&gt;medgemma-4b-it model&lt;/a&gt;, which has 4 billion parameters. Our &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/a-step-by-step-guide-to-fine-tuning-medgemma-for-breast-tumor-classification" rel="noopener noreferrer"&gt;script&lt;/a&gt; loads it in bfloat16 precision (2 bytes per parameter).&lt;/p&gt;

&lt;p&gt;First, let's calculate the static HBM footprint. This is the memory that's required just to load the model and prepare it for training, before you've even processed a single piece of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Model Size:&lt;/strong&gt; The memory that's needed to simply hold the model on the GPU.&lt;/p&gt;

&lt;center&gt;4 billion parameters × 2 bytes/parameter = 8 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Gradients and Optimizer States:&lt;/strong&gt; The overhead for training every parameter with the AdamW optimizer.&lt;/p&gt;

&lt;center&gt;Gradients: 4 billion parameters × 2 bytes/parameter = 8 GB&lt;/center&gt;

&lt;center&gt;Optimizer States (AdamW): 2 × 4 billion parameters × 2 bytes/parameter = 16 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; While AdamW is a popular optimizer, other optimizers, such as Adafactor and Lion, have different memory footprints.&lt;/p&gt;

&lt;p&gt;Adding these together gives us our baseline HBM cost for a full fine-tuning attempt:&lt;/p&gt;

&lt;center&gt;8 GB (Model) + 8 GB (Gradients) + 16 GB (Optimizer) = 32 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;This 32 GB is the baseline just to start the training process. On top of this, the GPU needs &lt;strong&gt;additional memory for activations&lt;/strong&gt;, which is a &lt;em&gt;dynamic&lt;/em&gt; cost that grows with your batch size and input data size. This is why full fine-tuning of large models is so demanding and often reserved for the most powerful hardware.&lt;/p&gt;
&lt;h2&gt;
  
  
  Key strategies to reduce HBM consumption
&lt;/h2&gt;

&lt;p&gt;The HBM requirement for a full fine-tune can seem impossibly high. But several powerful techniques can reduce memory consumption, making it feasible to train large models on consumer-grade or entry-level professional GPUs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Parameter-Efficient Fine-Tuning (PEFT) with LoRA
&lt;/h3&gt;

&lt;p&gt;Instead of training all the billions of parameters in a model, &lt;a href="https://huggingface.co/docs/peft/en/index" rel="noopener noreferrer"&gt;Parameter-Efficient Fine-Tuning (PEFT)&lt;/a&gt; methods focus on training only a small subset of parameters. The most popular of these is &lt;a href="https://arxiv.org/abs/2106.09685" rel="noopener noreferrer"&gt;LoRA (Low-Rank Adaptation)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/lora-qlora?utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;LoRA&lt;/a&gt; works by freezing &lt;strong&gt;the original model's weights and injecting a tiny number of new, trainable &lt;em&gt;adapter&lt;/em&gt; layers&lt;/strong&gt; into the model architecture. This means the memory-hungry gradients and optimizer states are only needed for these few million new parameters, not the full 4 billion.&lt;/p&gt;
&lt;h4&gt;
  
  
  The math behind LoRA's memory savings
&lt;/h4&gt;

&lt;p&gt;LoRA doesn't remove the base model from your GPU. The full 8 GB of the original model's weights are still loaded and taking up HBM. They're just frozen, which means that the GPU isn't training them. All of the memory savings come from the fact that you no longer need to store the huge gradients and optimizer states for that massive, frozen part of the model.&lt;/p&gt;

&lt;p&gt;Let's recalculate the static HBM footprint with LoRA, assuming it adds 20 million trainable parameters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Model Size (unchanged):&lt;/strong&gt; The base model is still loaded.&lt;/p&gt;

&lt;center&gt;4 billion parameters × 2 bytes/parameter = 8 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. LoRA Gradients &amp;amp; Optimizer States:&lt;/strong&gt; We now only need overhead for the tiny set of new parameters.&lt;/p&gt;

&lt;center&gt;Gradients: 20 million parameters × 2 bytes/parameter = 40 MB&lt;/center&gt;

&lt;center&gt;
Optimizer States: 2 × 20 million parameters × 2 bytes/parameter = 80 MB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;The new static HBM footprint is now:&lt;/p&gt;

&lt;center&gt;8 GB (Model) + 40 MB (Gradients) + 80 MB (Optimizer) ≈ 8.12 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;The training overhead has shrunk from 24 GB to just 120 MB. Your new baseline memory requirement is now just over 8 GB. This lower baseline memory requirement leaves much more room for the dynamic memory that's needed for activations, which lets you use a reasonable batch size on a common 16 GB or 24 GB GPU without running out of memory.&lt;/p&gt;
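&lt;p&gt;Here is the same LoRA arithmetic as a quick sanity check in Python. The 20 million trainable parameters is the illustrative figure used above; the real adapter size depends on the LoRA rank and which layers you target.&lt;/p&gt;

```python
# The LoRA arithmetic above as a quick sanity check. The 20M trainable
# parameters is the illustrative figure from the text; real adapter size
# depends on the LoRA rank and which layers you target.

BYTES_BF16 = 2
base_params = 4e9    # frozen base model (4B parameters), still loaded in HBM
lora_params = 20e6   # new trainable adapter parameters

model_gb = base_params * BYTES_BF16 / 1e9            # unchanged: base weights
grads_gb = lora_params * BYTES_BF16 / 1e9            # gradients, adapters only
optimizer_gb = 2 * lora_params * BYTES_BF16 / 1e9    # AdamW states, adapters only

total_gb = model_gb + grads_gb + optimizer_gb
print(f"model={model_gb:.2f} GB  gradients={grads_gb * 1000:.0f} MB  "
      f"optimizer={optimizer_gb * 1000:.0f} MB  total={total_gb:.2f} GB")
```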
&lt;h3&gt;
  
  
  Model quantization
&lt;/h3&gt;

&lt;p&gt;Besides training fewer parameters, we can also shrink the ones that we have by using &lt;a href="https://huggingface.co/docs/optimum/en/concept_guides/quantization" rel="noopener noreferrer"&gt;quantization&lt;/a&gt;, which involves reducing the &lt;a href="https://arxiv.org/html/2410.13857v1" rel="noopener noreferrer"&gt;numerical precision&lt;/a&gt; of the model's weights. The standard precision for modern training is &lt;a href="https://en.wikipedia.org/wiki/Bfloat16_floating-point_format" rel="noopener noreferrer"&gt;bfloat16&lt;/a&gt; because it offers the dynamic range of float32 with half the memory footprint. But we can reduce HBM usage further by converting weights to lower-precision integer formats like int8 or int4.&lt;/p&gt;

&lt;p&gt;Using lower-precision integer formats has a significant impact on HBM when compared to the standard bfloat16 baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;bfloat16 (standard):&lt;/strong&gt; The baseline size (e.g., a 7B model requires &lt;strong&gt;~14 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8-bit precision:&lt;/strong&gt; Halves the model size (e.g., 14 GB becomes &lt;strong&gt;~7 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4-bit precision:&lt;/strong&gt; Reduces the model size by a factor of 4 (e.g., 14 GB becomes &lt;strong&gt;~3.5 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reduction in size lets you fit much larger models into memory with minimal degradation in performance.&lt;/p&gt;
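&lt;p&gt;Those bullet points are just bits-per-parameter arithmetic, which you can verify in a couple of lines:&lt;/p&gt;

```python
# Weight storage at different precisions, matching the bullets above.

def model_size_gb(params_billions, bits_per_param):
    """Weights only; excludes gradients, optimizer states, and activations."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, bits in [("bfloat16", 16), ("int8", 8), ("int4", 4)]:
    print(f"7B model in {name}: {model_size_gb(7, bits):.1f} GB")
```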


&lt;div class="crayons-card c-embed"&gt;

  

&lt;p&gt;&lt;strong&gt;A word of warning from experience:&lt;/strong&gt;&lt;br&gt;
When I started experimenting in this area, my first attempt to load the model using the common float16 data type failed spectacularly. The model's outputs were NaN (Not a Number), and a quick check revealed that every internal value had collapsed the same way.&lt;/p&gt;

&lt;p&gt;The culprit was a classic &lt;a href="https://en.wikipedia.org/wiki/Integer_overflow" rel="noopener noreferrer"&gt;numerical overflow&lt;/a&gt;. The float16 data type has a tiny numerical range and it can't represent any number larger than 65,504. During training, intermediate values can easily exceed this limit, causing an overflow that creates a NaN. The fix was a simple one-line change to bfloat16, which has a massive numerical range that prevents these overflows and keeps training stable. For fine-tuning large models, always prefer bfloat16 for stability.&lt;/p&gt;


&lt;/div&gt;
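&lt;p&gt;You can poke at float16's limited range with nothing but the Python standard library, whose &lt;em&gt;struct&lt;/em&gt; module supports the IEEE binary16 format. This is a toy illustration of the range limit, not the training framework's actual behavior (numeric libraries typically overflow to inf, which then turns into NaN downstream, rather than raising an error):&lt;/p&gt;

```python
import struct

# The standard library's struct module supports IEEE binary16 ("e"), so we
# can demonstrate float16's range limit without any ML framework installed.

def to_float16(x):
    """Round-trip a Python float through IEEE binary16 (float16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

print(to_float16(65504.0))   # the largest finite float16 value survives

try:
    to_float16(70000.0)      # anything past 65,504 simply does not fit
except OverflowError:
    print("overflow: 70000.0 cannot be represented in float16")
```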


&lt;p&gt;&lt;strong&gt;&lt;a href="https://arxiv.org/abs/2305.14314" rel="noopener noreferrer"&gt;Combining LoRA and Quantization:&lt;/a&gt;&lt;/strong&gt; These techniques work best together. Quantized LoRA (QLoRA) is a method that stores the massive base model in a highly efficient 4-bit format (specifically NF4 or NormalFloat 4), while adding small, trainable LoRA adapters in bfloat16. During the training process, the 4-bit weights are dequantized to bfloat16 for computation. Dequantizing in process lets you fine-tune very large models on a single GPU with the memory savings of 4-bit storage and the mathematical stability of 16-bit training.&lt;/p&gt;

&lt;h3&gt;
  
  
  FlashAttention: An algorithmic speed boost
&lt;/h3&gt;

&lt;p&gt;Finally, &lt;a href="https://arxiv.org/abs/2205.14135" rel="noopener noreferrer"&gt;FlashAttention&lt;/a&gt; is a foundational algorithmic optimization that significantly reduces HBM usage and speeds up training on both single and multi-GPU setups. The attention mechanism in transformers is a primary memory bottleneck because it requires storing a large, intermediate &lt;a href="https://en.wikipedia.org/wiki/Attention_%28machine_learning%29" rel="noopener noreferrer"&gt;attention matrix&lt;/a&gt;. FlashAttention cleverly reorders the computation to avoid storing this full matrix in memory, leading to substantial memory savings and faster execution.&lt;/p&gt;

&lt;p&gt;Best of all, enabling FlashAttention is often as simple as a one-line change. In the MedGemma fine-tuning script, this was done by setting the value &lt;code&gt;attn_implementation="sdpa"&lt;/code&gt;, which can automatically use more efficient backends like FlashAttention if the hardware supports it.&lt;/p&gt;
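&lt;p&gt;To see why that attention matrix matters, consider that standard attention materializes a (seq_len × seq_len) score matrix per head. A rough estimate with illustrative shapes (16 heads, 16-bit values) shows the quadratic growth that FlashAttention sidesteps by never storing the full matrix:&lt;/p&gt;

```python
# Standard attention materializes a (seq_len x seq_len) score matrix per
# head; FlashAttention never stores this full matrix. Shapes are illustrative.

def attention_matrix_gb(seq_len, num_heads, batch_size=1, bytes_per_value=2):
    """HBM needed just for the full attention-score matrices, in GB."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value / 1e9

# Quadratic growth: doubling the sequence length quadruples the memory.
for seq_len in (2048, 4096, 8192):
    print(f"seq_len={seq_len}: {attention_matrix_gb(seq_len, num_heads=16):.2f} GB")
```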

&lt;h2&gt;
  
  
  Scaling beyond a single GPU: Advanced strategies
&lt;/h2&gt;

&lt;p&gt;Techniques like LoRA and quantization are useful for lowering HBM needs on a single GPU. But to train truly massive models or to really speed up the process, you'll eventually need to scale out to multiple GPUs. Here are some of the key strategies that can be used to distribute the load and overcome memory limitations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data parallelism
&lt;/h3&gt;

&lt;p&gt;Data parallelism is the most common and intuitive approach to scaling. In a Distributed Data Parallel (DDP) setup, the entire model is replicated on each GPU. The key is that the global batch of training data is split, with each GPU processing its own mini-batch concurrently. After each forward and backward pass, the gradients from each GPU are averaged together to ensure that all of the model replicas learn from the entire dataset and stay in sync. This method is excellent for &lt;strong&gt;speeding up training&lt;/strong&gt; but it &lt;strong&gt;doesn't reduce the HBM&lt;/strong&gt; that's required to hold the model itself, because every GPU needs a full copy.&lt;/p&gt;
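&lt;p&gt;The gradient-averaging step at the heart of DDP can be sketched with plain Python lists standing in for per-GPU gradient tensors (in practice this is an all-reduce over a fast interconnect):&lt;/p&gt;

```python
# DDP's synchronization step, sketched with plain lists standing in for
# per-GPU gradient tensors. In practice this is a collective all-reduce.

def allreduce_mean(per_gpu_grads):
    """Average gradients element-wise across replicas, as DDP does."""
    num_replicas = len(per_gpu_grads)
    return [sum(vals) / num_replicas for vals in zip(*per_gpu_grads)]

# Each "GPU" computed gradients on its own mini-batch of the global batch:
grads = [
    [0.2, -0.4, 1.0],   # gradients from GPU 0
    [0.4, -0.2, 0.0],   # gradients from GPU 1
]
print(allreduce_mean(grads))  # every replica applies the same averaged update
```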

&lt;h3&gt;
  
  
  Model parallelism
&lt;/h3&gt;

&lt;p&gt;When a model is too large to fit into the memory of a single GPU, you must use &lt;a href="https://en.wikipedia.org/wiki/Data_parallelism#Data_parallelism_vs._model_parallelism" rel="noopener noreferrer"&gt;model parallelism&lt;/a&gt;. Instead of replicating the model, this strategy &lt;strong&gt;splits the model&lt;/strong&gt; across multiple GPUs. There are two primary ways to do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/tensor_parallelism" rel="noopener noreferrer"&gt;Tensor parallelism&lt;/a&gt;:&lt;/strong&gt; This method splits a single large operation (like a massive weight matrix in a transformer layer) across several GPUs. Each GPU computes its part of the operation, and the results are combined.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.pytorch.org/docs/stable/distributed.pipelining.html" rel="noopener noreferrer"&gt;Pipeline parallelism&lt;/a&gt;:&lt;/strong&gt; This technique places different layers of the model onto different GPUs in a sequence. The data flows through the first set of layers on GPU 1, then the output is passed to GPU 2 for the next set of layers, and so on, like an assembly line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These strategies are more complex to implement than data parallelism, but they're essential for models that are simply too big for one device.&lt;/p&gt;
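&lt;p&gt;The tensor-parallel idea can be sketched in a few lines of plain Python (a toy illustration on lists, not a real multi-GPU implementation): split a weight matrix column-wise across two workers, let each compute its slice of the output, and concatenate the partial results.&lt;/p&gt;

```python
# Toy sketch of tensor parallelism: the weight matrix of y = x @ W is
# split column-wise across two "workers"; each computes its slice of
# the output, and the partial outputs are concatenated to recover the
# same result a single device would produce.

def matvec(x, W):
    """y[j] = sum_i x[i] * W[i][j], with W stored as a list of rows."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

x = [1.0, 2.0]
W = [[1.0, 2.0, 3.0, 4.0],   # full 2x4 weight matrix
     [5.0, 6.0, 7.0, 8.0]]

# Column split: worker 0 holds columns 0-1, worker 1 holds columns 2-3.
W0 = [row[:2] for row in W]
W1 = [row[2:] for row in W]

y_parallel = matvec(x, W0) + matvec(x, W1)  # concatenate partial outputs
assert y_parallel == matvec(x, W)           # matches the single-device result
print(y_parallel)
```

&lt;p&gt;Pipeline parallelism would instead hand the &lt;em&gt;output&lt;/em&gt; of one worker's layers to the next worker as input, like an assembly line.&lt;/p&gt;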

&lt;h3&gt;
  
  
  Fully Sharded Data Parallelism (FSDP)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html" rel="noopener noreferrer"&gt;FSDP&lt;/a&gt; is a powerful and efficient hybrid strategy that combines the ideas of &lt;strong&gt;data parallelism&lt;/strong&gt; and &lt;strong&gt;model parallelism&lt;/strong&gt;. Unlike standard data parallelism where each GPU holds a full copy of the model, optimizer states, and gradients, FSDP shards (or splits) all of these components across the GPUs. Each GPU only materializes the full parameters for the &lt;strong&gt;specific layer&lt;/strong&gt; that it's computing at that moment, &lt;strong&gt;dramatically reducing the peak HBM&lt;/strong&gt; usage per device. FSDP makes it possible to train enormous models on a cluster of smaller GPUs.&lt;/p&gt;
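&lt;p&gt;A simplified back-of-envelope model (a conceptual sketch, not PyTorch's actual FSDP accounting) shows why sharding helps: each rank permanently holds only its shard of every layer, and temporarily materializes one full layer at a time, so peak parameter residency is roughly the shard size plus the largest layer, rather than the whole model as in plain DDP.&lt;/p&gt;

```python
# Toy model of FSDP's peak per-rank parameter residency: a rank keeps
# 1/n_ranks of every layer's parameters resident, and all-gathers the
# full parameters of only the layer it is currently computing.
# (A conceptual simplification - real FSDP also shards gradients and
# optimizer states, and overlaps communication with compute.)

def fsdp_peak_params(layer_sizes, n_ranks):
    resident = sum(layer_sizes) / n_ranks  # this rank's shards of all layers
    largest = max(layer_sizes)             # one full layer gathered at a time
    return resident + largest

layers = [1000, 4000, 4000, 1000]  # parameter counts per layer
print(fsdp_peak_params(layers, 8))  # vs. 10000 params resident under plain DDP
```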

&lt;p&gt;By combining these hardware and software strategies, you can &lt;strong&gt;scale your fine-tuning jobs&lt;/strong&gt; from a single GPU to a &lt;strong&gt;powerful, distributed cluster&lt;/strong&gt; capable of handling even the most demanding AI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  HBM sizing guide
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HBM&lt;/th&gt;
&lt;th&gt;Use case and explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Sufficient for basic inference or fine-tuning with techniques like LoRA using a very small batch size (e.g., 1-2). Expect slower training times at this level.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;The recommended starting point for a good experience with 4B-7B parameter models. This capacity allows for a more effective batch size (e.g., 8-16) when using LoRA, providing a great balance of training speed and cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40+ GB&lt;/td&gt;
&lt;td&gt;Necessary for maximizing training speed with large batch sizes or for working with larger models (in the 20B+ parameter range) now or in the future.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
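&lt;p&gt;The numbers in the table follow from simple arithmetic. Here's an illustrative rule-of-thumb estimator (rough heuristics, not a precise profiler): fp16 weights cost 2 bytes per parameter, gradients another 2, and Adam's two fp32 moment tensors add 8, before counting activations, which grow with batch size.&lt;/p&gt;

```python
# Back-of-envelope HBM estimate for full fine-tuning with Adam.
# Rough rules of thumb, not a profiler: fp16 weights = 2 bytes/param,
# fp16 gradients = 2 bytes/param, Adam's two fp32 moments = 8 bytes/param.
# Activations (which scale with batch size) come on top of this.

def training_hbm_gb(n_params, weight_bytes=2, grad_bytes=2, optim_bytes=8):
    total_bytes = n_params * (weight_bytes + grad_bytes + optim_bytes)
    return total_bytes / 1024**3

# A 7B-parameter model: fp16 weights alone are ~13 GB, and full
# fine-tuning state lands far beyond a 24 GB card - which is why LoRA
# (train a small adapter, freeze the base weights) fits where full
# fine-tuning does not.
print(round(7e9 * 2 / 1024**3, 1))     # fp16 weights only, in GB
print(round(training_hbm_gb(7e9), 1))  # weights + grads + Adam states, in GB
```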

&lt;p&gt;Encountering the CUDA out of memory error provides an important lesson in the trade-offs between model size, training techniques, and batch size. By understanding what consumes your HBM, you can make smarter decisions and keep your projects running smoothly.&lt;/p&gt;

&lt;p&gt;I hope that this guide has helped clarify the CUDA out of memory error and given you the tools to diagnose it with confidence. When you're ready to take the next step, Google Cloud has the tools to accelerate your AI development.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore &lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu?utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GPU configurations for your Cloud Run services&lt;/a&gt; and best practices for running &lt;a href="https://cloud.google.com/run/docs/configuring/jobs/gpu-best-practices?hl=en&amp;amp;utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run jobs with GPU&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For maximum control: Spin up a &lt;a href="https://cloud.google.com/products/compute" rel="noopener noreferrer"&gt;Compute Engine&lt;/a&gt; instance with the latest NVIDIA H100 or A100 Tensor Core GPUs and take full control of your environment.&lt;/li&gt;
&lt;li&gt;Looking to optimize your model hosting infrastructure? Take a look at &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/vllm-performance-tuning-the-ultimate-guide-to-xpu-inference-configuration?utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;The Ultimate Guide to xPU Inference Configuration&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For a deeper dive into scaling your model, check out &lt;a href="https://jax-ml.github.io/scaling-book" rel="noopener noreferrer"&gt;How to Scale Your Model&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;New to Google Cloud? Get started with the $300 free credit to find the perfect solution for your next project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to Jason Monden and Sayce Falk from the AI compute team for their helpful review and feedback on this post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpu</category>
      <category>performance</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Agent Factory Recap: Can you do my shopping?</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Fri, 19 Dec 2025 19:44:58 +0000</pubDate>
      <link>https://forem.com/googleai/agent-factory-recap-can-you-do-my-shopping-5f8k</link>
      <guid>https://forem.com/googleai/agent-factory-recap-can-you-do-my-shopping-5f8k</guid>
<description>&lt;p&gt;In episode #8 of &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs" rel="noopener noreferrer"&gt;The Agent Factory&lt;/a&gt;, Ivan Nardini and I are joined by Prateek Dudeja, product manager from the Agent Payment Protocol Team, to dive into one of the biggest hurdles for &lt;a href="https://cloud.google.com/discover/what-are-ai-agents?e=48754805&amp;amp;hl=en&amp;amp;utm_campaign=CDR_0x6e136736_awareness_b446653415&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;AI agents&lt;/a&gt; in ecommerce: trust, especially when it comes to money.&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Agent Payment Protocol
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=103s" rel="noopener noreferrer"&gt;01:43&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if an agent could buy concert tickets for you at the exact moment they go on sale? You don't want to miss out! Maybe you want two tickets, and you don't want to spend more than $200. You definitely want to sit in a section with a great view of the stage. To have an agent act as your ticket buyer, you would have to trust that agent with all facets of your request and your credit card. How can you be sure that the agent won't buy 200 tickets, or that it won't charge you for a lifetime supply of rubber duckies?&lt;/p&gt;

&lt;p&gt;The potential for a messy outcome with this concert ticket request provides insight into a "&lt;strong&gt;Crisis of Trust&lt;/strong&gt;" that can hold back agentic commerce. The good news is there's a way to move forward and build trust. &lt;/p&gt;

&lt;p&gt;To solve the "Crisis of Trust," Google introduced the &lt;a href="https://github.com/google-agentic-commerce/AP2" rel="noopener noreferrer"&gt;Agent Payment Protocol (AP2)&lt;/a&gt;, a new open standard. It's not a new payment system; it’s a "&lt;strong&gt;trust layer&lt;/strong&gt;" that sits on top of existing infrastructure. AP2 is designed to create a common, secure language for agents to conduct commerce, using role-based architecture and verifiable credentials.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i6f0zm7dqgtryjkgpf3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i6f0zm7dqgtryjkgpf3.png" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Payments and the Current Payment System
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=149s" rel="noopener noreferrer"&gt;02:29&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The current payment system was built for humans using trusted interfaces like browsers, not for autonomous agents, resulting in three main challenges for agents: &lt;strong&gt;authorization, agent error, and accountability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvok8osf319hbwwru4p8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvok8osf319hbwwru4p8r.png" width="800" height="566"&gt;&lt;/a&gt;&lt;br&gt;
The &lt;strong&gt;Agent Payment Protocol&lt;/strong&gt; addresses these challenges by helping agents communicate securely with merchants and payment partners. The Agent Payment Protocol is available today as an extension for the &lt;a href="https://a2a-protocol.org/" rel="noopener noreferrer"&gt;A2A (Agent2Agent) protocol&lt;/a&gt; and relies on agents using the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive into the Agent Payment Protocol
&lt;/h2&gt;

&lt;p&gt;Learn more about how this protocol works, including concepts and flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Role-Based Ecosystem
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=273s" rel="noopener noreferrer"&gt;04:33&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The protocol is built on a "separation of concerns." Your agent doesn't have to do everything. There are specialized roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shopping Agent&lt;/strong&gt;: The AI agent you build, great at finding products.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merchant Endpoint&lt;/strong&gt;: The seller's API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential Provider&lt;/strong&gt;: A secure digital wallet (like PayPal, Google Pay, etc.) that manages payment details.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merchant Payment Processor&lt;/strong&gt;: The entity that constructs the final authorization message for the payment networks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0snpazgnllpzauxu1di0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0snpazgnllpzauxu1di0.png" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Critical&lt;/strong&gt;: Your shopping agent never touches the raw credit card number. It doesn't need to be PCI compliant because it delegates the payment to the specialized, secure providers.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Verifiable Credentials (VCs)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=375s" rel="noopener noreferrer"&gt;06:15&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The "handshakes" between these roles in the Agent Payment Protocol ecosystem are secured by Verifiable Credentials (VCs). Think of credentials as protocolized, cryptographically signed digital receipts that prove what was agreed upon.&lt;/p&gt;

&lt;p&gt;There are three types of verifiable credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cart Mandate&lt;/strong&gt;: For "human-present" scenarios. The user reviews a final cart and cryptographically signs it as proof of approval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent Mandate&lt;/strong&gt;: For "human-not-present" scenarios (like the concert ticket example). The user signs an intent (e.g., "buy tickets under $200"), giving the agent authority to act within those guardrails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment Mandate&lt;/strong&gt;: Provides clear visibility to payment networks and banks that an AI agent was involved in the transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21rb8f2ntkuaeo3nhe0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21rb8f2ntkuaeo3nhe0o.png" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;
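&lt;p&gt;The "signed digital receipt" idea can be illustrated with a short Python sketch. To be clear, this is &lt;em&gt;not&lt;/em&gt; the actual AP2 credential format (AP2 builds on verifiable-credential standards, and the field names below are hypothetical); it only shows the core property: the user signs the exact cart contents, so any later tampering is detectable.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Illustrative sketch of the Cart Mandate idea (NOT the actual AP2
# wire format; field names are hypothetical): the user signs the
# exact cart contents, so any change after signing breaks the signature.

def sign_mandate(cart: dict, user_key: bytes) -> str:
    # sort_keys makes the serialized payload deterministic before signing
    payload = json.dumps(cart, sort_keys=True).encode()
    return hmac.new(user_key, payload, hashlib.sha256).hexdigest()

def verify_mandate(cart: dict, signature: str, user_key: bytes) -> bool:
    return hmac.compare_digest(sign_mandate(cart, user_key), signature)

key = b"user-device-secret"  # stands in for the user's signing key
cart = {"item": "concert ticket", "qty": 2, "total_usd": 180}

sig = sign_mandate(cart, key)
assert verify_mandate(cart, sig, key)              # untampered cart verifies
tampered = {**cart, "qty": 200}
assert not verify_mandate(tampered, sig, key)      # any change breaks the signature
```

&lt;p&gt;Real verifiable credentials use public-key signatures rather than a shared HMAC secret, so anyone can verify the mandate without being able to forge it.&lt;/p&gt;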

&lt;h2&gt;
  
  
  A Contractual Conversational Model
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=483s" rel="noopener noreferrer"&gt;08:03&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Agent Payment Protocol process creates a "Contractual Conversational Model," moving beyond simple API calls to a flow built on verifiable proof.&lt;/p&gt;

&lt;p&gt;To understand this flow, we'll walk through a &lt;strong&gt;human-present scenario&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Delegation&lt;/strong&gt;: You tell your agent, "Buy two concert tickets."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery &amp;amp; Negotiation&lt;/strong&gt;: The agent contacts the merchant's endpoint to prepare the cart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finalize Cart&lt;/strong&gt;: The agent reaches out to your Credential Provider (e.g., your digital wallet). You select the payment method. The agent only gets a reference (like the last 4 digits), never the full credential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization with Mandates&lt;/strong&gt;: The agent shows you the finalized cart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You cryptographically sign the Cart Mandate&lt;/strong&gt;. This is the non-repudiable proof, the "contract."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purchase&lt;/strong&gt;: The agent sends this signed mandate to the merchant. The merchant can now trust the purchase mandate is from you. The merchant's payment processor uses the mandate to securely get the payment token from the credential provider and complete the transaction.&lt;/li&gt;
&lt;/ol&gt;
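&lt;p&gt;In the human-not-present variant, the agent would instead check each proposed cart against the guardrails the user signed in the Intent Mandate before purchasing. A minimal sketch of that check (hypothetical field names, not the AP2 schema):&lt;/p&gt;

```python
# Illustrative guardrail check for a "human-not-present" Intent Mandate
# (hypothetical field names, not the AP2 schema): before purchasing,
# the agent verifies the proposed cart against the limits the user
# signed up front - e.g., "buy up to 2 tickets, under $200 total".

def within_mandate(cart: dict, mandate: dict) -> bool:
    return (cart["item"] == mandate["item"]
            and cart["qty"] <= mandate["max_qty"]
            and cart["total_usd"] <= mandate["max_total_usd"])

mandate = {"item": "concert ticket", "max_qty": 2, "max_total_usd": 200}

assert within_mandate({"item": "concert ticket", "qty": 2, "total_usd": 180}, mandate)
assert not within_mandate({"item": "concert ticket", "qty": 200, "total_usd": 180}, mandate)
```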

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqd4ve1quoy9x6b3zraod.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqd4ve1quoy9x6b3zraod.png" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This flow all hinges on trust. In the short term, this trust is built using &lt;strong&gt;manual allow lists&lt;/strong&gt; of approved agents and merchants. In the long term, the plan is to use open web standards like HTTPS and DNS ownership to verify identities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Q&amp;amp;A with Prateek Dudeja
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=787s" rel="noopener noreferrer"&gt;13:07&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With the concepts explained, the discussion moved to a Q&amp;amp;A with Prateek.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a New Protocol for Payments?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=810s" rel="noopener noreferrer"&gt;13:30&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Prateek gave a great analogy: HTTPS is a baseline protocol for browsing. Signing in requires stronger authentication. Making a &lt;strong&gt;payment&lt;/strong&gt; requires an even higher level of trust. AP2 provides that "payments-grade security" on top of baseline protocols like A2A and MCP, ensuring the transaction is high-trust and truly from a human.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Will Agents Find Trusted Partners?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=882s" rel="noopener noreferrer"&gt;14:42&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the short term, agents will use "decentralized registries of trust" (or allow lists) to find merchants they can interact with. Prateek noted that all the roles (merchant, credential provider, etc.) already exist in the payments industry today. The only new role is the &lt;strong&gt;Shopping Agent&lt;/strong&gt; itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accountability: What Happens When Things Go Wrong?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=963s" rel="noopener noreferrer"&gt;16:03&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the big question. What if your agent shows you &lt;em&gt;blue&lt;/em&gt; shoes, you wanted &lt;em&gt;teal&lt;/em&gt;, but you click "approve" anyway?&lt;/p&gt;

&lt;p&gt;Prateek explained that the signed &lt;strong&gt;Cart Mandate&lt;/strong&gt; solves this. Because you biometrically signed a tamper-proof credential showing the &lt;em&gt;blue&lt;/em&gt; shoes, the responsibility is on you. The merchant has cryptographic evidence that you saw and approved the exact product. This protects merchants from fraudulent chargebacks and users from unauthorized agent actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo: Reference Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1084s" rel="noopener noreferrer"&gt;18:04&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Prateek walked through a demo showing the human-present flow. It showed the user prompting the agent, the agent discovering products, and then the &lt;strong&gt;Credential Provider (PayPal)&lt;/strong&gt; getting involved. The user selected their shipping and payment info &lt;em&gt;from PayPal&lt;/em&gt;, and the agent only saw a reference. The user then signed the Cart Mandate, and the purchase was completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compatibility and Getting Started
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1183s" rel="noopener noreferrer"&gt;19:43&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A key question was: is this compatible with frameworks like LangGraph or CrewAI? &lt;strong&gt;Yes&lt;/strong&gt;. Prateek confirmed the protocol is compatible with any framework. As long as your agent can communicate over A2A or MCP, you can use AP2.&lt;/p&gt;

&lt;p&gt;To get started, Prateek directed developers to the &lt;a href="https://github.com/google-agentic-commerce/AP2" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;. The first step is to see which role you want to play (merchant, credentials provider, etc.) and explore the sample code for that role.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future: Dynamic Negotiation
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Timestamp: [&lt;a href="https://www.youtube.com/watch?v=T1MtWnEYXM0&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1273s" rel="noopener noreferrer"&gt;21:13&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Looking ahead, Prateek shared an exciting vision for "dynamic negotiation." Imagine telling your agent: "I want that red dress that's out of stock. I need it by tomorrow... and I'm willing to pay 30% more".&lt;/p&gt;

&lt;p&gt;A merchant's agent could see this "intent" and, if the dress becomes available, automatically complete the sale. What was a lost sale for the merchant becomes a completed order at a markup, and the user gets the exact item they desperately wanted. &lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;This conversation made it clear that building a secure payment infrastructure is a foundational step toward creating agents that can perform truly useful tasks in the real world. We're moving from a simple, programmatic web to a conversational, contractual one, and this protocol provides the framework for it.&lt;/p&gt;

&lt;p&gt;We encourage you to check out the &lt;a href="https://github.com/google-agentic-commerce/AP2" rel="noopener noreferrer"&gt;Agent Payment Protocol GitHub repo&lt;/a&gt;, think about which role you could play in this new ecosystem, and start building today!&lt;/p&gt;

&lt;h4&gt;
  
  
  Connect with us
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Shir Meir Lador → &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ivan Nardini → &lt;a href="https://www.linkedin.com/in/ivan-nardini/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/ivnardini" rel="noopener noreferrer"&gt;X&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prateek Dudeja → &lt;a href="https://www.linkedin.com/in/prateek-dudeja/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>security</category>
      <category>ai</category>
      <category>ecommerce</category>
    </item>
    <item>
      <title>The Agent Factory podcast: 5 Episodes to Kickstart Your Journey to Production AI</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Tue, 25 Nov 2025 21:22:16 +0000</pubDate>
      <link>https://forem.com/googleai/the-agent-factory-podcast-5-episodes-to-kickstart-your-journey-to-production-ai-35ml</link>
      <guid>https://forem.com/googleai/the-agent-factory-podcast-5-episodes-to-kickstart-your-journey-to-production-ai-35ml</guid>
      <description>&lt;p&gt;We are so proud to announce that a project we're incredibly passionate about has grown into a full-blown resource for developers: The Agent Factory video podcast.&lt;/p&gt;

&lt;p&gt;We started this show with a simple mission: to have the conversations developers need to be having about AI agent development. We wanted to move past the hype and focus on what really matters: building production-ready AI agents.&lt;/p&gt;

&lt;p&gt;Fast forward to today, and we have &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs" rel="noopener noreferrer"&gt;14 episodes&lt;/a&gt; published, covering everything from architecture patterns to end-to-end vibe coding of advanced AI applications. To celebrate, we’re sharing our first 5 foundational episodes with the Dev.to community. If you are just starting to build agents or looking to harden your existing systems, this is the perfect place to start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to Expect:&lt;/strong&gt;&lt;br&gt;
We pack every episode with three core segments designed for developers:&lt;/p&gt;

&lt;p&gt;🎙️ &lt;strong&gt;Agent Industry Pulse:&lt;/strong&gt; We filter the noise and bring you the latest news you actually need to know.&lt;/p&gt;

&lt;p&gt;🛠️ &lt;strong&gt;The Factory Floor:&lt;/strong&gt; A technical deep-dive where we get our hands dirty with code, architectures, and patterns.&lt;/p&gt;

&lt;p&gt;❓ &lt;strong&gt;Developer Q&amp;amp;A:&lt;/strong&gt; We answer real questions from the community to help us learn together.&lt;/p&gt;

&lt;p&gt;📺 &lt;strong&gt;The Starter Pack: Our First 5 Episodes&lt;/strong&gt;&lt;br&gt;
Here is the chronological journey to get you up to speed, starting from the very beginning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Agents, their frameworks and when to use them (ft. Julia Wiesinger)&lt;/strong&gt; &lt;br&gt;
We kicked things off by tackling the big questions: What exactly is an agent? How do you choose between frameworks like LangChain, CrewAI, or the Agent Development Kit (ADK)? We were joined by Julia Wiesinger from the ADK team to guide us through building for production. 

  &lt;iframe src="https://www.youtube.com/embed/aLYrV61rJG4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Multi-Agent Systems: Concepts &amp;amp; Patterns&lt;/strong&gt;&lt;br&gt;
Single agent or multi-agent? In this episode, we break down the architectural patterns that matter, from Supervisors to Swarms. We discuss exactly when you should transition from a single agent to a team of agents to handle complexity and improve reliability. 

  &lt;iframe src="https://www.youtube.com/embed/TGNScswE0kU"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Building Custom Tools for Agents&lt;/strong&gt;&lt;br&gt;
Agents are only as good as the tools they can use. We dive into Model Context Protocol (MCP), function calling, and how to build secure, authenticated tools that let your agents interact with the real world safely. 

  &lt;iframe src="https://www.youtube.com/embed/NiLb5DK4_rU"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Memory in Agents (ft. Kimberly Milam)&lt;/strong&gt;&lt;br&gt;
How do you stop your agent from acting like a goldfish? We chat with Kimberly Milam about implementing long-term memory, managing state, and the "Memory Bank" concept to create personalized experiences that persist across sessions. 

  &lt;iframe src="https://www.youtube.com/embed/2yW7aTfjo88"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Tackling the Hardest Questions (ft. Philipp Schmidt)&lt;/strong&gt;&lt;br&gt;
We sat down with Philipp Schmidt from Google DeepMind for a masterclass on the agent development workflow. We cover context engineering, evaluation strategies, and pro-tips for using the Gemini CLI to speed up your development cycle. 

  &lt;iframe src="https://www.youtube.com/embed/kPVZQ3ae7-8"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💬 Join the Conversation&lt;/strong&gt;&lt;br&gt;
We’re truly excited to continue building this community with you. Whether you're stuck on a specific bug or wondering about a new architecture, we want to hear from you.&lt;/p&gt;

&lt;p&gt;What are you struggling with right now? Drop your questions in the comments below with &lt;strong&gt;#TheAgentFactory&lt;/strong&gt;, and we might answer them in our next Q&amp;amp;A segment!&lt;/p&gt;

&lt;p&gt;➡️ &lt;strong&gt;Listen &amp;amp; Subscribe: &lt;a href="https://www.youtube.com/googlecloudplatform" rel="noopener noreferrer"&gt;Google Cloud Tech&lt;/a&gt; on YouTube&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
