<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shir Meir Lador</title>
    <description>The latest articles on Forem by Shir Meir Lador (@shirmeirlador).</description>
    <link>https://forem.com/shirmeirlador</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3596246%2F7fba9a43-cbe3-4af2-adff-1871187ffbf8.jpeg</url>
      <title>Forem: Shir Meir Lador</title>
      <link>https://forem.com/shirmeirlador</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shirmeirlador"/>
    <language>en</language>
    <item>
      <title>Agent Factory Recap: How Gemma 4 Taught Itself Physics</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 14 May 2026 14:10:49 +0000</pubDate>
      <link>https://forem.com/googleai/agent-factory-recap-how-gemma-4-taught-itself-physics-17e6</link>
      <guid>https://forem.com/googleai/agent-factory-recap-how-gemma-4-taught-itself-physics-17e6</guid>
      <description>&lt;p&gt;In this episode of The Agent Factory, Vlad Kolesnikov and I sat down with Omar Sanseviero from the Developer Experience team at Google DeepMind. We explored the release of Gemma 4: a new family of open models designed to bring high-level intelligence and agentic capabilities directly to consumer hardware and mobile devices. Since its launch last month, Gemma 4 has been downloaded &lt;strong&gt;over 50 million times!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4 - What is it?
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is the latest generation of open models from Google DeepMind, built on the same foundational research as Gemini 3. The family is designed to deliver exceptional "intelligence per parameter" across a range of deployment scenarios, from mobile phones to powerful workstations. The Gemma 4 model family now spans three distinct architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small Sizes (E2B &amp;amp; E4B):&lt;/strong&gt; Optimized for ultra-mobile, edge, and browser deployment (such as Pixel or Chrome).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dense (31B):&lt;/strong&gt; A powerful 31-billion-parameter model that provides server-grade performance for local execution on consumer GPUs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixture-of-Experts (26B MoE):&lt;/strong&gt; A highly efficient architecture designed for high-throughput tasks and advanced reasoning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the shift to an &lt;strong&gt;Apache 2.0 license&lt;/strong&gt;, these models give developers and startups the flexibility to build, modify, and commercialize applications while maintaining full control over their infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Omar Sanseviero on how Gemma 4 changes the landscape for agent developers
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=100s" rel="noopener noreferrer"&gt;1:40&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Omar highlighted that Gemma 4 brings "very high intelligence per parameter," making it possible to run agentic workflows entirely offline. We saw examples of multiple Gemma instances running locally to generate SVGs (&lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=113s" rel="noopener noreferrer"&gt;1:53&lt;/a&gt;) and an Android-based agent picking specific skills, like playing the piano, to complete tasks (&lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=165s" rel="noopener noreferrer"&gt;2:45&lt;/a&gt;). As Omar noted, "This means that you can run very powerful things with very little hardware overhead...even in the phone that you have in your pocket."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre8na48kuuq04m8asknf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre8na48kuuq04m8asknf.jpg" alt="Gemma 4 demo screenshot" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Factory Floor
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Building a Local Food Tour Agent
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=329s" rel="noopener noreferrer"&gt;5:29&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We showcased a food tour agent powered by Gemma 4 using the Agent Development Kit (ADK) and a Google Maps MCP server. We demonstrated how a local model can handle complex, multi-step reasoning tasks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent identified the best ramen spots in Seattle under a $30 budget.&lt;/li&gt;
&lt;li&gt;It verified that the locations were within walking distance of each other.&lt;/li&gt;
&lt;li&gt;It processed search results to provide specific tips on what to order and what to avoid.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Autonomous Python Code Execution
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=483s" rel="noopener noreferrer"&gt;8:03&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this demo, we pushed Gemma 4's coding capabilities to the limit by asking it to express itself through animation. Using a sandbox execution environment, the model performed the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrote Python code using the Matplotlib library.&lt;/li&gt;
&lt;li&gt;Attempted to build a physics engine to simulate a bouncing ball.&lt;/li&gt;
&lt;li&gt;Self-corrected when the initial execution environment lacked certain CPU features, finding an alternative path to successfully generate the animation.&lt;/li&gt;
&lt;li&gt;Demonstrated a deep understanding of real-world physics and gravity through code.&lt;/li&gt;
&lt;/ul&gt;
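
&lt;p&gt;For a sense of what such a simulation involves, here is a minimal sketch of a bouncing-ball update loop. It is not the code Gemma 4 produced in the demo: the function name, time step, and restitution value are illustrative assumptions, and plotting is left out so the snippet runs without Matplotlib.&lt;/p&gt;

```python
def simulate_bounce(height=10.0, steps=2000, dt=0.01, gravity=9.81, restitution=0.8):
    """Return the ball's height at every time step (semi-implicit Euler)."""
    y, velocity = height, 0.0
    trajectory = [y]
    for _ in range(steps):
        velocity -= gravity * dt          # gravity accelerates the ball downward
        next_y = y + velocity * dt        # tentative new position
        if next_y >= 0.0:
            y = next_y                    # still in the air
        else:
            y = -next_y                   # reflect back above the floor
            velocity = -velocity * restitution  # lose some energy on impact
        trajectory.append(y)
    return trajectory


heights = simulate_bounce()
print(f"final height: {heights[-1]:.3f} m")
```

&lt;p&gt;Each returned height could then be drawn as one animation frame. The restitution factor is what makes successive bounces lower than the previous one.&lt;/p&gt;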

&lt;h3&gt;
  
  
  The Shift to Apache 2.0 Licensing
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=245s" rel="noopener noreferrer"&gt;4:05&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A major theme of the conversation was the community-driven decision to move Gemma 4 to an Apache 2.0 license. This change gives developers and startups maximum flexibility to build, modify, and commercialize applications. Omar emphasized that this was a direct response to developer feedback, aiming to unlock a new wave of innovation in the open models ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer Q&amp;amp;A
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Architectural Decisions and Mixture of Experts (MoE)
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1043s" rel="noopener noreferrer"&gt;17:23&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Omar explained the technical shifts that make Gemma 4 so efficient. For the first time, the Gemma family includes a Mixture of Experts (MoE) architecture, which optimizes for extremely low latency in production. Additionally, the smaller E2B and E4B models utilize per-layer embeddings to remain "cheap" to run on GPUs. For vision tasks, the model now supports variable aspect ratios, allowing it to understand images of various sizes more accurately than previous fixed-resolution versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparing Gemma to Gemini
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1191s" rel="noopener noreferrer"&gt;19:51&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When asked how Gemma stacks up against its larger sibling, Gemini, Omar clarified that they serve different purposes. While Gemini excels at massive-scale tasks and deep "world knowledge" due to its size, Gemma is the "best open model that can run on a single consumer GPU." It is specifically optimized for instruction following, coding, and agentic use cases where local deployment or fine-tuning is required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fine-Tuning for Specialized Industries
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=ST9mJuTnFqU&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1271s" rel="noopener noreferrer"&gt;21:11&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The conversation touched on the importance of "Sovereign AI" and privacy. Because Gemma is an open model, developers in regulated industries, like healthcare or finance, can &lt;a href="https://dev.to/googleai/fine-tuning-gemma-4-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-45ib"&gt;fine-tune the model on their private data&lt;/a&gt; and deploy it within their own air-gapped infrastructure. This gives developers full control over their data and the model's specialized expertise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gemma 4 marks a turning point for agentic development, proving that you don't always need a massive cloud cluster to build something smart. Whether it's running a physics simulation on a laptop or a travel guide on a phone, the barrier to entry for high-performance AI has never been lower. We are entering an era where the "conductor" of the AI orchestra can be any developer with a single GPU and a great idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;Now that you've seen what Gemma 4 can do, it's time to start building. Check out the resources in our show notes, try &lt;a href="https://goo.gle/3OinTFh" rel="noopener noreferrer"&gt;the food tour agent&lt;/a&gt; and &lt;a href="https://goo.gle/4dBDNEY" rel="noopener noreferrer"&gt;the coding agent&lt;/a&gt;, explore the &lt;a href="https://adk.dev/agents/models/google-gemma/" rel="noopener noreferrer"&gt;ADK support&lt;/a&gt;, and run &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt; on your local machine or on &lt;a href="https://docs.cloud.google.com/run/docs/run-gemma-on-cloud-run" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;. We can't wait to see what agents you create!&lt;/p&gt;

&lt;p&gt;Watch more of The Agent Factory → &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1" rel="noopener noreferrer"&gt;Reinforcement learning &amp;amp; fine-tuning on TP...&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Subscribe to Google Cloud Tech → &lt;a href="https://goo.gle/GoogleCloudTech" rel="noopener noreferrer"&gt;https://goo.gle/GoogleCloudTech&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Shir Meir Lador → &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/shirmeir86" rel="noopener noreferrer"&gt;X&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Vlad Kolesnikov → &lt;a href="http://www.linkedin.com/in/vkolesnikov/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/vladkol" rel="noopener noreferrer"&gt;X&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Omar Sanseviero → &lt;a href="https://www.linkedin.com/in/omarsanseviero/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/osanseviero" rel="noopener noreferrer"&gt;X&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>gemma</category>
      <category>agents</category>
    </item>
    <item>
      <title>Deploying a Multi-Agent System with Terraform and Cloud Run</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 07 May 2026 21:04:12 +0000</pubDate>
      <link>https://forem.com/googleai/deploying-a-multi-agent-system-with-terraform-and-cloud-run-2a9c</link>
      <guid>https://forem.com/googleai/deploying-a-multi-agent-system-with-terraform-and-cloud-run-2a9c</guid>
      <description>&lt;p&gt;In support of our mission to accelerate the developer journey on Google Cloud, we built Dev Signal: a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation.&lt;/p&gt;

&lt;p&gt;In the first three parts of this series, we laid the groundwork by establishing the system's core capabilities and local verification process:&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9"&gt;part 1&lt;/a&gt;, we standardized the agent's capabilities through the Model Context Protocol (MCP), connecting it to Reddit for trend discovery and Google Cloud Docs for technical grounding. In &lt;a href="https://dev.to/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15"&gt;part 2&lt;/a&gt;, we built a multi-agent architecture and integrated the Vertex AI memory bank to allow the system to learn and persist user preferences across different conversations. In &lt;a href="https://dev.to/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm"&gt;part 3&lt;/a&gt;, we verified the full end-to-end lifecycle locally using a dedicated test runner to ensure that research, content creation, and cloud-based memory retrieval were synchronized.&lt;/p&gt;

&lt;p&gt;If you'd like to dive straight into the code, you can clone the repository &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment to Cloud Run and the Path to Production
&lt;/h2&gt;

&lt;p&gt;To help you transition from this local prototype to a production service, this final part focuses on building the production backbone of your agent using the foundational deployment patterns provided by the &lt;a href="https://github.com/GoogleCloudPlatform/agent-starter-pack" rel="noopener noreferrer"&gt;Agent Starter Pack&lt;/a&gt;. We will implement the essential structural components required for monitoring, data integrity, and long-term state management in the cloud. You will learn to implement the application server and helper utilities needed for a production-ready deployment before provisioning secure, reproducible infrastructure with Terraform.&lt;/p&gt;

&lt;p&gt;While the Dockerfile packages your agent's code and its specialized dependencies, such as Node.js for the Reddit MCP tool, Terraform is used to build the platform it lives on. Terraform automates the creation of your Artifact Registry, least-privilege service accounts, and Secret Manager integrations to ensure your API keys remain protected.&lt;/p&gt;

&lt;p&gt;By the end of this part, you will have a standardized application framework deployed on Google Cloud Run and a roadmap for graduating your prototype through continuous evaluation, CI/CD and advanced observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Utilities and Server: Building the System's Body
&lt;/h2&gt;

&lt;p&gt;In this section, you implement the structural components required for monitoring and long-term state management in the cloud.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Application Server:&lt;/strong&gt; Initializing the FastAPI server and establishing a vital connection to the Vertex AI memory bank.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementing Telemetry:&lt;/strong&gt; Enabling 'Agent Traces' for visibility into internal reasoning.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Application Server
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;fast_api_app.py&lt;/code&gt; file serves as the vital entry point for your agent, transforming the core logic into a production FastAPI server that acts as the "body" of your system. When deploying to Cloud Run, this server is essential because it provides the necessary web interface to listen for incoming HTTP requests and dispatch them to the agent for processing. Beyond basic serving, its most critical role is establishing a connection to the Vertex AI memory bank by defining a &lt;code&gt;MEMORY_URI&lt;/code&gt;, which allows the ADK framework to persist and retrieve user preferences across different production sessions. Additionally, the application server initializes production-grade telemetry for real-time monitoring.&lt;/p&gt;

&lt;p&gt;Go back to the &lt;code&gt;dev_signal_agent&lt;/code&gt; folder.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste the following code in &lt;code&gt;dev_signal_agent/fast_api_app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.cli.fast_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_fast_api_app&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cloud_logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agent_engines&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dev_signal_agent.app_utils.env&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_environment&lt;/span&gt;

&lt;span class="c1"&gt;# --- Initialization &amp;amp; Secure Secret Retrieval ---
# We now unpack the SECRETS dictionary returned by our updated env.py
&lt;/span&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MODEL_LOC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SERVICE_LOC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SECRETS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_environment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cloud_logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access sensitive credentials from the SECRETS dictionary
# These keys stay in memory and are NOT injected into os.environ
&lt;/span&gt;&lt;span class="n"&gt;REDDIT_CLIENT_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;REDDIT_USER_AGENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_USER_AGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DK_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DK_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Configuration &amp;amp; Sessions ---
&lt;/span&gt;&lt;span class="n"&gt;AGENT_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abspath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="c1"&gt;# Non-sensitive configuration uses environment variables
&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI_ASSETS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;USE_IN_MEMORY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_IN_MEMORY_SESSION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- MEMORY BANK CONNECTION ---
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_memory_bank_uri&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;USE_IN_MEMORY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="c1"&gt;# We use 'dev_signal_agent' as the display name for the Vertex AI memory bank
&lt;/span&gt;    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AGENT_ENGINE_MEMORY_BANK_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev_signal_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_engines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ae&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;agent_engines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agentengine://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ae&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Connecting to Memory Bank: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (display_name=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;

&lt;span class="n"&gt;SESSION_URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MEMORY_URI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_memory_bank_uri&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# --- Initialize FastAPI with ADK ---
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_fast_api_app&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AGENT_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;artifact_service_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gs://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;BUCKET&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;allow_origins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ALLOW_ORIGINS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ALLOW_ORIGINS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_service_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SESSION_URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_service_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MEMORY_URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# &amp;lt;--- Connects the Memory Bank
&lt;/span&gt;    &lt;span class="n"&gt;otel_to_cloud&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# &amp;lt;--- Enables production telemetry
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;
    &lt;span class="c1"&gt;# Standard Cloud Run port is 8080
&lt;/span&gt;    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Implementing Telemetry
&lt;/h3&gt;

&lt;p&gt;In a production environment, visibility into your agent's reasoning is critical. We leverage the built-in observability features of the Google ADK by setting the &lt;code&gt;otel_to_cloud=True&lt;/code&gt; flag in our application server. This single parameter handles the majority of the instrumentation automatically, exporting "Agent Traces" directly to the Google Cloud Console. These traces provide a "visual waterfall" of the agent's operation, including individual agent thought processes, LLM invocations, and MCP tool calls.&lt;/p&gt;

&lt;h4&gt;
  
  
  Monitoring vs. Targeted Evaluation
&lt;/h4&gt;

&lt;p&gt;It is essential to understand that production tracing is subject to sampling to balance performance and cost. Because Cloud Run captures only a subset of requests, not every individual user interaction will be visible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System Traces (Monitoring):&lt;/strong&gt; Used to analyze behavior "at large," such as identifying latency bottlenecks or system timeouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning Traces (Evaluation):&lt;/strong&gt; High-quality evaluation mandates targeted trace capture. This means calling the agent specifically for a test case where you know you will evaluate that particular request in full detail.&lt;/li&gt;
&lt;/ul&gt;
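
&lt;p&gt;The difference between sampled monitoring and targeted capture can be pictured with a small sketch. This is only an illustration under stated assumptions: &lt;code&gt;should_sample&lt;/code&gt; and its &lt;code&gt;force&lt;/code&gt; flag are hypothetical names, and this is not how Cloud Run or the ADK implement sampling internally.&lt;/p&gt;

```python
import hashlib


def should_sample(trace_id, sample_rate, force=False):
    """Decide whether a request's trace is kept (hypothetical helper).

    force=True models a targeted evaluation call that must always be captured.
    """
    if force:
        return True
    # Map the trace ID deterministically onto 10,000 buckets.
    digest = hashlib.sha256(trace_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    # Keep the trace only if its bucket falls inside the sampled share.
    return bucket in range(int(sample_rate * 10_000))
```

&lt;p&gt;With a low sample rate, most ordinary requests fall outside the sampled share and leave no trace, which is acceptable for monitoring behavior at large. An evaluation run cannot tolerate that gap, so the test request is captured unconditionally.&lt;/p&gt;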

&lt;h4&gt;
  
  
  Viewing the Trace
&lt;/h4&gt;

&lt;p&gt;To see your traces, navigate to the Trace Explorer in the Google Cloud Console and filter for your service (e.g., &lt;code&gt;dev-signal&lt;/code&gt;). Clicking a specific Trace ID opens a Gantt chart that allows you to distinguish between cognitive reasoning failures (wrong decisions) and physical system issues (timeouts).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbe3brcww32j0igfr7zh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbe3brcww32j0igfr7zh.png" alt="Trace Explorer view" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For advanced configurations, refer to the following documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/run/docs/trace#trace_sampling_rate?utm_campaign=CDR_0x91b1edb5_default_b485268863&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run Trace Sampling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/instrumentation/ai-agent-adk#configure?utm_campaign=CDR_0x91b1edb5_default_b485268863&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Configuring ADK Telemetry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/trace/docs/collect-view-multimodal-prompts-responses?utm_campaign=CDR_0x91b1edb5_default_b485268863&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Multimodal Trace Capture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/" rel="noopener noreferrer"&gt;BigQuery Agent Analytics Integration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Infrastructure as Code: Provisioning Secure Cloud Resources
&lt;/h2&gt;

&lt;p&gt;We use the infrastructure-as-code patterns from the &lt;a href="https://github.com/GoogleCloudPlatform/agent-starter-pack" rel="noopener noreferrer"&gt;Agent Starter Pack&lt;/a&gt;, which follows a security-first design. The starter pack automates the creation of least-privilege service accounts and robust secret management, so a production-grade platform can be set up in seconds.&lt;/p&gt;

&lt;p&gt;Using Terraform ensures that your entire Google Cloud environment - from IAM roles to Secret Manager versions - is defined in reproducible, secure code. We break our infrastructure into the following logical blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resources &amp;amp; Variables&lt;/strong&gt;: Define the specific project, region, and sensitive API secrets used by the agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Infrastructure&lt;/strong&gt;: Enable essential APIs and provision a private Artifact Registry to host your agent's container images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity &amp;amp; Access Management (IAM)&lt;/strong&gt;: Configure specialized Service Accounts that strictly follow the Principle of Least Privilege to ensure your system remains secure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret Management&lt;/strong&gt;: Securely ingest API credentials into Google Secret Manager for protected runtime access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run Configuration&lt;/strong&gt;: Define the container environment, resource limits, and automated secret injection for the final deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To begin provisioning, return to the root folder of your project (dev-signal) and create the necessary deployment directories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="nb"&gt;mkdir &lt;/span&gt;deployment
&lt;span class="nb"&gt;cd &lt;/span&gt;deployment
&lt;span class="nb"&gt;mkdir &lt;/span&gt;terraform
&lt;span class="nb"&gt;cd &lt;/span&gt;terraform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Terraform Resources and Variables
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;variables.tf&lt;/code&gt; file defines the configurable parameters for your deployment, allowing you to customize the infrastructure without altering the underlying logic. It includes variables for the &lt;code&gt;project_id&lt;/code&gt;, the deployment &lt;code&gt;region&lt;/code&gt; (defaulting to &lt;code&gt;us-central1&lt;/code&gt;), the &lt;code&gt;service_name&lt;/code&gt; for your Cloud Run instance, and the &lt;code&gt;ai_assets_bucket&lt;/code&gt; used to store AI assets. It also defines a &lt;code&gt;secrets&lt;/code&gt; map used to ingest sensitive API credentials (such as Reddit and Developer Knowledge keys) into Google Secret Manager for runtime access. This modular approach keeps your production environment reproducible, secure, and adaptable across different projects.&lt;/p&gt;

&lt;p&gt;Paste the following code into &lt;code&gt;deployment/terraform/variables.tf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"project_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The Google Cloud Project ID"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"region"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The Google Cloud region to deploy to"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-central1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"service_name"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The name of the Cloud Run service"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev-signal"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"secrets"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A map of secret names and their values (e.g., REDDIT_CLIENT_ID, DK_API_KEY)"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"ai_assets_bucket"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The GCS bucket for storing AI assets"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Infrastructure Logic
&lt;/h3&gt;

&lt;p&gt;We define our infrastructure in logical blocks. Here is what each part does:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Enable APIs&lt;/strong&gt;: Ensures the project has the necessary services active (Cloud Run, Vertex AI, etc.). We set &lt;code&gt;disable_on_destroy = false&lt;/code&gt; so that the APIs stay enabled when the Terraform resources are destroyed, which prevents accidental disruption or data loss in anything else that depends on them.&lt;/p&gt;

&lt;p&gt;Paste the following code into &lt;code&gt;deployment/terraform/main.tf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_project_service"&lt;/span&gt; &lt;span class="s2"&gt;"services"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s2"&gt;"run.googleapis.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"artifactregistry.googleapis.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"cloudbuild.googleapis.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"aiplatform.googleapis.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"secretmanager.googleapis.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"logging.googleapis.com"&lt;/span&gt;
  &lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="nx"&gt;service&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;
  &lt;span class="nx"&gt;disable_on_destroy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Artifact Registry&lt;/strong&gt;: Creates a private Docker registry to store our agent's container images.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_artifact_registry_repository"&lt;/span&gt; &lt;span class="s2"&gt;"repo"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;repository_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev-signal-repo"&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Docker repository for Dev Signal Agent"&lt;/span&gt;
  &lt;span class="nx"&gt;format&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DOCKER"&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;google_project_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Service Account &amp;amp; IAM: Adhering to the Principle of Least Privilege&lt;/strong&gt; - This is a critical security step. We avoid using the default compute service account and instead provision a dedicated user-managed service account (&lt;code&gt;dev-signal-sa&lt;/code&gt;). By designating this as the Cloud Run service identity, we can grant it only the minimum necessary permissions: &lt;code&gt;roles/aiplatform.user&lt;/code&gt;, &lt;code&gt;roles/logging.logWriter&lt;/code&gt;, and &lt;code&gt;roles/storage.objectAdmin&lt;/code&gt;. This granular access control gives the agent the permissions it needs to interact with Vertex AI and Cloud Storage without over-granting access to other sensitive cloud resources, significantly reducing the potential impact of a compromised account. Learn more about &lt;a href="https://docs.cloud.google.com/iam/docs/best-practices-service-accounts?content_ref=because%20a%20service%20account%20is%20a%20principal%20you%20must%20limit%20its%20privileges%20to%20reduce%20the%20potential%20harm%20that%20can%20be%20done%20by%20a%20compromised%20service%20account&amp;amp;utm_campaign=CDR_0x91b1edb5_default_b485268863&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;best practices for using service accounts securely&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_service_account"&lt;/span&gt; &lt;span class="s2"&gt;"agent_sa"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;account_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.service_name}-sa"&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Dev Signal Agent Service Account"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Secret Management&lt;/strong&gt;: This handles your API keys securely. It creates secrets in Google Secret Manager and gives the agent's Service Account permission to access them at runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_secret_manager_secret"&lt;/span&gt; &lt;span class="s2"&gt;"agent_secrets"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="nx"&gt;secret_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;
  &lt;span class="nx"&gt;replication&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;auto&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;google_project_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_secret_manager_secret_version"&lt;/span&gt; &lt;span class="s2"&gt;"agent_secrets_version"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="nx"&gt;secret&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_secret_manager_secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent_secrets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;secret_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_secret_manager_secret_iam_member"&lt;/span&gt; &lt;span class="s2"&gt;"secret_accessor"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="nx"&gt;secret_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_secret_manager_secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent_secrets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"roles/secretmanager.secretAccessor"&lt;/span&gt;
  &lt;span class="nx"&gt;member&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"serviceAccount:${google_service_account.agent_sa.email}"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Cloud Run Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Best Practice:&lt;/strong&gt; To satisfy production security standards, our &lt;code&gt;main.tf&lt;/code&gt; grants the Service Account the &lt;code&gt;secretmanager.secretAccessor&lt;/code&gt; role. Our Python application then uses the &lt;a href="https://docs.cloud.google.com/secret-manager/docs/best-practices#coding-practices" rel="noopener noreferrer"&gt;Secret Manager SDK&lt;/a&gt; to pull these credentials directly into local memory at runtime, ensuring they never touch the container's environment configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 6. Cloud Run Service Deployment&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_cloud_run_v2_service"&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_name&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;
  &lt;span class="nx"&gt;ingress&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"INGRESS_TRAFFIC_ALL"&lt;/span&gt;

  &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;service_account&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_service_account&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent_sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;

    &lt;span class="nx"&gt;containers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-docker.pkg.dev/cloudrun/container/hello"&lt;/span&gt; &lt;span class="c1"&gt;# Placeholder until first build&lt;/span&gt;

      &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_PROJECT"&lt;/span&gt;
        &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_LOCATION"&lt;/span&gt;
        &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"global"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_GENAI_USE_VERTEXAI"&lt;/span&gt;
        &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"True"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AI_ASSETS_BUCKET"&lt;/span&gt;
        &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai_assets_bucket&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;limits&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;cpu&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1"&lt;/span&gt;
          &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2Gi"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;traffic&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"&lt;/span&gt;
    &lt;span class="nx"&gt;percent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
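
&lt;p&gt;On the application side, the runtime lookup described above could look roughly like the sketch below. It assumes the &lt;code&gt;google-cloud-secret-manager&lt;/code&gt; package is installed and that the service account already holds the &lt;code&gt;secretAccessor&lt;/code&gt; role. The helper names &lt;code&gt;secret_version_name&lt;/code&gt; and &lt;code&gt;fetch_secret&lt;/code&gt; are hypothetical and not part of the agent's actual code.&lt;/p&gt;

```python
def secret_version_name(project_id, secret_id, version="latest"):
    """Build the Secret Manager resource name for one secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def fetch_secret(project_id, secret_id, version="latest"):
    """Read a secret payload into memory at runtime (hypothetical helper)."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    # The payload arrives as bytes; decode it to text for use as an API key.
    return response.payload.data.decode("UTF-8")
```

&lt;p&gt;A call such as &lt;code&gt;fetch_secret(os.environ["GOOGLE_CLOUD_PROJECT"], "DK_API_KEY")&lt;/code&gt; would then read the key on demand, so the value lives only in process memory and never appears in the Cloud Run environment configuration.&lt;/p&gt;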



&lt;h3&gt;
  
  
  Provision the Infrastructure
&lt;/h3&gt;

&lt;p&gt;Before we can deploy our code, we need to provision the Google Cloud infrastructure we just defined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initialize Terraform&lt;/strong&gt;: This downloads the necessary provider plugins. Run this in the &lt;code&gt;deployment/terraform&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create a Variables File&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;deployment/terraform/terraform.tfvars&lt;/code&gt; and update it with your project details and secrets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;       &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your-project-id"&lt;/span&gt;
&lt;span class="nx"&gt;region&lt;/span&gt;           &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-central1"&lt;/span&gt;
&lt;span class="nx"&gt;service_name&lt;/span&gt;     &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev-signal"&lt;/span&gt;
&lt;span class="nx"&gt;ai_assets_bucket&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your-bucket-name"&lt;/span&gt;
&lt;span class="nx"&gt;secrets&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;REDDIT_CLIENT_ID&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your_client_id"&lt;/span&gt;
  &lt;span class="nx"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your_client_secret"&lt;/span&gt;
  &lt;span class="nx"&gt;REDDIT_USER_AGENT&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your_user_agent"&lt;/span&gt;
  &lt;span class="nx"&gt;DK_API_KEY&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your_dk_api_key"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plan Configuration&lt;/strong&gt;: This allows you to review the changes before they are applied. Run this in the &lt;code&gt;deployment/terraform&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform plan &lt;span class="nt"&gt;-out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;plan.tfplan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Apply Configuration&lt;/strong&gt;: Once you have reviewed the plan and confirmed it does what you want, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply plan.tfplan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deployment: Containerization and the Cloud Build Pipeline
&lt;/h2&gt;

&lt;p&gt;In this final stage of the build process, we package our agent's "body" and "brain" into a portable, production-ready container. This ensures that every component - from our Python logic to the Node.js environment required for the Reddit MCP tool - is bundled together with its exact dependencies.&lt;/p&gt;

&lt;p&gt;We use a &lt;strong&gt;Dockerfile&lt;/strong&gt; to define this environment and a &lt;strong&gt;Makefile&lt;/strong&gt; to orchestrate the deployment pipeline. When you trigger the deployment, &lt;a href="https://console.cloud.google.com/cloud-build/builds" rel="noopener noreferrer"&gt;Google Cloud Build&lt;/a&gt; takes your local source code, builds the container image according to the Dockerfile, and stores it in the private Artifact Registry created earlier by Terraform. Finally, the pipeline automatically updates your Cloud Run service to serve traffic using this fresh image, completing the journey from local code to a live, secure cloud workload.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev-signal/Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;

&lt;span class="c"&gt;# Install Node.js and npm for MCP tools (like reddit-mcp)&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://deb.nodesource.com/setup_20.x | bash - &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nodejs &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; reddit-mcp &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get clean &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nv"&gt;uv&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.8.13

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /code&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./pyproject.toml ./README.md ./uv.lock* ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./dev_signal_agent ./dev_signal_agent&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--frozen&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["uv", "run", "uvicorn", "dev_signal_agent.fast_api_app:app", "--host", "0.0.0.0", "--port", "8080"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



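&lt;p&gt;The &lt;strong&gt;Makefile&lt;/strong&gt; automates the build and deployment steps. Make requires every recipe line to begin with a tab character, so check that the indentation survives copy and paste.&lt;/p&gt;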

&lt;p&gt;Paste this code in &lt;code&gt;dev-signal/Makefile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;shell gcloud config get-value project&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;REGION&lt;/span&gt;     &lt;span class="o"&gt;?=&lt;/span&gt; us-central1
&lt;span class="nv"&gt;IMAGE_REPO&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; dev-signal-repo
&lt;span class="nv"&gt;IMAGE&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;REGION&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nt"&gt;-docker&lt;/span&gt;.pkg.dev/&lt;span class="p"&gt;$(&lt;/span&gt;PROJECT_ID&lt;span class="p"&gt;)&lt;/span&gt;/&lt;span class="p"&gt;$(&lt;/span&gt;IMAGE_REPO&lt;span class="p"&gt;)&lt;/span&gt;/agent:latest

&lt;span class="c"&gt;# Deploy via Cloud Build &amp;amp; Container
&lt;/span&gt;&lt;span class="nl"&gt;docker-deploy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Building and deploying to &lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;&lt;span class="s2"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; via Cloud Build..."&lt;/span&gt;
    gcloud builds submit &lt;span class="nt"&gt;--tag&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;IMAGE&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;--project&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;PROJECT_ID&lt;span class="p"&gt;)&lt;/span&gt; .
    gcloud run services update dev-signal &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--image&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;IMAGE&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;REGION&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--project&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;PROJECT_ID&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--labels&lt;/span&gt; dev-tutorial&lt;span class="o"&gt;=&lt;/span&gt;dev-signal-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
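
&lt;p&gt;To make the &lt;code&gt;IMAGE&lt;/code&gt; variable in the Makefile easier to reason about, here is a minimal Python sketch of the same string composition. The function name &lt;code&gt;build_image_uri&lt;/code&gt; and the project ID &lt;code&gt;my-project&lt;/code&gt; are illustrative assumptions and are not part of the Dev Signal repository. The defaults mirror the Makefile values. The sketch also shows why an empty project ID (for example, when no default project is configured in &lt;code&gt;gcloud&lt;/code&gt;) produces an unusable image path.&lt;/p&gt;

```python
def build_image_uri(project_id, region="us-central1", repo="dev-signal-repo",
                    image="agent", tag="latest"):
    """Compose an Artifact Registry image URI like the Makefile's IMAGE variable."""
    if not project_id:
        # An empty project would yield a path with a doubled slash.
        raise ValueError("project_id must be a non-empty string")
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"


# "my-project" is a placeholder project ID used only for illustration.
print(build_image_uri("my-project"))
```

&lt;p&gt;If the Makefile reports an unexpected image path, printing the composed value this way is a quick check that &lt;code&gt;PROJECT_ID&lt;/code&gt;, &lt;code&gt;REGION&lt;/code&gt;, and &lt;code&gt;IMAGE_REPO&lt;/code&gt; hold what you expect.&lt;/p&gt;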



&lt;h3&gt;
  
  
  Deploy Application
&lt;/h3&gt;

&lt;p&gt;Now that our infrastructure is ready, we can build and deploy the application code.&lt;/p&gt;

&lt;p&gt;Run the following command from the root of your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make docker-deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens when you run this?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt;: Google Cloud Build takes your local code and the &lt;code&gt;Dockerfile&lt;/code&gt;, builds a container image, and stores it in Artifact Registry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt;: &lt;code&gt;gcloud run services update&lt;/code&gt; points the Cloud Run service defined in Terraform at this new image.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When the deployment completes, you should get a message like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Service [dev-signal] revision [dev-signal...] has been deployed and is serving 100 percent of traffic.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Service URL: https://dev-signal-...-.us-central1.run.app&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification: Accessing and Testing Your Deployed Agent
&lt;/h2&gt;

&lt;p&gt;Since production services are private by default, this section covers how to grant permissions and access the agent securely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managing IAM Permissions:&lt;/strong&gt; Granting the necessary &lt;code&gt;run.invoker&lt;/code&gt; role to authorized users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secure Access via Cloud Run Proxy:&lt;/strong&gt; Using the &lt;code&gt;gcloud&lt;/code&gt; proxy to interact with your live service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Granting User Permissions
&lt;/h3&gt;

&lt;p&gt;Before you can invoke the service, you must grant your Google account the &lt;code&gt;roles/run.invoker&lt;/code&gt; role for this specific service. Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services add-iam-policy-binding dev-signal &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"user:&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value account&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/run.invoker"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Launch the Proxy
&lt;/h3&gt;

&lt;p&gt;Now, access your private service securely via the proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services proxy dev-signal &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;strong&gt;http://localhost:8080&lt;/strong&gt; to chat with your deployed agent! See a possible test scenario in &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/create-expert-content-local-testing-of-a-multi-agent-system-with-memory" rel="noopener noreferrer"&gt;part 3&lt;/a&gt; of the series.&lt;/p&gt;
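
&lt;p&gt;The proxy attaches your credentials to each request for you. For scripted checks, a private Cloud Run service also accepts requests that carry an identity token in an &lt;code&gt;Authorization: Bearer&lt;/code&gt; header. The sketch below only builds such a request and does not send it. The function name &lt;code&gt;build_authorized_request&lt;/code&gt;, the URL, and the token value are placeholders assumed for illustration. A real token could come from &lt;code&gt;gcloud auth print-identity-token&lt;/code&gt;.&lt;/p&gt;

```python
import urllib.request


def build_authorized_request(service_url, id_token):
    """Build a GET request that carries a bearer identity token."""
    if not id_token:
        raise ValueError("id_token must not be empty")
    request = urllib.request.Request(service_url, method="GET")
    request.add_header("Authorization", f"Bearer {id_token}")
    return request


# Placeholder URL and token, used only for illustration.
req = build_authorized_request("https://example.invalid/", "TOKEN")
print(req.get_header("Authorization"))
```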

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Congratulations! You have successfully built &lt;strong&gt;Dev Signal&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we covered:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://dev.to/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9"&gt;&lt;strong&gt;Tooling (MCP)&lt;/strong&gt;&lt;/a&gt;: You connected your agent to &lt;strong&gt;Reddit&lt;/strong&gt;, &lt;strong&gt;Google Docs&lt;/strong&gt;, and a &lt;strong&gt;Local Image Generator&lt;/strong&gt; using the Model Context Protocol.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15"&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/a&gt;: You implemented a &lt;strong&gt;Root Orchestrator&lt;/strong&gt; managing specialized agents (Scanner, Expert, Drafter).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm"&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/a&gt;: You integrated &lt;strong&gt;Vertex AI memory bank&lt;/strong&gt; to give your agent long-term persistence across sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production&lt;/strong&gt;: You deployed the entire stack to &lt;strong&gt;Google Cloud Run&lt;/strong&gt; using &lt;strong&gt;Terraform&lt;/strong&gt; for secure, reproducible infrastructure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You now have a solid foundation for building sophisticated, stateful AI applications on Google Cloud.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>terraform</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Local Testing of a Multi-Agent System with Memory</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 07 May 2026 21:03:03 +0000</pubDate>
      <link>https://forem.com/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm</link>
      <guid>https://forem.com/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm</guid>
      <description>&lt;p&gt;In support of our mission to accelerate the developer journey on Google Cloud, we built Dev Signal: a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9"&gt;part 1&lt;/a&gt; and &lt;a href="https://dev.to/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15"&gt;part 2&lt;/a&gt; of this series, we established the essential groundwork by standardizing the core capabilities through the Model Context Protocol (MCP) and constructing a multi-agent architecture integrated with the Vertex AI memory bank to provide long-term intelligence and persistence. Now, we'll explore how to test your multi-agent system locally!&lt;/p&gt;

&lt;p&gt;If you'd like to dive straight into the code and explore it at your own pace, you can clone the repository &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the Agent Locally
&lt;/h2&gt;

&lt;p&gt;Before transitioning your agentic system to Google Cloud Run, it is essential to ensure that its specialized components work seamlessly together on your workstation. This testing phase allows you to validate trend discovery, technical grounding, and creative drafting within a local feedback loop, saving time and resources during the development process.&lt;/p&gt;

&lt;p&gt;In this section, you will configure your local secrets, implement environment-aware utilities, and use a dedicated test runner to verify that Dev Signal can correctly retrieve user preferences from the Vertex AI memory bank on the cloud. This local verification ensures that your agent's "brain" and "hands" are properly synchronized before moving to deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment Setup
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file in your project root. These variables are used for local development and will be replaced by Terraform/Secret Manager in production.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev-signal/.env&lt;/code&gt; and update with your own details.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: &lt;code&gt;GOOGLE_CLOUD_LOCATION&lt;/code&gt; is set to &lt;code&gt;global&lt;/code&gt; because that is where &lt;code&gt;gemini-3-flash-preview&lt;/code&gt; is supported. &lt;code&gt;GOOGLE_CLOUD_LOCATION&lt;/code&gt; is used for the model location, while &lt;code&gt;GOOGLE_CLOUD_REGION&lt;/code&gt; is used for regional resources such as the Vertex AI Agent Engine.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Google Cloud Configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=global
GOOGLE_CLOUD_REGION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=True
AI_ASSETS_BUCKET=your_bucket_name

# Reddit API Credentials
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=my-agent/0.1

# Developer Knowledge API Key
DK_API_KEY=your_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
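
&lt;p&gt;Before running the agent, it can help to confirm that nothing in the &lt;code&gt;.env&lt;/code&gt; file was left blank. The following is a minimal sketch and is not part of the Dev Signal repository: the helper name &lt;code&gt;missing_env_vars&lt;/code&gt; and the decision about which variables count as required are assumptions made for illustration. The variable names themselves come from the example above.&lt;/p&gt;

```python
import os

# Variable names taken from the .env example; treating all of them
# as required is an assumption made for this sketch.
REQUIRED_VARS = [
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_CLOUD_REGION",
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "DK_API_KEY",
]


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


# Example with an explicit mapping instead of the real process environment.
print(missing_env_vars({"GOOGLE_CLOUD_PROJECT": "your-project-id"}))
```

&lt;p&gt;Called with no arguments after &lt;code&gt;load_dotenv()&lt;/code&gt;, the helper inspects the real process environment. An empty list means every listed variable has a value.&lt;/p&gt;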



&lt;h2&gt;
  
  
  Helper Utilities
&lt;/h2&gt;

&lt;p&gt;Create a new directory for your application utils:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;dev_signal_agent
&lt;span class="nb"&gt;mkdir &lt;/span&gt;app_utils
&lt;span class="nb"&gt;cd &lt;/span&gt;app_utils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Configuration
&lt;/h3&gt;

&lt;p&gt;This module standardizes how the agent discovers the active Google Cloud Project and Region, ensuring a seamless transition between development environments. After &lt;code&gt;load_dotenv()&lt;/code&gt; loads any local &lt;code&gt;.env&lt;/code&gt; file, the script asks &lt;code&gt;google.auth.default()&lt;/code&gt; for the Project ID and falls back to the &lt;code&gt;GOOGLE_CLOUD_PROJECT&lt;/code&gt; environment variable if that lookup fails. This automated approach ensures your agent is properly authenticated and grounded in the correct cloud context without requiring manual configuration changes.&lt;/p&gt;

&lt;p&gt;Beyond basic project discovery, the script provides a robust &lt;strong&gt;Secret Management&lt;/strong&gt; layer. It attempts to resolve sensitive credentials, such as Reddit API keys, first from the local environment (for rapid development) and then dynamically from the &lt;a href="https://docs.cloud.google.com/secret-manager/docs/reference/rest" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Secret Manager API&lt;/strong&gt;&lt;/a&gt; for production security. By returning these as a dictionary rather than injecting them into environment variables, the module maintains a clean security posture.&lt;/p&gt;

&lt;p&gt;The script further calibrates the environment by distinguishing between global and regional requirements for different AI services. It specifically assigns the "global" location for models to access cutting-edge preview features while designating a regional location, such as &lt;code&gt;us-central1&lt;/code&gt;, for infrastructure like the Vertex AI Agent Engine.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/app_utils/env.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.auth&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;secretmanager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_fetch_secrets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch secrets from Secret Manager and return them as a dictionary.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;secrets_to_fetch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_USER_AGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DK_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;fetched_secrets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# First, check local environment (for local development via .env)
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;secrets_to_fetch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fetched_secrets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;

    &lt;span class="c1"&gt;# If keys are missing (common in production), fetch from Secret Manager API
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fetched_secrets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secrets_to_fetch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;secretmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SecretManagerServiceClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;secret_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;secrets_to_fetch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;secret_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fetched_secrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/secrets/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;secret_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/versions/latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;access_secret_version&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                    &lt;span class="n"&gt;fetched_secrets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;secret_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UTF-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Warning: Could not fetch &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;secret_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; from Secret Manager: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fetched_secrets&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_environment&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Consolidated environment discovery.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
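    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
    &lt;span class="c1"&gt;# google.auth.default() can return no project without raising, so fall back to the env var
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;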

    &lt;span class="n"&gt;model_location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;service_location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;secrets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;service_location&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;secrets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_fetch_secrets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;service_location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;secrets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
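
&lt;p&gt;The resolution order used by &lt;code&gt;_fetch_secrets&lt;/code&gt; (local environment first, remote store only for what is still missing) can be tried out without any cloud access. The sketch below is a simplified stand-in, not the module itself: &lt;code&gt;resolve_secrets&lt;/code&gt; and &lt;code&gt;fake_remote&lt;/code&gt; are hypothetical names, and the remote lookup is a plain callable taking the place of the Secret Manager client.&lt;/p&gt;

```python
def resolve_secrets(names, env, fetch_remote):
    """Resolve each secret from env first, then from fetch_remote for the rest."""
    resolved = {name: env[name] for name in names if env.get(name)}
    for name in names:
        if name in resolved:
            continue
        try:
            resolved[name] = fetch_remote(name)
        except Exception as exc:
            # Mirror env.py: warn and continue instead of aborting.
            print(f"Warning: could not fetch {name}: {exc}")
    return resolved


def fake_remote(name):
    """Stand-in for Secret Manager, used only for this illustration."""
    remote_store = {"DK_API_KEY": "remote-key"}
    return remote_store[name]  # raises KeyError for unknown names


secrets = resolve_secrets(
    ["REDDIT_CLIENT_ID", "DK_API_KEY", "REDDIT_USER_AGENT"],
    {"REDDIT_CLIENT_ID": "local-id"},
    fake_remote,
)
print(secrets)
```

&lt;p&gt;Here &lt;code&gt;REDDIT_CLIENT_ID&lt;/code&gt; comes from the local mapping, &lt;code&gt;DK_API_KEY&lt;/code&gt; from the stand-in remote store, and &lt;code&gt;REDDIT_USER_AGENT&lt;/code&gt; is reported as missing and left out of the result.&lt;/p&gt;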



&lt;h2&gt;
  
  
  Local Testing Script
&lt;/h2&gt;

&lt;p&gt;The Google ADK comes with a built-in Web UI that is excellent for visualizing agent logic and tool composition.&lt;/p&gt;

&lt;p&gt;You can launch it by running the following command from the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run adk web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the default Web UI will not test the long-term memory integration described in this tutorial because it is not pre-connected to a Vertex AI memory session. By default, the generic UI often relies on in-memory services that do not persist data across sessions. Therefore, we use the dedicated &lt;code&gt;test_local.py&lt;/code&gt; script to explicitly initialize the &lt;code&gt;VertexAiMemoryBankService&lt;/code&gt;. This ensures that even in a local environment, your agent is communicating with the real cloud-based memory bank to validate preference persistence.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;test_local.py&lt;/code&gt; script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connects to the real &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/overview" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt; in the cloud for memory storage.&lt;/li&gt;
&lt;li&gt;Uses an in-memory session service for local chat history (so you can wipe it easily).&lt;/li&gt;
&lt;li&gt;Runs a chat loop where you can talk to your agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Go back to the root folder &lt;code&gt;dev-signal&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ../..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste this code in &lt;code&gt;dev-signal/test_local.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.auth&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.runners&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.memory.vertex_ai_memory_bank_service&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VertexAiMemoryBankService&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemorySessionService&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agent_engines&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dev_signal_agent.agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;root_agent&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Setup Configuration
&lt;/span&gt;    &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Agent Engine (Memory) MUST use a regional endpoint
&lt;/span&gt;    &lt;span class="n"&gt;resource_location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev-signal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- Initializing Vertex AI in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resource_location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resource_location&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Find the Agent Engine Resource for Memory
&lt;/span&gt;    &lt;span class="n"&gt;existing_agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_engines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing_agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;agent_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;existing_agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;agent_engine_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Using persistent Memory Bank from Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_engine_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Error: Agent Engine &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; not found. Please deploy with Terraform first.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Initialize Services
&lt;/span&gt;    &lt;span class="n"&gt;session_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemorySessionService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;memory_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAiMemoryBankService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resource_location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent_engine_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_engine_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Create a Runner
&lt;/span&gt;    &lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev-signal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memory_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_service&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Run a Test Loop
&lt;/span&gt;    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local-tester&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- TEST SCENARIO ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1. Start a session, tell the agent your preference (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;write in rhymes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2. Type &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to start a FRESH session (local state wiped).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3. Ask for a blog post. The agent should retrieve your preference from the CLOUD memory.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;current_session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev-signal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- Chat Session (ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;current_session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev-signal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_session_id&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- Fresh Session Started (ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(Local history is empty, retrieval must come from Memory Bank)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent is thinking...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;new_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_function_calls&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_function_calls&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🛠️ Tool Call: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the Test
&lt;/h3&gt;

&lt;p&gt;First, ensure you have your Application Default Credentials set up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud auth application-default login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
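
&lt;p&gt;Before launching the script, it can help to confirm that its two prerequisites are in place. The snippet below is a minimal, standard-library-only sketch and is not part of the repository: &lt;code&gt;preflight&lt;/code&gt; is a hypothetical helper, and the credentials path assumes the default gcloud location on Linux and macOS (Windows and custom gcloud config directories differ). It checks that &lt;code&gt;GOOGLE_CLOUD_PROJECT&lt;/code&gt; is set, since the test script reads it with &lt;code&gt;os.getenv&lt;/code&gt;, and that a credentials file exists.&lt;/p&gt;

```python
import os
from pathlib import Path


def preflight(env=None, home=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration. `env` and `home` are injectable
    so the check can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    problems = []

    # test_local.py reads the project ID from this variable.
    if not env.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT is not set")

    # An explicit key file takes precedence over the gcloud default file.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    # Assumed default location on Linux and macOS.
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    if explicit:
        if not Path(explicit).is_file():
            problems.append("GOOGLE_APPLICATION_CREDENTIALS points to a missing file")
    elif not default.is_file():
        problems.append("no Application Default Credentials file found")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Missing:", problem)
```

&lt;p&gt;An empty result only means the local files and variables are present; it does not verify that the credentials are valid or that the account can access the project.&lt;/p&gt;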



&lt;p&gt;Then run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run test_local.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Test Scenario
&lt;/h2&gt;

&lt;p&gt;This scenario validates the full end-to-end lifecycle of the agent: from discovery and research to multimodal content creation and long-term memory retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Teaching &amp;amp; Multimodal Creation (Session 1)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Goal: Establish technical context and set a specific stylistic preference.&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Discovery
&lt;/h4&gt;

&lt;p&gt;Ask the agent to find trending Cloud Run topics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;"Find high-engagement questions about AI agents on Cloud Run from the last 21 days."&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfyh7wc97yzgc413xhwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfyh7wc97yzgc413xhwi.png" alt="Test 1 - Discovery" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs895zdl1k3z309q3ya5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs895zdl1k3z309q3ya5.png" alt="Test 2 - Discovery Results" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Research
&lt;/h4&gt;

&lt;p&gt;Instruct the agent to perform a deep dive on a specific result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;"Use the GCP Expert to research topic #1."&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3tdx62hxuv8lsumw07k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3tdx62hxuv8lsumw07k.png" alt="Test 3 - Research" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Personalization
&lt;/h4&gt;

&lt;p&gt;Request a blog post and explicitly set your style preference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt;: &lt;code&gt;"Draft a blog post based on this research. From now on, I want all my technical blogs written in the style of a 90s Rap Song."&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febf7ljsso38maqnetfxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febf7ljsso38maqnetfxt.png" alt="Test 4 - Personalization" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Image Generation
&lt;/h4&gt;

&lt;p&gt;Ask the agent to generate an image that illustrates the main ideas of the blog post using the Nano Banana Pro tool. The image is saved to your Cloud Storage bucket, and the agent should return a URL for viewing it, which will look like: &lt;code&gt;https://storage.mtls.cloud.google.com/...&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjczvc5cwzymc1qjyuou2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjczvc5cwzymc1qjyuou2.png" alt="Token Optimization / Image Generation" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Long-Term Memory Recall (Session 2)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Goal: Verify the agent recalls preferences across a completely fresh session.&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Type &lt;code&gt;new&lt;/code&gt; in the console to wipe the local session history and start a fresh session.&lt;/li&gt;
&lt;li&gt;Retrieval: Inquire about your stored preferences to test the Vertex AI memory bank.
&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Input&lt;/em&gt;: &lt;code&gt;"What are my current topics of interest and what is my preferred blogging style?"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Verification: Confirm the agent successfully retrieves your "AI Agents on Cloud Run" interest and "Rap" style from the cloud.&lt;/li&gt;
&lt;/ol&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsymbvlaad5owzrjc537x.png" alt="Test 5 - Memory Recall" width="800" height="269"&gt;

&lt;p&gt;&lt;strong&gt;Final Test&lt;/strong&gt;: Ask for a new blog post on a different topic (e.g., "GKE Autopilot") and confirm that it is written as a rap song without you restating the preference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this part of our series, we focused on verifying the agent's functionality in a local environment before proceeding to cloud deployment. After configuring local secrets and environment-aware utilities, we used a dedicated test runner to confirm that the core reasoning and tool logic are properly integrated. We validated the full lifecycle, from Reddit discovery to expert content creation, and confirmed that the agent correctly retrieves preferences from the cloud-based Vertex AI memory bank even in completely fresh sessions.&lt;/p&gt;

&lt;p&gt;Ready to run the test scenario yourself? Clone the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and try the &lt;code&gt;test_local.py&lt;/code&gt; script to see Dev Signal retrieve your preferences from the Vertex AI memory bank in real time. For a deeper dive into the underlying mechanics of memory orchestration, check out this &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/quickstart-adk" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/googleai/deploying-a-multi-agent-system-with-terraform-and-cloud-run-2a9c"&gt;In the final part of this series,&lt;/a&gt; we will transition our prototype into a production service on Google Cloud Run using Terraform for secure infrastructure, and explore the roadmap to production excellence through continuous evaluation and security.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;Remigiusz Samborski&lt;/a&gt; for the helpful review and feedback on this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For more content like this, follow me on &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://x.com/shirmeir86" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>agents</category>
      <category>python</category>
    </item>
    <item>
      <title>Architect A Personalized Multi-Agent System with Long-Term Memory</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 07 May 2026 21:01:06 +0000</pubDate>
      <link>https://forem.com/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15</link>
      <guid>https://forem.com/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15</guid>
      <description>&lt;p&gt;In support of our mission to accelerate the developer journey on Google Cloud, we built &lt;strong&gt;Dev Signal&lt;/strong&gt; — a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9"&gt;first part&lt;/a&gt; of this &lt;strong&gt;Dev Signal&lt;/strong&gt; series, we laid the groundwork for this system by setting up the project environment and equipping the agent with its core capabilities through the Model Context Protocol (MCP). We standardized our external integrations: connecting to Reddit for trend discovery, connecting to Google Cloud Docs for technical grounding, and building a custom Nano Banana Pro MCP server for multimodal image generation. If you missed &lt;a href="https://dev.to/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9"&gt;Part 1&lt;/a&gt; or want to explore the code directly, you can find the complete project implementation in our &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now, in Part 2, we focus on building the multi-agent architecture and integrating the &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview" rel="noopener noreferrer"&gt;Vertex AI memory bank&lt;/a&gt; to personalize these capabilities. We will implement a Root Orchestrator that manages three specialist agents (the Reddit Scanner, the GCP Expert, and the Blog Drafter) to provide a seamless flow from trend discovery to expert content creation. We will also integrate a long-term memory layer that enables the agent to learn from your feedback and persist your stylistic preferences across conversations. This ensures that Dev Signal doesn't just process data, but actually learns to match your professional voice over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure and Model Setup
&lt;/h2&gt;

&lt;p&gt;First, we initialize the environment and the shared Gemini model.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.apps&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Gemini&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_memory_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;preload_memory_tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.tool_context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dev_signal_agent.app_utils.env&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_environment&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dev_signal_agent.tools.mcp_config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;get_reddit_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_dk_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_nano_banana_mcp_toolset&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MODEL_LOC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SERVICE_LOC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SECRETS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_environment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;shared_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_LOC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retry_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpRetryOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Memory Ingestion Logic
&lt;/h2&gt;

&lt;p&gt;We want Dev Signal to do more than just follow instructions; we want it to learn from you. By capturing your preferences, such as specific technical interests on Reddit or a preferred blogging style, the agent can personalize its output in future sessions. To achieve this, we use the &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Memory Bank&lt;/strong&gt;&lt;/a&gt; to persist session history across conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-term Memory
&lt;/h3&gt;

&lt;p&gt;We automate long-term persistence with the &lt;code&gt;save_session_to_memory_callback&lt;/code&gt; function. Each agent registers it as its &lt;code&gt;after_agent_callback&lt;/code&gt;, so it runs automatically when the agent finishes a turn and stores the session in the memory bank without manual intervention.&lt;/p&gt;

&lt;h4&gt;
  
  
  How Managed Memory Works:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion&lt;/strong&gt;: The &lt;code&gt;save_session_to_memory_callback&lt;/code&gt; sends the conversation data to Vertex AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt;: Vertex AI converts the text into numerical vectors (embeddings) that capture the semantic meaning of your preferences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: These vectors are stored in a managed index, enabling the agent to perform semantic searches and retrieve relevant history in future sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: The agent recalls this history using built-in ADK tools. The &lt;code&gt;PreloadMemoryTool&lt;/code&gt; proactively brings in context at the start of an interaction, while the &lt;code&gt;LoadMemoryTool&lt;/code&gt; lets the agent fetch specific memories as needed.&lt;/li&gt;
&lt;/ul&gt;
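&lt;p&gt;The ingest, embed, store, and retrieve steps above can be sketched with a toy in-memory store. This is a standalone illustration and not part of &lt;code&gt;agent.py&lt;/code&gt;: &lt;code&gt;ToyMemoryBank&lt;/code&gt;, &lt;code&gt;embed&lt;/code&gt;, and &lt;code&gt;cosine&lt;/code&gt; are hypothetical names, and the bag-of-words vector is only a stand-in for a real embedding model. It does not reflect how Vertex AI implements the managed service.&lt;/p&gt;

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMemoryBank:
    """Hypothetical in-memory store; NOT the Vertex AI Memory Bank API."""

    def __init__(self) -> None:
        self._index: list[tuple[Counter, str]] = []

    def ingest(self, text: str) -> None:
        # Ingestion, embedding, and storage collapsed into one step.
        self._index.append((embed(text), text))

    def retrieve(self, query: str, top_k: int = 1) -> list[str]:
        # Retrieval: rank stored memories by similarity to the query.
        query_vec = embed(query)
        ranked = sorted(self._index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


bank = ToyMemoryBank()
bank.ingest("user prefers a witty blogging style")
bank.ingest("user is interested in agents on Cloud Run")
print(bank.retrieve("which blogging style does the user like"))
```

&lt;p&gt;The query shares more words with the first memory than the second, so the style preference ranks first. A real embedding model would also match paraphrases that share no words at all.&lt;/p&gt;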

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_session_to_memory_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Defensive callback to persist session history to the Vertex AI memory bank.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callback_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Check connection to Memory Service
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_invocation_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_invocation_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Save the session!
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_invocation_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_session_to_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_invocation_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
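&lt;p&gt;To see why the callback above is called defensive, here is a minimal sketch that exercises it without any Google Cloud dependency. &lt;code&gt;FakeMemoryService&lt;/code&gt; and the &lt;code&gt;SimpleNamespace&lt;/code&gt; contexts are hypothetical stand-ins for the objects ADK passes in at runtime; the callback body is repeated only so the snippet runs on its own.&lt;/p&gt;

```python
import asyncio
from types import SimpleNamespace


class FakeMemoryService:
    """Hypothetical stand-in that records sessions instead of calling Vertex AI."""

    def __init__(self) -> None:
        self.saved = []

    async def add_session_to_memory(self, session) -> None:
        self.saved.append(session)


async def save_session_to_memory_callback(*args, **kwargs) -> None:
    # Same logic as the callback above, repeated so this sketch is self-contained.
    ctx = kwargs.get("callback_context") or (args[0] if args else None)
    if ctx and hasattr(ctx, "_invocation_context") and ctx._invocation_context.memory_service:
        await ctx._invocation_context.memory_service.add_session_to_memory(
            ctx._invocation_context.session
        )


memory = FakeMemoryService()
with_memory = SimpleNamespace(
    _invocation_context=SimpleNamespace(memory_service=memory, session="session-1")
)
without_memory = SimpleNamespace(
    _invocation_context=SimpleNamespace(memory_service=None, session="session-2")
)

asyncio.run(save_session_to_memory_callback(callback_context=with_memory))  # persisted
asyncio.run(save_session_to_memory_callback(without_memory))  # no memory service: skipped
asyncio.run(save_session_to_memory_callback())  # no context at all: no-op
print(memory.saved)
```

&lt;p&gt;Only the first call stores anything. The other two return quietly, which keeps an agent running even when no memory service is configured, for example during local testing.&lt;/p&gt;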



&lt;h3&gt;
  
  
  Short-term Memory
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;add_info_to_state&lt;/code&gt; function serves as the agent's short-term working memory, allowing the &lt;code&gt;gcp_expert&lt;/code&gt; to reliably hand off its detailed findings to the &lt;code&gt;blog_drafter&lt;/code&gt; within the same session. This working memory and the conversation transcript are managed by the Vertex AI Session Service to ensure that active context survives server restarts or transient failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The boundary between session-based state and long-term persistence&lt;/strong&gt;: the Session Service keeps this working state stable during an active interaction, but short-term memory does not persist between sessions. Starting a new session ID resets the working state, giving each new task a clean slate. Cross-session continuity, where the agent remembers your stylistic preferences or past feedback, is handled by the Vertex AI Memory Bank.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_info_to_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to state.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
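&lt;p&gt;The hand-off and the session boundary described above can be illustrated with a small sketch. &lt;code&gt;FakeToolContext&lt;/code&gt; is a hypothetical stand-in for ADK's &lt;code&gt;ToolContext&lt;/code&gt; that exposes only a &lt;code&gt;state&lt;/code&gt; dictionary, and the findings text is placeholder data.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolContext:
    """Hypothetical stand-in for ADK's ToolContext, exposing only `state`."""
    state: dict = field(default_factory=dict)


def add_info_to_state(tool_context: FakeToolContext, key: str, data: str) -> dict:
    # Same body as the tool above, typed against the stand-in context.
    tool_context.state[key] = data
    return {"status": "success", "message": f"Saved '{key}' to state."}


# Session A: gcp_expert saves its findings, and blog_drafter can read them back.
session_a = FakeToolContext()
result = add_info_to_state(session_a, "technical_research_findings", "Example findings about Cloud Run.")
findings_for_drafter = session_a.state["technical_research_findings"]

# Session B: a fresh session starts with empty working state.
session_b = FakeToolContext()
print(result["status"], "|", findings_for_drafter, "|", session_b.state)
```

&lt;p&gt;Returning a small status dictionary gives the calling agent an explicit confirmation that the write happened, which it can mention in its reply.&lt;/p&gt;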



&lt;h2&gt;
  
  
  Specialist 1: Reddit Scanner (Discovery)
&lt;/h2&gt;

&lt;p&gt;The Reddit Scanner is our "Trend Spotter". It identifies high-engagement questions from the last 21 days (3 weeks) so that research findings stay timely and relevant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Usage:&lt;/strong&gt; It leverages &lt;code&gt;load_memory&lt;/code&gt; to retrieve your past areas of interest and preferred topics from the Vertex AI memory bank. If relevant history exists, the agent prioritizes those specific topics in its search to provide a personalized discovery experience.&lt;/p&gt;

&lt;p&gt;Beyond simple retrieval, each sub-agent actively updates its memories by listening for new preferences and explicitly acknowledging them during the chat. This process captures relevant information in the session history, where an automated callback then persists it to the long-term Vertex AI memory bank for future use.&lt;/p&gt;

&lt;p&gt;This memory management is supported by two distinct retrieval patterns within the Google Agent Development Kit (ADK). The first is the &lt;code&gt;PreloadMemoryTool&lt;/code&gt;, which proactively brings in historical context at the beginning of every interaction to ensure the agent is fully briefed before addressing the current request. The second is the &lt;code&gt;LoadMemoryTool&lt;/code&gt;, which the agent uses on an as-needed basis, calling upon it only when it decides that deeper past knowledge would be beneficial for the current step in the workflow.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Singleton toolsets
&lt;/span&gt;&lt;span class="n"&gt;reddit_mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_reddit_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_USER_AGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reddit_scanner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reddit_scanner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shared_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a Reddit research specialist. Your goal is to identify high-engagement questions
from the last 3 weeks on specific topics of interest, such as AI/agents on Cloud Run.

Follow these steps:
1. **MEMORY CHECK**: Use `load_memory` to retrieve the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s **past areas of interest** and **preferred topics**. Calibrate your search to align with these interests.
2. Use the Reddit MCP tools to search for relevant subreddits and posts.
3. Filter results for posts created within the last 21 days (3 weeks).
4. Analyze &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high-engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; based on upvote counts and the number of comments.
5. Recommend the most important and relevant questions for a technical audience.
6. **CRITICAL**: For each recommended question, provide a direct link to the original thread and a concise summary of the discussion.
7. **CAPTURE PREFERENCES**: Actively listen for user preferences, interests, or project details. Explicitly acknowledge them to ensure they are captured in the session history for future personalization.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reddit_mcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_memory_tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LoadMemoryTool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
    &lt;span class="n"&gt;after_agent_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;save_session_to_memory_callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
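&lt;p&gt;In Dev Signal the model applies the 21-day filter and the engagement ranking itself, guided by the instruction above. As a rough sketch of what those two steps amount to, the snippet below filters and ranks hypothetical post records. The field names and the score (upvotes plus comments) are assumptions made for illustration, not the Reddit MCP server's schema.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def top_recent_posts(posts: list[dict], now: datetime, max_age_days: int = 21, limit: int = 3) -> list[dict]:
    """Keep posts newer than the cutoff and rank them by upvotes plus comments."""
    cutoff = now - timedelta(days=max_age_days)
    recent = [post for post in posts if post["created"] >= cutoff]
    return sorted(recent, key=lambda post: post["upvotes"] + post["comments"], reverse=True)[:limit]


now = datetime(2026, 5, 14, tzinfo=timezone.utc)
posts = [  # hypothetical sample data
    {"title": "Cold starts for agents on Cloud Run?", "created": now - timedelta(days=2), "upvotes": 120, "comments": 45},
    {"title": "Old thread about quotas", "created": now - timedelta(days=40), "upvotes": 900, "comments": 300},
    {"title": "MCP server auth patterns", "created": now - timedelta(days=20), "upvotes": 60, "comments": 80},
]
for post in top_recent_posts(posts, now):
    print(post["title"])
```

&lt;p&gt;The 40-day-old thread is dropped despite its high score, which is the point of filtering on recency before ranking.&lt;/p&gt;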



&lt;h2&gt;
  
  
  Specialist 2: GCP Expert (Grounding)
&lt;/h2&gt;

&lt;p&gt;The GCP Expert is our "Technical Authority". It triangulates facts by synthesizing official documentation from the Google Cloud Developer Knowledge MCP Server, community sentiment from Reddit, and broader context from Google Search.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dk_mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_dk_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SECRETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DK_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;search_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shared_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execute Google Searches and return raw, structured results (Title, Link, Snippet).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;gcp_expert&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gcp_expert&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shared_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a Google Cloud Platform (GCP) documentation expert.
Your goal is to provide accurate, detailed, and cited answers to technical questions by synthesizing official documentation with community insights.

For EVERY technical question, you MUST perform a comprehensive research sweep using ALL available tools:
1. **Official Docs (Grounding)**: Use DeveloperKnowledge MCP (`search_documents`) to find the definitive technical facts.
2. **Social Media Research (Reddit)**: Use the Reddit MCP to research the question on social media. This allows you to find real-world user discussions, common pain points, or alternative solutions that might not be in official documentation.
3. **Broader Context (Web/Social)**: Use the `search_agent` tool to find recent technical blogs, social media discussions, or tutorials.

Synthesize your answer:
- Start with the official answer based on GCP docs.
- Add &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Social Media Insights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Common Issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; sections derived from Reddit and Web Search findings.
- **CRITICAL**: After providing your answer, you MUST use the `add_info_to_state` tool to save your full technical response under the key: `technical_research_findings`.
- Cite your sources specifically at the end of your response, providing **direct links** (URLs) to the official documentation, blog posts, and Reddit threads used.
- **CAPTURE PREFERENCES**: Actively listen for user preferences, interests, or project details. Explicitly acknowledge them to ensure they are captured in the session history for future personalization.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dk_mcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_agent&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;reddit_mcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_info_to_state&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;after_agent_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;save_session_to_memory_callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
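&lt;p&gt;The answer layout requested in the instruction above (official answer first, community insights next, cited links last) can be sketched as a plain function. In the real agent the model writes this prose itself; &lt;code&gt;synthesize_answer&lt;/code&gt;, the record fields, and the example.com URLs are hypothetical and only show the intended ordering.&lt;/p&gt;

```python
def synthesize_answer(official: dict, community: list[dict], web: list[dict]) -> str:
    """Assemble an answer: official docs first, community insights next, sources last."""
    lines = [official["summary"], "", "Social Media Insights:"]
    lines += [f"- {item['summary']}" for item in community + web]
    lines += ["", "Sources:"]
    lines += [f"- {item['url']}" for item in [official, *community, *web]]
    return "\n".join(lines)


# Hypothetical findings from the three research channels.
official = {"summary": "Official guidance goes here.", "url": "https://example.com/docs"}
community = [{"summary": "Users report a common pitfall.", "url": "https://example.com/reddit-thread"}]
web = [{"summary": "A tutorial shows a workaround.", "url": "https://example.com/blog"}]

answer = synthesize_answer(official, community, web)
print(answer)
```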



&lt;h2&gt;
  
  
  Specialist 3: Blog Drafter (Creativity)
&lt;/h2&gt;

&lt;p&gt;The Blog Drafter is our "Content Creator". It drafts the blog post based on the expert's findings and offers to generate visuals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Usage:&lt;/strong&gt; It checks &lt;code&gt;load_memory&lt;/code&gt; for the user's &lt;strong&gt;preferred writing style&lt;/strong&gt; (e.g. "Witty", "Rap") stored in the &lt;strong&gt;Vertex AI memory bank&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;nano_mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_nano_banana_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;blog_drafter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_drafter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shared_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a professional technical blogger specializing in Google Cloud Platform.
Your goal is to draft high-quality blog posts based on technical research provided by the GCP expert and reliable documentation.

You have access to the research findings from the gcp_expert agent here:
{{ technical_research_findings }}

Follow these steps:
1. **MEMORY CHECK**: Use `load_memory` to retrieve past blog posts, **areas of interest**, and user feedback on writing style. Adopt the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s preferred style and depth.
2. **REVIEW &amp;amp; GROUND**: Review the technical research findings provided above. **CRITICAL**: Use the `dk_mcp` (Developer Knowledge) tool to verify key facts, technical limitations, and API details. Ensure every claim in your blog is grounded in official documentation.
3. Draft a blog post that is engaging, accurate, and helpful for a technical audience.
4. Include code snippets or architectural diagrams if relevant.
5. Provide a &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Resources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; section with links to the official documentation used.
6. Ensure the tone is professional yet accessible, while adhering to any style preferences found in memory.
7. **VISUALS**: After presenting the drafted blog post, explicitly ask the user: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Would you like me to generate an infographic-style header image to illustrate these key points?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; If they agree, use the `generate_image` tool (Nano Banana).
8. **CAPTURE PREFERENCES**: Actively listen for user preferences, interests, or project details. Explicitly acknowledge them to ensure they are captured in the session history for future personalization.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dk_mcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_memory_tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LoadMemoryTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;nano_mcp&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;after_agent_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;save_session_to_memory_callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
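&lt;p&gt;The &lt;code&gt;{{ technical_research_findings }}&lt;/code&gt; placeholder in the instruction is how the drafter receives what &lt;code&gt;gcp_expert&lt;/code&gt; saved with &lt;code&gt;add_info_to_state&lt;/code&gt;. The sketch below illustrates the general idea of filling such a placeholder from session state. &lt;code&gt;render_instruction&lt;/code&gt; is a hypothetical helper, not ADK's implementation, and ADK's actual templating rules may differ.&lt;/p&gt;

```python
import re


def render_instruction(template: str, state: dict) -> str:
    """Replace `{{ key }}` placeholders with values from session state."""

    def lookup(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched rather than raising.
        return str(state.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)


template = "Research findings:\n{{ technical_research_findings }}\nDraft a post."
state = {"technical_research_findings": "Example findings saved by gcp_expert."}
print(render_instruction(template, state))
```

&lt;p&gt;Because the value is read from state when the instruction is rendered, the drafter sees the findings only if the expert has already run in the same session.&lt;/p&gt;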



&lt;h2&gt;
  
  
  The Root Orchestrator
&lt;/h2&gt;

&lt;p&gt;The root agent serves as the system's strategist, managing a team of specialist agents and orchestrating their actions based on the specific goals provided by the user. At the start of a conversation, the orchestrator retrieves memory to establish context by checking for the user's past areas of interest, preferred topics, or previous projects.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/agent.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;root_orchestrator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shared_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a technical content strategist. You manage three specialists:
1. reddit_scanner: Finds trending questions and high-engagement topics on Reddit.
2. gcp_expert: Provides technical answers based on official GCP documentation.
3. blog_drafter: Writes professional blog posts based on technical research.

Your responsibilities:
- **MEMORY CHECK**: At the start of a conversation, use `load_memory` to check if the user has specific **areas of interest**, preferred topics, or past projects. Tailor your suggestions accordingly.
- **CAPTURE PREFERENCES**: Actively listen for user preferences, interests, or project details. Explicitly acknowledge them to ensure they are captured in the session history for future personalization.
- If the user wants to find trending topics or questions from Reddit, delegate to reddit_scanner.
- If the user has a technical question or wants to research a specific theme, delegate to gcp_expert.
- **CRITICAL**: After the gcp_expert provides an answer, you MUST ask the user:
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Would you like me to draft a technical blog post based on this answer?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
- If the user agrees or asks to write a blog, delegate to blog_drafter.
- Be proactive in helping the user navigate from discovery (Reddit) to research (Docs) to content creation (Blog).
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;load_memory_tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LoadMemoryTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;preload_memory_tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PreloadMemoryTool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
    &lt;span class="n"&gt;after_agent_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;save_session_to_memory_callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reddit_scanner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gcp_expert&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blog_drafter&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev_signal_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this part of our series, we built a multi-agent architecture and implemented a robust, dual-layered memory system. We established a Root Orchestrator that manages three specialist agents: a Reddit Scanner for trend discovery, a GCP Expert for technical grounding, and a Blog Drafter for content creation.&lt;/p&gt;

&lt;p&gt;By utilizing short-term state to pass information reliably between specialists and integrating the Vertex AI memory bank for long-term persistence, we've enabled the agent to learn from your feedback and remember specific writing styles across different conversations.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm"&gt;Part 3&lt;/a&gt;, we will show you how to test the agent locally to verify these components on your workstation, before transitioning to a full production deployment on Google Cloud Run in Part 4. Can't wait for Part 3? The full implementation is already available for you to explore on &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To learn more about the underlying technology, explore the &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview" rel="noopener noreferrer"&gt;Vertex AI Memory Bank overview&lt;/a&gt; or dive into the official &lt;a href="https://docs.cloud.google.com/agent-builder/agent-development-kit/overview" rel="noopener noreferrer"&gt;ADK Documentation&lt;/a&gt; to see how to orchestrate complex multi-agent workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Special thanks to &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;Remigiusz Samborski&lt;/a&gt; for the helpful review and feedback on this article.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For more content like this, follow me on &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>agents</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Building Capabilities for a Multi-Agent System with Google ADK, MCP, and Cloud Run</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 07 May 2026 21:00:10 +0000</pubDate>
      <link>https://forem.com/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9</link>
      <guid>https://forem.com/googleai/building-capabilities-for-a-multi-agent-system-with-google-adk-mcp-and-cloud-run-ab9</guid>
      <description>&lt;p&gt;My team's mission is to accelerate the developer journey from writing code to running secure AI workloads on Google Cloud. To help developers succeed, we focus on identifying their most pressing questions and building demos that provide straightforward, easy-to-implement solutions.&lt;/p&gt;

&lt;p&gt;Recently, I was struck with inspiration when the new &lt;a href="https://developers.google.com/knowledge/mcp" rel="noopener noreferrer"&gt;Developer Knowledge MCP server&lt;/a&gt; was released. It led me to build &lt;strong&gt;Dev Signal&lt;/strong&gt;—a multi-agent system designed with &lt;a href="https://github.com/google/adk-python" rel="noopener noreferrer"&gt;Google Agent Development Kit (ADK)&lt;/a&gt;—to identify technical questions from Reddit, research them using official documentation, and draft detailed technical blogs. &lt;strong&gt;Dev Signal&lt;/strong&gt; also provides custom visuals using &lt;a href="https://blog.google/innovation-and-ai/products/nano-banana-pro/" rel="noopener noreferrer"&gt;Nano Banana Pro&lt;/a&gt;. I even integrated a long-term &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview" rel="noopener noreferrer"&gt;memory&lt;/a&gt; layer so the agent remembers my specific preferences and blogging style.&lt;/p&gt;

&lt;p&gt;By connecting my coding assistant, &lt;a href="https://docs.cloud.google.com/gemini/docs/codeassist/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;, to the Developer Knowledge MCP server, I built and deployed this entire system to &lt;a href="https://cloud.google.com/run/docs" rel="noopener noreferrer"&gt;Google Cloud Run&lt;/a&gt; in just two days.&lt;/p&gt;

&lt;p&gt;Whether you want to architect a complex multi-agent system with long-term memory, use local and remote MCP servers to standardize tools, or write detailed Terraform scripts for a secure Cloud Run deployment, I'll show you how!&lt;/p&gt;

&lt;p&gt;If you'd rather dive straight into the code and explore it at your own pace, you can clone the repository &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/abZxJiXGrJs"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What you'll learn
&lt;/h2&gt;

&lt;p&gt;In this four-part blog series, I'll walk you through the step-by-step process of how I brought this project to life. Each blog post captures the journey of building and deploying &lt;strong&gt;Dev Signal&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1: Tools for building agent capabilities&lt;/strong&gt; – You'll begin by setting up your project environment and equipping your agent with tools using the Model Context Protocol (MCP). You'll learn how to connect to Reddit for trend discovery, Google Cloud docs for technical grounding, and a custom Nano Banana Pro tool for image generation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15"&gt;&lt;strong&gt;Part 2: The Multi-Agent Architecture with long term memory&lt;/strong&gt;&lt;/a&gt; – You'll build the "brain" of the system by implementing a root orchestrator and a team of specialized agents. You'll also integrate the &lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview" rel="noopener noreferrer"&gt;Vertex AI memory bank&lt;/a&gt;, enabling the agent to learn and persist your preferences across sessions.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm"&gt;&lt;strong&gt;Part 3: Testing the agent Locally&lt;/strong&gt;&lt;/a&gt; – Before moving to the cloud, you'll synchronize the agent's components and verify its performance on your workstation. You'll use a dedicated test runner to simulate the full lifecycle of discovery, research, and multimodal creation, with a special focus on validating long-term memory persistence by connecting your local agent directly to the cloud-based Vertex AI memory bank.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/googleai/deploying-a-multi-agent-system-with-terraform-and-cloud-run-2a9c"&gt;&lt;strong&gt;Part 4: Deployment to Cloud Run and the Path to Production&lt;/strong&gt;&lt;/a&gt; – Finally, you'll deploy your service on Google Cloud Run using Terraform for reproducible infrastructure. You'll also discuss the next steps required for a high quality secure production system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting started with Dev Signal
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dev Signal&lt;/strong&gt; is an intelligent monitoring agent designed to filter noise and create value. &lt;strong&gt;Dev Signal&lt;/strong&gt; operates in the following ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt;: Scouts Reddit for high-engagement technical questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grounding&lt;/strong&gt;: Researches answers using official Google Cloud documentation to ensure accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creation&lt;/strong&gt;: Drafts professional technical blog posts based on its findings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Generation&lt;/strong&gt;: Generates custom infographic headers for those posts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Term Memory&lt;/strong&gt;: Uses &lt;strong&gt;Vertex AI memory bank&lt;/strong&gt; to remember your feedback across different sessions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin, verify that the following are installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.12+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;uv&lt;/strong&gt; (Python package manager): &lt;code&gt;curl -LsSf https://astral.sh/uv/install.sh | sh&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/sdk/docs/install" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud SDK&lt;/strong&gt;&lt;/a&gt; (&lt;code&gt;gcloud&lt;/code&gt; CLI) installed and authenticated.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.hashicorp.com/terraform/install" rel="noopener noreferrer"&gt;&lt;strong&gt;Terraform&lt;/strong&gt;&lt;/a&gt; (for infrastructure as code).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.npmjs.com/downloading-and-installing-node-js-and-npm" rel="noopener noreferrer"&gt;&lt;strong&gt;Node.js &amp;amp; npm&lt;/strong&gt;&lt;/a&gt; (required for the Reddit MCP tool).&lt;/li&gt;
&lt;/ul&gt;
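
&lt;p&gt;If you'd like to check the tools above in one pass, here is a minimal sketch that uses only the Python standard library. The helper name &lt;code&gt;find_missing_tools&lt;/code&gt; and the executable names in the list are assumptions for illustration (your interpreter may be exposed as &lt;code&gt;python&lt;/code&gt; rather than &lt;code&gt;python3&lt;/code&gt;, for example); neither is part of the Dev Signal repository.&lt;/p&gt;

```python
import shutil

# Executable names are assumptions; adjust them to match your platform.
REQUIRED_TOOLS = ["python3", "uv", "gcloud", "terraform", "node", "npm"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisite executables were found on PATH.")
```

&lt;p&gt;This only confirms that the executables exist. It does not check versions, and it does not confirm that &lt;code&gt;gcloud&lt;/code&gt; is authenticated.&lt;/p&gt;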

&lt;p&gt;You will also need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href="https://docs.cloud.google.com/resource-manager/docs/creating-managing-projects" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Project&lt;/strong&gt;&lt;/a&gt; with billing enabled.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/endpoints/docs/openapi/enable-api" rel="noopener noreferrer"&gt;&lt;strong&gt;APIs Enabled&lt;/strong&gt;&lt;/a&gt;: Vertex AI, Cloud Run, Secret Manager, Artifact Registry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit API Credentials&lt;/strong&gt; (Client ID, Secret) - You can get these from the &lt;a href="https://www.reddit.com/prefs/apps" rel="noopener noreferrer"&gt;Reddit Developer Portal&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Knowledge API Key&lt;/strong&gt; (for Google Cloud docs search) - Instructions on how to get it are &lt;a href="https://developers.google.com/knowledge/mcp" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project Setup
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Dev Signal&lt;/strong&gt; system was built by first running the &lt;a href="https://github.com/GoogleCloudPlatform/agent-starter-pack" rel="noopener noreferrer"&gt;Agent Starter Pack&lt;/a&gt;, following the automated architect workflow described in the &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks" rel="noopener noreferrer"&gt;Agent Factory episode&lt;/a&gt; by &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;Remigiusz Samborski&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/vkolesnikov/" rel="noopener noreferrer"&gt;Vlad Kolesnikov&lt;/a&gt;. This foundation provided the project's modular directory structure, which is used to separate concerns between Agent Logic, Server Code, Utilities, and Tools.&lt;/p&gt;

&lt;p&gt;The starter pack acts as a powerful starting point because it automates the creation of professional infrastructure, CI/CD pipelines, and observability tools in seconds. This allows you to focus entirely on the agent's unique intelligence while ensuring the underlying platform remains secure and scalable. By building on top of this generated boilerplate with AI assistance from &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemini-cli-open-source-ai-agent/" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt; and &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;, the development process is highly accelerated.&lt;/p&gt;

&lt;p&gt;The Agent Starter Pack's high-level architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F317n2pofuobvchzxorm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F317n2pofuobvchzxorm0.png" alt="Agent Starter Pack Architecture" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Initialize the Project
&lt;/h3&gt;

&lt;p&gt;Create a new directory for your project and initialize it. We'll use &lt;code&gt;uv&lt;/code&gt;, which is an extremely fast Python package manager.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv init dev-signal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Folder Structure
&lt;/h3&gt;

&lt;p&gt;Our project will follow this structure. We will populate these files step-by-step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dev-signal/
├── dev_signal_agent/
│   ├── __init__.py
│   ├── agent.py                # Agent logic &amp;amp; orchestration
│   ├── fast_api_app.py         # Application server &amp;amp; memory connection
│   ├── app_utils/              # Env Config
│   │   └── env.py
│   └── tools/                  # External capabilities
│       ├── __init__.py
│       ├── mcp_config.py       # Tool configuration (Reddit, Docs)
│       └── nano_banana_mcp/    # Custom local image generation tool
│           ├── __init__.py
│           ├── main.py
│           ├── nano_banana_pro.py
│           ├── media_models.py
│           ├── storage_utils.py
│           └── requirements.txt
├── deployment/
│   └── terraform/              # Infrastructure as Code
├── .env                        # Local secrets (API keys)
├── Makefile                    # Shortcuts for building/deploying
├── Dockerfile                  # Container definition
└── pyproject.toml              # Dependencies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Define Dependencies
&lt;/h3&gt;

&lt;p&gt;Update your &lt;code&gt;pyproject.toml&lt;/code&gt; with the necessary dependencies. We use &lt;code&gt;google-adk&lt;/code&gt; for the agent framework and &lt;code&gt;google-genai&lt;/code&gt; for the model interaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[project]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"dev-signal"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"A multi-agent system for monitoring and content creation."&lt;/span&gt;
&lt;span class="py"&gt;readme&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"README.md"&lt;/span&gt;
&lt;span class="py"&gt;requires-python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;3.14&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;dependencies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;"google-adk&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"google-genai&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"mcp&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"python-dotenv&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"fastapi&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.110&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"uvicorn&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.29&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"google-cloud-logging&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"google-cloud-aiplatform&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.38&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"fastmcp&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.13&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"google-cloud-storage&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.6&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"google-auth&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"google-cloud-secret-manager&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.26&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;uv sync&lt;/code&gt; to install everything.&lt;/p&gt;

&lt;p&gt;Create a new directory for the agent code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;dev_signal_agent
&lt;span class="nb"&gt;cd &lt;/span&gt;dev_signal_agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building the agent capabilities: MCP tools
&lt;/h2&gt;

&lt;p&gt;Our agent needs to interact with the outside world. To standardize those connections, we use the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;&lt;/a&gt;, an open standard for connecting AI agents to external data and tools. Instead of writing custom API wrappers, we use standard MCP servers. This lets us connect to APIs (Reddit), knowledge bases (Google Cloud docs), and even local scripts (image generation using Nano Banana Pro) through a common interface. Create a new directory for the agent tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;tools
&lt;span class="nb"&gt;cd &lt;/span&gt;tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tools Configuration
&lt;/h3&gt;

&lt;p&gt;We'll define our toolsets in &lt;code&gt;dev_signal_agent/tools/mcp_config.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This file defines the connection parameters for our three main tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reddit&lt;/strong&gt;: Connected via a local stdio subprocess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Knowledge&lt;/strong&gt;: Connected via a remote HTTP endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nano Banana&lt;/strong&gt;: Connected via a local stdio subprocess (our custom Python script).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reddit Search (Discovery Tool)
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/Arindam200/reddit-mcp" rel="noopener noreferrer"&gt;Reddit MCP server&lt;/a&gt; acts as a bridge to the Reddit API, allowing your agent to discover trending posts and analyze engagement without you having to write complex API wrappers. To ensure portability, the code uses a "find or fetch" strategy: it first checks for a local installation and, if missing, automatically uses &lt;code&gt;npx&lt;/code&gt; to download and run the server on demand.&lt;/p&gt;

&lt;p&gt;Instead of a network connection, the agent launches the server as a local subprocess and communicates via standard input and output (stdio). Within the Google ADK, the &lt;code&gt;McpToolset&lt;/code&gt; class acts as a universal wrapper that standardizes these connections, enabling your agent to interact with various tools, from community resources to custom scripts like the Nano Banana image generator, using a common interface. By securely passing API credentials through environment variables, the system ensures these "plug-and-play" modules function as a seamless bridge between the AI and external platforms.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/mcp_config.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;McpToolset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.mcp_tool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamableHTTPConnectionParams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioConnectionParams&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_reddit_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Connects to the Reddit MCP server.
    This server runs as a local subprocess (stdio) and proxies requests to the Reddit API.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Check if 'reddit-mcp' is installed globally, otherwise use npx to run it
&lt;/span&gt;    &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reddit-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;which&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reddit-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;which&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reddit-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--quiet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reddit-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Inject secrets into the environment of the subprocess only
&lt;/span&gt;    &lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DOTENV_CONFIG_SILENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LANG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en_US.UTF-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_id&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_secret&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDDIT_USER_AGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_agent&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;McpToolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioConnectionParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;  &lt;span class="c1"&gt;# Pass injected secrets directly to the subprocess
&lt;/span&gt;            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;120.0&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
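
&lt;p&gt;The "find or fetch" lookup in the function above can also be isolated into a small helper that performs the PATH check once and is easy to exercise without a real install. This is a sketch only: &lt;code&gt;resolve_mcp_command&lt;/code&gt; and its injectable &lt;code&gt;which&lt;/code&gt; parameter are hypothetical names introduced here, not part of ADK or the Dev Signal repository.&lt;/p&gt;

```python
import shutil


def resolve_mcp_command(package, which=shutil.which):
    """Return (command, args) for launching an npm-distributed MCP server.

    Prefers a binary already on PATH and otherwise falls back to npx.
    `which` is injectable so both branches can be tried without installing anything.
    """
    if which(package):
        return package, []
    # -y skips the npx install prompt; --quiet keeps npm's own logging down,
    # which helps because a stdio transport uses stdout for protocol messages.
    return "npx", ["-y", "--quiet", package]


# Simulate both branches with a stubbed lookup.
print(resolve_mcp_command("reddit-mcp", which=lambda name: "/usr/local/bin/" + name))
print(resolve_mcp_command("reddit-mcp", which=lambda name: None))
```

&lt;p&gt;The returned pair maps directly onto the &lt;code&gt;command&lt;/code&gt; and &lt;code&gt;args&lt;/code&gt; fields of &lt;code&gt;StdioServerParameters&lt;/code&gt; used in the function above.&lt;/p&gt;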



&lt;h3&gt;
  
  
  Google Cloud Docs (Knowledge Tool)
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://developers.google.com/knowledge/mcp" rel="noopener noreferrer"&gt;Developer Knowledge MCP server&lt;/a&gt; provides grounding for your agent by allowing it to search the entire corpus of official Google Cloud documentation. Unlike the local Reddit server, this is a managed service hosted by Google and accessed as a remote endpoint over the internet. It exposes specialized tools like &lt;code&gt;google_developer_documentation_search&lt;/code&gt; for semantic queries and &lt;code&gt;google_developer_documentation_fetch&lt;/code&gt; to retrieve full markdown content, ensuring that every technical claim the agent makes is supported by definitive, up-to-date facts.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can also connect coding assistants such as &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemini-cli-open-source-ai-agent/" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt; or &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; to the Developer Knowledge MCP server to give them access to up-to-date Google Cloud documentation. I used it when writing this blog!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To connect, the agent uses the &lt;code&gt;McpToolset&lt;/code&gt; class with &lt;code&gt;StreamableHTTPConnectionParams&lt;/code&gt;, pointing to a web URL instead of launching a local process. It securely authenticates using a &lt;code&gt;DK_API_KEY&lt;/code&gt; (&lt;a href="https://developers.google.com/knowledge/mcp" rel="noopener noreferrer"&gt;create your API key&lt;/a&gt;) passed in the request headers, allowing the agent to perform a "comprehensive research sweep" across official docs, community sentiment, and broader web context through a single standardized interface.&lt;/p&gt;

&lt;p&gt;Add this code to the same file, &lt;code&gt;dev_signal_agent/tools/mcp_config.py&lt;/code&gt;, below the Reddit toolset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_dk_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Connects to Developer Knowledge (Google Cloud Docs).
    This is a remote MCP server accessed via HTTP.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-Api-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fallback to os.environ for local testing if not passed via API
&lt;/span&gt;        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-Api-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DK_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;McpToolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StreamableHTTPConnectionParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://developerknowledge.googleapis.com/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
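&lt;p&gt;Note that the function above sends an empty &lt;code&gt;X-Goog-Api-Key&lt;/code&gt; header when no key is passed and &lt;code&gt;DK_API_KEY&lt;/code&gt; is unset. The following is a minimal sketch of one alternative that leaves the header out in that case, so a missing key is easier to notice. The helper name &lt;code&gt;build_api_key_headers&lt;/code&gt; and its &lt;code&gt;env&lt;/code&gt; parameter are hypothetical and not part of ADK or the original project.&lt;/p&gt;

```python
import os
from typing import Dict, Mapping, Optional


def build_api_key_headers(
    api_key: str = "",
    env: Optional[Mapping[str, str]] = None,
    env_var: str = "DK_API_KEY",
) -> Dict[str, str]:
    """Return request headers, adding X-Goog-Api-Key only when a key exists."""
    # Hypothetical helper: `env` lets callers inject a mapping instead of os.environ.
    source = os.environ if env is None else env
    # Prefer the explicit argument, then fall back to the environment variable.
    key = api_key or source.get(env_var, "")
    return {"X-Goog-Api-Key": key} if key else {}


if __name__ == "__main__":
    print(build_api_key_headers("abc123"))
    print(build_api_key_headers(env={"DK_API_KEY": "from-env"}))
    print(build_api_key_headers(env={}))
```

&lt;p&gt;The returned dictionary could then be passed as &lt;code&gt;headers&lt;/code&gt; to &lt;code&gt;StreamableHTTPConnectionParams&lt;/code&gt; exactly as in the function above.&lt;/p&gt;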



&lt;h3&gt;
  
  
  The Image Generator (Nano Banana MCP)
&lt;/h3&gt;

&lt;p&gt;While we've used external MCP servers for Reddit and documentation, we can also build our own custom MCP server to wrap specific Python logic. In this case, we are creating an image generation tool powered by Gemini 3 Pro Image (also known as Nano Banana Pro). This demonstrates that any Python function can be exposed as a standardized tool that any MCP-compatible agent can call.&lt;/p&gt;

&lt;p&gt;How the image generation works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://gofastmcp.com/getting-started/welcome" rel="noopener noreferrer"&gt;&lt;strong&gt;FastMCP&lt;/strong&gt;&lt;/a&gt;: We use the &lt;code&gt;fastmcp&lt;/code&gt; library to drastically simplify server creation, allowing us to register Python functions as tools with just a few lines of code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Integration&lt;/strong&gt;: The server uses the Google GenAI SDK to call the &lt;code&gt;gemini-3-pro-image-preview&lt;/code&gt; model, which converts the agent's descriptive prompts into raw image bytes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCS Upload &amp;amp; Hosting:&lt;/strong&gt; Because agent interfaces typically require a URL to display images, the server automatically uploads the generated bytes to Google Cloud Storage (GCS) and returns an HTTPS link to the uploaded object. Whether that link is publicly viewable depends on the bucket's access settings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To connect this local tool, we use &lt;code&gt;StdioConnectionParams&lt;/code&gt; because the server runs as a local subprocess communicating via standard input and output. This matches the &lt;code&gt;transport="stdio"&lt;/code&gt; setting we will define in the server entrypoint.&lt;/p&gt;

&lt;p&gt;The following code defines the MCP connection in &lt;code&gt;dev_signal_agent/tools/mcp_config.py&lt;/code&gt;. We use &lt;code&gt;uv run&lt;/code&gt; to ensure the server starts in an isolated environment with all its dependencies correctly installed.&lt;/p&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/mcp_config.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_nano_banana_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Connects to our local &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Nano Banana&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; image generator.
    This demonstrates how to wrap a local Python script as an MCP tool.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev_signal_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nano_banana_mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
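    &lt;span class="c1"&gt;# Default to "" so the subprocess env never contains None (env values must be strings)
&lt;/span&gt;    &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI_ASSETS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;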

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;McpToolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioConnectionParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI_ASSETS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;600.0&lt;/span&gt;  &lt;span class="c1"&gt;# Image generation can take time
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
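&lt;p&gt;The relative path built with &lt;code&gt;os.path.join&lt;/code&gt; above resolves against the current working directory, so it only works when the agent is started from the project root. The following is a minimal sketch of one alternative that anchors the path to the config file's own location. It assumes &lt;code&gt;mcp_config.py&lt;/code&gt; lives in &lt;code&gt;dev_signal_agent/tools/&lt;/code&gt; as described above, and &lt;code&gt;server_script_path&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
from pathlib import Path


def server_script_path(config_file: str) -> str:
    """Return the path to nano_banana_mcp/main.py next to the given config file."""
    # Hypothetical helper: the directory layout is assumed from the tutorial text.
    tools_dir = Path(config_file).parent
    return str(tools_dir / "nano_banana_mcp" / "main.py")


if __name__ == "__main__":
    # Inside mcp_config.py this would be called as server_script_path(__file__).
    print(server_script_path("dev_signal_agent/tools/mcp_config.py"))
```

&lt;p&gt;With this approach the value passed in &lt;code&gt;args&lt;/code&gt; no longer depends on where the process was launched from.&lt;/p&gt;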



&lt;h3&gt;
  
  
  Implementing the Nano Banana Pro Server Logic
&lt;/h3&gt;

&lt;p&gt;Now, we will implement the actual logic for this server. This implementation is based on the &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=2" rel="noopener noreferrer"&gt;Agent Factory&lt;/a&gt; demo &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/a9a5f64a3394a4b5ecc64061f397bd5ed82927ee/ai-ml/agent-factory-antigravity-nano-banana-pro/mcp" rel="noopener noreferrer"&gt;code&lt;/a&gt; by Remigiusz Samborski. While Remi's original code provides instructions for deploying the MCP server to Cloud Run, we will run it here as a local subprocess for faster development and testing.&lt;/p&gt;

&lt;p&gt;To get started, create the directory for our new server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; dev_signal_agent/tools/nano_banana_mcp
&lt;span class="nb"&gt;cd &lt;/span&gt;dev_signal_agent/tools/nano_banana_mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Server Entrypoint (&lt;code&gt;main.py&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;This file acts as the "brain" that initializes and starts the MCP server.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastMCP Initialization:&lt;/strong&gt; We use the &lt;code&gt;FastMCP&lt;/code&gt; library to create a server named "MediaGenerators" and register our &lt;code&gt;generate_image&lt;/code&gt; function as a tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safe Logging:&lt;/strong&gt; The &lt;code&gt;_initialize_console_logging&lt;/code&gt; function is critical. It forces all logs to &lt;code&gt;sys.stderr&lt;/code&gt;. This is because the MCP "stdio" transport uses &lt;code&gt;sys.stdout&lt;/code&gt; for communication between the agent and the tool; standard logs sent to &lt;code&gt;stdout&lt;/code&gt; would corrupt that protocol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt;: The &lt;code&gt;mcp.run(transport="stdio")&lt;/code&gt; line starts the server's stdio loop. When the agent launches this script as a local subprocess, the server reads requests from standard input and writes responses to standard output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/nano_banana_mcp/main.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nano_banana_pro&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;generate_image&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_initialize_console_logging&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Ensure logs go to STDERR so they don't break the MCP stdio protocol
&lt;/span&gt;    &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;min_level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;force&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;generate_image&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MediaGenerators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;_initialize_console_logging&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
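&lt;p&gt;To see why the logging setup matters, the following minimal sketch configures the root logger the same way as &lt;code&gt;_initialize_console_logging&lt;/code&gt; and then checks which stream a log message lands on. It uses only the standard library. The function names &lt;code&gt;configure_stderr_logging&lt;/code&gt; and &lt;code&gt;capture_streams&lt;/code&gt; are hypothetical and exist only for this illustration.&lt;/p&gt;

```python
import contextlib
import io
import logging
import sys
from typing import Tuple


def configure_stderr_logging(level: int = logging.INFO) -> None:
    """Route all root-logger output to stderr, replacing any existing handlers."""
    handler = logging.StreamHandler(sys.stderr)
    logging.basicConfig(level=level, handlers=[handler], force=True)


def capture_streams() -> Tuple[str, str]:
    """Log one message and return what was written to (stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        # Configure inside the redirect so the handler binds to the captured stderr.
        configure_stderr_logging()
        logging.info("tool call received")
    return out.getvalue(), err.getvalue()


if __name__ == "__main__":
    stdout_text, stderr_text = capture_streams()
    # stdout stays empty, so nothing would interleave with protocol messages.
    print(repr(stdout_text))
    print(repr(stderr_text))
```

&lt;p&gt;The same reasoning applies to stray &lt;code&gt;print()&lt;/code&gt; calls inside tool functions: they write to &lt;code&gt;stdout&lt;/code&gt; by default, so prefer &lt;code&gt;logging&lt;/code&gt; there as well.&lt;/p&gt;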



&lt;h4&gt;
  
  
  The Generation Logic (&lt;code&gt;nano_banana_pro.py&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;This is where the actual image generation happens using Gemini.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GenAI Client:&lt;/strong&gt; We initialize the &lt;code&gt;genai.Client()&lt;/code&gt; to interact with Google's generative models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Selection:&lt;/strong&gt; It specifically targets the &lt;code&gt;gemini-3-pro-image-preview&lt;/code&gt; model. We set the &lt;code&gt;response_modalities&lt;/code&gt; to "IMAGE" to tell the model we want pixels, not just text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robustness&lt;/strong&gt;: The code includes a &lt;code&gt;MAX_RETRIES&lt;/code&gt; loop (set to 5) that retries when a response contains no image data, giving the agent several attempts to get a valid image. Exceptions raised by the API call itself are not caught, so they propagate to the caller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Byte Processing:&lt;/strong&gt; Once the model generates the image, it arrives as raw inline data. We extract these bytes and call our helper to move them to the cloud.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URI Conversion:&lt;/strong&gt; Finally, it replaces the internal &lt;code&gt;gs://&lt;/code&gt; path with a browser-accessible &lt;code&gt;https://&lt;/code&gt; URL so the user can actually see the image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/nano_banana_mcp/nano_banana_pro.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;media_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MediaAsset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;storage_utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;upload_data_to_gcs&lt;/span&gt;

&lt;span class="n"&gt;AUTHORIZED_URI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://storage.mtls.cloud.google.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aspect_ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9:16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MediaAsset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generates an image using Gemini 3 Image model.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;genai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting image generation for prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MediaAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-pro-image-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IMAGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;image_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ImageConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aspect_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;aspect_ratio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# Upload the raw bytes to GCS
&lt;/span&gt;                    &lt;span class="n"&gt;gcs_uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;upload_data_to_gcs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp-tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MediaAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gcs_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No image was generated.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Convert gs:// URI to an HTTP accessible URL if needed
&lt;/span&gt;        &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gs://&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AUTHORIZED_URI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Image URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
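&lt;p&gt;The loop above retries only when a response comes back without image data. If you also want to retry when the call raises, one option is a small wrapper like the sketch below. It is an assumption-laden illustration and not part of the original demo: &lt;code&gt;retry_until_result&lt;/code&gt;, &lt;code&gt;attempt&lt;/code&gt;, and &lt;code&gt;base_delay&lt;/code&gt; are hypothetical names, and the broad &lt;code&gt;except Exception&lt;/code&gt; should be narrowed to the error types you actually expect.&lt;/p&gt;

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional


async def retry_until_result(
    attempt: Callable[[], Awaitable[Optional[str]]],
    max_retries: int = 5,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Call attempt() until it returns a non-empty value or retries run out."""
    for attempt_number in range(max_retries):
        try:
            result = await attempt()
            if result:
                return result
        except Exception as exc:  # Sketch only: narrow this in real code.
            # logging (not print) keeps stdout clean for the MCP stdio transport.
            logging.warning("attempt %d failed: %s", attempt_number + 1, exc)
        # Exponential backoff between attempts; skipped after the final one.
        if attempt_number + 1 != max_retries:
            await asyncio.sleep(base_delay * 2 ** attempt_number)
    return None


if __name__ == "__main__":
    async def always_empty() -> Optional[str]:
        return None

    print(asyncio.run(retry_until_result(always_empty, max_retries=2, base_delay=0)))
```

&lt;p&gt;Here &lt;code&gt;attempt&lt;/code&gt; would wrap one generate-and-upload pass and return the resulting URI, or &lt;code&gt;None&lt;/code&gt; when no image came back.&lt;/p&gt;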



&lt;h4&gt;
  
  
  GCS Upload Helper (&lt;code&gt;storage_utils.py&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;Since agents need a web link to display images, this utility handles the hosting on Google Cloud Storage (GCS).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Bucket Selection&lt;/strong&gt;: It reads the bucket name from your environment variables, using &lt;code&gt;AI_ASSETS_BUCKET&lt;/code&gt; first and falling back to &lt;code&gt;LOGS_BUCKET_NAME&lt;/code&gt;. At least one of the two must be set for uploads to work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique Filenames:&lt;/strong&gt; We use an MD5 hash of the raw image data as the filename. In practice, different images get different names, while identical images map to the same object name, so re-uploading one overwrites the existing object instead of creating a duplicate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Upload:&lt;/strong&gt; The &lt;code&gt;blob.upload_from_string&lt;/code&gt; method pushes the raw image bytes directly to your GCS bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/nano_banana_mcp/storage_utils.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud.storage&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Blob&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;storage_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ai_bucket_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI_ASSETS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOGS_BUCKET_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ai_bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_bucket_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_data_to_gcs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;blob_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assets/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ai_bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blob_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gs://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ai_bucket_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;blob_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
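
&lt;p&gt;The naming step in &lt;code&gt;upload_data_to_gcs&lt;/code&gt; is content-addressed: the object name comes from a hash of the bytes, so the same image uploaded twice for the same agent maps to the same object. The sketch below reproduces only that naming step with the standard library and needs no GCS access. The helper name &lt;code&gt;build_blob_name&lt;/code&gt; and the sample values are hypothetical and used purely for illustration.&lt;/p&gt;

```python
import hashlib
import mimetypes


def build_blob_name(agent_id: str, data: bytes, mime_type: str) -> str:
    """Mirror the naming step of upload_data_to_gcs (hypothetical helper)."""
    # Hash the payload so identical bytes always map to the same object name.
    file_name = hashlib.md5(data).hexdigest()
    # Map the MIME type to a file extension; unknown types get no extension.
    ext = mimetypes.guess_extension(mime_type) or ""
    return f"assets/{agent_id}/{file_name}{ext}"


first = build_blob_name("agent-123", b"fake image bytes", "image/png")
second = build_blob_name("agent-123", b"fake image bytes", "image/png")
other = build_blob_name("agent-123", b"different bytes", "image/png")

print(first)
print(first == second)  # identical bytes give an identical name
print(first == other)   # different bytes give a different name
```

&lt;p&gt;One consequence of this design is that re-uploading an unchanged image overwrites the same object instead of creating a duplicate.&lt;/p&gt;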



&lt;h4&gt;
  
  
  Data Model (&lt;code&gt;media_models.py&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;This file ensures that our data follows a strict structure (Schema).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured Output:&lt;/strong&gt; By using a Pydantic &lt;code&gt;BaseModel&lt;/code&gt;, we guarantee that the tool always returns a consistent JSON object containing a &lt;code&gt;uri&lt;/code&gt; (the link) and an optional &lt;code&gt;error&lt;/code&gt; message. This makes it much easier for the AI agent to understand and process the tool's result.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/nano_banana_mcp/media_models.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MediaAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Tool Dependencies (&lt;code&gt;requirements.txt&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;While we use &lt;code&gt;uv&lt;/code&gt; to run our code, a &lt;code&gt;requirements.txt&lt;/code&gt; file remains essential because it defines the specific dependencies &lt;code&gt;uv&lt;/code&gt; needs to install for the Nano Banana server to function. This provides the necessary "ingredients" to set up the isolated environment before the server starts.&lt;/p&gt;

&lt;p&gt;This file lists the three core libraries required for this tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;google-cloud-storage&lt;/strong&gt;: Used for hosting the generated images on the cloud.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;google-genai&lt;/strong&gt;: Provides the logic for the Gemini 3 Pro image generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fastmcp&lt;/strong&gt;: The framework that turns our Python script into a standardized MCP tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste this code in &lt;code&gt;dev_signal_agent/tools/nano_banana_mcp/requirements.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;google-cloud-storage=&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.6&lt;/span&gt;&lt;span class="err"&gt;.*&lt;/span&gt;
&lt;span class="py"&gt;google-genai=&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.52&lt;/span&gt;&lt;span class="err"&gt;.*&lt;/span&gt;
&lt;span class="py"&gt;fastmcp=&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.13&lt;/span&gt;&lt;span class="err"&gt;.*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this first part of our series, we focused on establishing the agent's core capabilities by standardizing its external integrations through the Model Context Protocol (MCP). We initialized the project using &lt;code&gt;uv&lt;/code&gt; for high-speed dependency management and successfully configured three critical toolsets: Reddit for trend discovery, Google Cloud Docs for technical grounding, and a custom "Nano Banana" MCP server for multimodal image generation. By utilizing the Google ADK's &lt;code&gt;McpToolset&lt;/code&gt;, we've abstracted away complex API logic into simple, plug-and-play modules, ensuring that our tools share a common interface that decouples integration from intelligence.&lt;/p&gt;

&lt;p&gt;For a deeper look into our technical foundation, you can explore the &lt;a href="https://developers.google.com/knowledge/mcp" rel="noopener noreferrer"&gt;Developer Knowledge MCP server&lt;/a&gt; to learn more about knowledge grounding or visit the &lt;a href="https://github.com/google/adk-python" rel="noopener noreferrer"&gt;Google ADK GitHub repository&lt;/a&gt; to explore the framework's core capabilities.&lt;/p&gt;

&lt;p&gt;With our toolset configured, we can move to &lt;a href="https://dev.to/googleai/architect-a-personalized-multi-agent-system-with-long-term-memory-3o15"&gt;Part 2&lt;/a&gt;, where we build the multi-agent architecture and integrate the Vertex AI memory bank to orchestrate these capabilities. You can also jump ahead to &lt;a href="https://dev.to/googleai/local-testing-of-a-multi-agent-system-with-memory-37mm"&gt;Part 3&lt;/a&gt;, which shows how to test the agent locally and verify these components on your workstation. The complete code for the entire series is available in our &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;Remigiusz Samborski&lt;/a&gt; for the helpful review and feedback on this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For more content like this, follow Shir on &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://x.com/shirmeir86" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>Fine-Tuning Gemma 4 with Cloud Run Jobs: Serverless GPUs (NVIDIA RTX 6000 Pro) for pet breed classification 🐈🐕</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Tue, 28 Apr 2026 19:54:21 +0000</pubDate>
      <link>https://forem.com/googleai/fine-tuning-gemma-4-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-45ib</link>
      <guid>https://forem.com/googleai/fine-tuning-gemma-4-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-45ib</guid>
      <description>&lt;p&gt;Google has just announced &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" rel="noopener noreferrer"&gt;the release of &lt;strong&gt;Gemma 4&lt;/strong&gt;&lt;/a&gt;! This new generation of open models brings significant advancements, particularly in reasoning capabilities and architectural efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bridging Reasoning and Precision with Gemma 4
&lt;/h2&gt;

&lt;p&gt;In my previous post, I demonstrated how to &lt;a href="https://dev.to/googleai/fine-tuning-gemma-3-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-248b"&gt;fine-tune Gemma 3 27B on &lt;strong&gt;Cloud Run Jobs&lt;/strong&gt; using &lt;strong&gt;NVIDIA RTX PRO 6000 Blackwell Edition GPUs&lt;/strong&gt; for pet breed classification&lt;/a&gt;. With the release of Gemma 4, I couldn't wait to update my pipeline and see how the new model performs.&lt;/p&gt;

&lt;p&gt;In this follow-up post, I'll explain what makes Gemma 4 different, the benefits it brings, and exactly what file modifications and workarounds are needed to successfully fine-tune it using PEFT (LoRA) on Cloud Run. We'll cover everything from memory requirements and dynamic label masking to prompt structures for reasoning models. Whether you read the previous post or are new to this pipeline, this guide will provide a complete, working solution for Gemma 4.&lt;/p&gt;

&lt;p&gt;If you'd rather dive straight into the code and explore it at your own pace, you can clone the repository &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/finetune_gemma" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New in Gemma 4?
&lt;/h2&gt;

&lt;p&gt;Gemma 4 introduces groundbreaking improvements over Gemma 3, making it Google's most intelligent open model family to date:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 License&lt;/strong&gt;: Gemma 4 is released under a commercially permissive Apache 2.0 license, providing full developer flexibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Highly Competitive Benchmarks&lt;/strong&gt;: The 31B model ranks as the #3 open model on the Arena AI text leaderboard, while the 26B MoE model ranks #6, outcompeting models 20x their size!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Reasoning &amp;amp; Agents&lt;/strong&gt;: Purpose-built for multi-step planning and deep logic. It features native support for function-calling, structured JSON output, and native system instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal &amp;amp; Long Context&lt;/strong&gt;: Natively processes images, video, and even audio (in edge models). The larger models support a context window of up to 256K tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versatile Architectures&lt;/strong&gt;: Includes a 26B Mixture of Experts (MoE) model that only activates 3.8B parameters during inference for fast response times.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of these changes, simply dropping Gemma 4 into a Gemma 3 fine-tuning script won't work out of the box. Here is a breakdown of what needed to change in the codebase to make it work.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPU Memory and Parameter Capacity
&lt;/h2&gt;

&lt;p&gt;With the availability of &lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads" rel="noopener noreferrer"&gt;&lt;strong&gt;NVIDIA RTX PRO 6000&lt;/strong&gt; GPUs&lt;/a&gt; on &lt;a href="https://docs.cloud.google.com/run/docs" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;, we now have access to &lt;strong&gt;96GB of VRAM&lt;/strong&gt;. This is a game-changer for hosting and fine-tuning large models.&lt;/p&gt;

&lt;p&gt;According to the formula discussed in my blog post on &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/decoding-high-bandwidth-memory-a-practical-guide-to-gpu-memory-for-fine-tuning-ai-models/" rel="noopener noreferrer"&gt;Decoding high-bandwidth memory&lt;/a&gt;: &lt;em&gt;Total HBM ≈ (Model Size) + (Optimizer States) + (Gradients) + (Activations)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When using &lt;strong&gt;LoRA&lt;/strong&gt; (Low-Rank Adaptation), we freeze the base model weights and only train a small set of adapter parameters. This means the memory-hungry gradients and optimizer states are needed only for the adapters, not for the frozen base model. For &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; loaded in 16-bit precision (bfloat16), the base model size is roughly &lt;em&gt;31 billion parameters × 2 bytes/parameter ≈ 62 GB.&lt;/em&gt; While this 62GB model fits comfortably within the &lt;strong&gt;96GB of VRAM&lt;/strong&gt; available on the RTX PRO 6000, we can do even better!&lt;/p&gt;

&lt;p&gt;By applying &lt;strong&gt;4-bit quantization (QLoRA)&lt;/strong&gt; via the bitsandbytes library, we shrink this base memory footprint to roughly 18–20GB. This leaves most of the 96GB of VRAM as headroom for the activations required by multimodal processing and long-context training batches.&lt;/p&gt;
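
&lt;p&gt;The weight-memory arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch that counts weights only; the helper name &lt;code&gt;weight_memory_gb&lt;/code&gt; is hypothetical. The raw 4-bit figure it prints comes out lower than the 18–20GB quoted above. The gap is presumably quantization metadata and any layers kept in higher precision, which is an assumption on our part and not a measured breakdown.&lt;/p&gt;

```python
GB = 1e9  # decimal gigabytes, matching the 31B x 2 bytes = 62 GB estimate


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return num_params * bits_per_param / 8 / GB


params = 31e9   # Gemma 4 31B
vram_gb = 96    # VRAM on the RTX PRO 6000

bf16_gb = weight_memory_gb(params, 16)
nf4_gb = weight_memory_gb(params, 4)

print(f"bfloat16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights:     {nf4_gb:.1f} GB (before quantization overhead)")
print(f"Headroom at bf16:  {vram_gb - bf16_gb:.1f} GB")
print(f"Headroom at 4-bit: {vram_gb - nf4_gb:.1f} GB")
```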

&lt;h2&gt;
  
  
  Key Code Changes for Gemma 4 Migration
&lt;/h2&gt;

&lt;p&gt;If you are updating your own script or starting fresh, these are the critical adjustments made to the pipeline:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multimodal Input Ordering &amp;amp; Integrated Instructions
&lt;/h3&gt;

&lt;p&gt;While Gemma 4 supports interleaved inputs and a &lt;a href="https://ai.google.dev/gemma/docs/core/model_card_4#mixture-of-experts_moe_model" rel="noopener noreferrer"&gt;native system role&lt;/a&gt;, we recommend providing the image data before the text as a stable convention and merging instructions into the user prompt for this pipeline. We found this 'single-turn' structure more effective for maintaining instruction-following precision and simplifying our custom masking logic.&lt;/p&gt;

&lt;p&gt;In the code below, the &lt;code&gt;{"type": "image"}&lt;/code&gt; entry acts as a placeholder that signals the processor to inject special image tokens into the chat template. The actual image tensors are then passed separately during the data collation step to ensure the multimodal architecture is adapted correctly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;full_user_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Identify the breed of the animal in this image.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Image must come first!
&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;full_user_content&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;caption&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Loading the Correct Multimodal Architecture
&lt;/h3&gt;

&lt;p&gt;Gemma 4 natively processes images, video, and even audio (in the E2B and E4B models), which changes how the model must be loaded. To correctly handle these diverse inputs, we explicitly use the &lt;code&gt;AutoModelForMultimodalLM&lt;/code&gt; class. While &lt;code&gt;AutoModelForImageTextToText&lt;/code&gt; remains a valid option for purely image-based tasks, the multimodal class is the more precise choice for the Gemma 4 architecture, ensuring it is ready to handle video and audio data natively.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForMultimodalLM&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForMultimodalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Label Masking for Multimodal Data
&lt;/h3&gt;

&lt;p&gt;In Gemma 3, we could hardcode specific token IDs to find where the assistant's response started to mask the prompt. For Gemma 4, we initially tried tokenizing the text prompt separately to find its length, but hit a major snag.&lt;/p&gt;

&lt;p&gt;Gemma 4 is highly efficient with media: each image gets a &lt;a href="https://ai.google.dev/gemma/docs/capabilities/vision#variable-resolution" rel="noopener noreferrer"&gt;dynamic number of soft tokens&lt;/a&gt; exactly fitted to its content. While these image soft tokens are highly stable and pre-computable (their count does not change whether the image is alone or accompanied by text), standard tokenizers can still introduce slight boundary quirks when concatenating text and control tokens &lt;em&gt;after&lt;/em&gt; these media tokens. If you tokenize the prompt in isolation, the length might be slightly off compared to the fully assembled chat template, tanking the model's accuracy.&lt;/p&gt;

&lt;p&gt;To avoid this misalignment, we implemented a backward-search collator. Instead of trying to calculate the prompt length, we search the full &lt;code&gt;input_ids&lt;/code&gt; array for the exact tokens of our breed name label. Once found, we step backwards to locate the &lt;code&gt;&amp;lt;|turn&amp;gt;&lt;/code&gt; control token that marks the start of the assistant's response, and mask everything before it. Because the mask boundary is located in the fully assembled sequence rather than estimated, the loss is computed only on the assistant turn (the template tokens and the label), with no masking misalignment.&lt;/p&gt;
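
&lt;p&gt;The backward search can be sketched in plain Python on a single unpadded sequence. This is a simplified illustration and not the collator from the repository: the function names, the toy token IDs, and the use of &lt;code&gt;-100&lt;/code&gt; as the ignore value (the usual default for PyTorch cross-entropy loss) are all assumptions made for the example. A real collator would work on padded batch tensors and would also need to mask the padding.&lt;/p&gt;

```python
IGNORE_INDEX = -100  # label value conventionally skipped by PyTorch cross-entropy loss


def find_last(seq: list[int], sub: list[int]) -> int:
    """Return the start index of the last occurrence of sub in seq, or -1."""
    if not sub:
        return -1
    for start in range(len(seq) - len(sub), -1, -1):
        if seq[start:start + len(sub)] == sub:
            return start
    return -1


def mask_labels(input_ids: list[int], label_ids: list[int], turn_id: int) -> list[int]:
    """Keep loss only from the assistant turn token onward."""
    label_start = find_last(input_ids, label_ids)
    if label_start == -1:
        raise ValueError("label tokens not found in input_ids")
    # Step backwards from the label to the turn token that opens the assistant response.
    turn_pos = label_start
    while input_ids[turn_pos] != turn_id:
        turn_pos -= 1
        if turn_pos == -1:
            raise ValueError("turn token not found before label")
    # Mask everything before the assistant turn; keep the rest as training targets.
    return [IGNORE_INDEX] * turn_pos + input_ids[turn_pos:]


# Toy IDs (all hypothetical): 9 = turn token, 1 = user role, 500 = image soft tokens,
# 2 and 3 = prompt text, 8 = end of turn, 4 = model role, 70 and 71 = breed label.
TURN = 9
input_ids = [TURN, 1, 500, 500, 2, 3, 8, TURN, 4, 70, 71, 8]
labels = mask_labels(input_ids, [70, 71], TURN)
print(labels)
```

&lt;p&gt;Searching from the end means that a breed name that also happens to appear in the prompt text does not shift the mask boundary.&lt;/p&gt;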

&lt;h3&gt;
  
  
  4. Bypassing Custom Layers &amp;amp; Unlocking the Vision Tower
&lt;/h3&gt;

&lt;p&gt;This was the most critical breakthrough! The official Hugging Face implementation for Gemma 4 uses a custom neural network wrapper called &lt;code&gt;Gemma4ClippableLinear&lt;/code&gt; for its projection layers. This custom class wraps a standard &lt;code&gt;nn.Linear&lt;/code&gt; layer but adds specific logic to clip minimum and maximum activations (&lt;code&gt;input_min&lt;/code&gt;, &lt;code&gt;output_max&lt;/code&gt;, etc.) to stabilize training.&lt;/p&gt;

&lt;p&gt;When we tried to apply standard LoRA by targeting specific layer names like &lt;code&gt;q_proj&lt;/code&gt; or &lt;code&gt;v_proj&lt;/code&gt;, we hit two major issues:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Activation Clipping Bypass&lt;/strong&gt;: Standard PEFT/LoRA doesn't natively recognize &lt;code&gt;Gemma4ClippableLinear&lt;/code&gt;. If forced to attach to the inner &lt;code&gt;.linear&lt;/code&gt; weights, it bypasses the parent wrapper entirely. Without that crucial activation clipping during the forward pass, the model's activations become unstable, and the training loss explodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frozen Vision Tower&lt;/strong&gt;: Even if we fixed the text backbone, standard text-focused LoRA configurations often miss the vision tower's projection layers, leaving the model's "eyes" frozen during training.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The solution is to use the macro &lt;code&gt;target_modules="all-linear"&lt;/code&gt;. This tells the PEFT library to recursively scan the entire model tree. It safely identifies and wraps nested linear layers without breaking the outer &lt;code&gt;Gemma4ClippableLinear&lt;/code&gt; clipping logic. Crucially, it also ensures that every linear layer across &lt;strong&gt;both the language model and the vision tower&lt;/strong&gt; is adapted to your data, without sacrificing architectural stability.&lt;/p&gt;
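
&lt;p&gt;As a rough sketch, the adapter settings described in this post could be collected as follows. The rank and alpha values are the ones reported for the full-dataset run in the Results section; &lt;code&gt;task_type&lt;/code&gt; is an assumption that this post does not state. The arguments are kept in a plain dictionary so the snippet runs without PEFT installed, and the commented lines show how they would typically be handed to &lt;code&gt;LoraConfig&lt;/code&gt;.&lt;/p&gt;

```python
# Keyword arguments intended for peft.LoraConfig; kept as a plain dict so the
# snippet runs without PEFT installed.
lora_kwargs = {
    "r": 64,                         # LoRA rank reported for the full-dataset run
    "lora_alpha": 64,                # LoRA alpha reported for the full-dataset run
    "target_modules": "all-linear",  # let PEFT pick up every linear layer
    "task_type": "CAUSAL_LM",        # assumption: not stated in the article
}

# With PEFT installed, the adapter would be attached roughly like this:
#   from peft import LoraConfig, get_peft_model
#   model = get_peft_model(model, LoraConfig(**lora_kwargs))
print(lora_kwargs)
```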

&lt;h3&gt;
  
  
  5. Results
&lt;/h3&gt;

&lt;p&gt;By combining the multimodal architecture, backward-search masking, and full-tower LoRA, we improved the model's accuracy over the baseline.&lt;/p&gt;

&lt;p&gt;Note that the Gemma 4 baseline &lt;strong&gt;(89% accuracy) was significantly higher&lt;/strong&gt; than the &lt;a href="https://dev.to/googleai/fine-tuning-gemma-3-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-248b"&gt;Gemma 3 baseline &lt;strong&gt;(67% accuracy)&lt;/strong&gt;&lt;/a&gt;, so the improvement from fine-tuning is more modest here, but still significant.&lt;/p&gt;

&lt;h4&gt;
  
  
  Intermediate Results (700 Samples, ~50 minutes Run)
&lt;/h4&gt;

&lt;p&gt;Even with a small subset of 700 training images, we saw a nice boost over the baseline in less than one hour:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv5idqnzvq6inavt8ddo.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv5idqnzvq6inavt8ddo.webp" alt=" " width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Results on 700 training samples and 200 evaluation samples&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h4&gt;
  
  
  Final Results (Full Dataset, ~4.25 Hours Run)
&lt;/h4&gt;

&lt;p&gt;Running the full &lt;a href="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*UmCvZII_Mu1bvjrHL8WTOw.png" rel="noopener noreferrer"&gt;Oxford-IIIT Pet dataset&lt;/a&gt; (~4,000 training images and 3,669 evaluation images) yielded our peak performance (SOTA for this dataset is 94% accuracy):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowa1irxg2lzaqix2586k.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowa1irxg2lzaqix2586k.webp" alt=" " width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Results on 4000 training samples and 3669 evaluation samples&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;In this run, we utilized a more aggressive LoRA configuration than typical text-only runs: a &lt;strong&gt;Rank 64&lt;/strong&gt; / &lt;strong&gt;Alpha 64&lt;/strong&gt; setup with a &lt;strong&gt;5e-5 learning rate&lt;/strong&gt;. This gave the model enough "surface area" to refine its visual features for the specific nuances of the pet dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Managing VRAM with QLoRA &amp;amp; Gradient Checkpointing
&lt;/h3&gt;

&lt;p&gt;While 96GB of VRAM on the RTX PRO 6000 is massive, training a 31B parameter model with LoRA still pushes the boundaries of a single GPU. To keep training stable and prevent Out-Of-Memory (OOM) errors during the backward pass, our script implements a two-pronged optimization strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;QLoRA (4-bit Quantization):&lt;/strong&gt; Utilizing &lt;code&gt;BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")&lt;/code&gt; to drastically reduce the model's footprint when loaded on CUDA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient Checkpointing:&lt;/strong&gt; Specifically enabled for the 31B model, this trades a slight increase in compute time for a significant reduction in VRAM usage by recalculating activations instead of storing them all in memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Complete Fine-Tuning Workflow on Cloud Run
&lt;/h2&gt;

&lt;p&gt;Before you begin the fine-tuning process, ensure you have the following software and environment configurations in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Project&lt;/strong&gt; with billing enabled and APIs active (Cloud Run, Artifact Registry, Cloud Build, Secret Manager).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA RTX PRO 6000&lt;/strong&gt; availability in your region (e.g., &lt;code&gt;europe-west4&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hugging Face Token&lt;/strong&gt;: A valid token with access to the Gemma 4 model weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 0: Set Environment Variables
&lt;/h2&gt;

&lt;p&gt;Set the following environment variables to align with the steps below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;YOUR_PROJECT_ID]
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west4
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;YOUR_HF_TOKEN]
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"finetune-gemma-job-sa"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="nt"&gt;-gemma4-finetuning-eu&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AR_REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemma4-finetuning-repo
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SECRET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;HF_TOKEN
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IMAGE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemma4-finetune
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;JOB_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemma4-finetuning-job
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1: Get the Code
&lt;/h2&gt;

&lt;p&gt;Whether you're running locally or on the cloud, you'll need the code. Clone the repository and navigate to the project directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/GoogleCloudPlatform/devrel-demos
&lt;span class="nb"&gt;cd &lt;/span&gt;devrel-demos/ai-ml/finetune_gemma/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Test Locally Before Cloud Deployment
&lt;/h2&gt;

&lt;p&gt;Before spinning up massive GPUs in the cloud, it is always a best practice to verify your pipeline locally using a smaller model variant (like the E2B instruction-tuned model) on a subset of the data.&lt;/p&gt;

&lt;p&gt;To run a local CPU test, first activate your virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, execute the script with a very small dataset to ensure the pipeline completes successfully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 finetune_and_evaluate.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-id&lt;/span&gt; google/gemma-4-e2b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--device&lt;/span&gt; cpu &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--train-size&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--eval-size&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gradient-accumulation-steps&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--num-epochs&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you verify that the training pipeline completes successfully, you are ready to scale up to Cloud Run!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Stage the Model in GCS
&lt;/h2&gt;

&lt;p&gt;To save startup time and avoid repetitive downloads from the internet during training, stage the model weights (e.g., &lt;code&gt;google/gemma-4-31b-it&lt;/code&gt;) in a GCS bucket located in the same region as your Cloud Run job. We provide a utility script within the repository to perform this transfer directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Navigate to the utility directory&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;hf-to-gcs
&lt;span class="c"&gt;# Execute the transfer script&lt;/span&gt;
python3 hf_to_gcs.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-id&lt;/span&gt; google/gemma-4-31b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--hf-token&lt;/span&gt; &lt;span class="nv"&gt;$HF_TOKEN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script ensures that the weights are stored in your project's bucket, enabling high-speed access via volume mounts when the Cloud Run job executes.&lt;/p&gt;
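&lt;p&gt;To make the mapping from bucket to container concrete, here is a minimal sketch. It assumes the transfer script stores the weights under a prefix equal to the model ID; the helper name &lt;code&gt;mounted_model_path&lt;/code&gt; is hypothetical and not part of the repository.&lt;/p&gt;

```python
import posixpath


def mounted_model_path(mount_path, model_id):
    """Return the directory where a staged model appears inside the container.

    Assumption: the bucket is mounted at mount_path and the weights live
    under a prefix that matches the Hugging Face model ID.
    """
    # Cloud Run containers are Linux, so join with POSIX separators.
    return posixpath.join(mount_path, model_id) + "/"


print(mounted_model_path("/mnt/gcs", "google/gemma-4-31b-it"))
```

&lt;p&gt;With the bucket mounted at &lt;code&gt;/mnt/gcs&lt;/code&gt;, the result is the local directory you would pass to the job as &lt;code&gt;--model-id&lt;/code&gt;.&lt;/p&gt;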

&lt;h2&gt;
  
  
  Step 4: Build the Container
&lt;/h2&gt;

&lt;p&gt;Use Cloud Build to package your script and dependencies into a container image compatible with CUDA 12.8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud builds submit &lt;span class="nt"&gt;--tag&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="nt"&gt;-docker&lt;/span&gt;.pkg.dev/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$AR_REPO&lt;/span&gt;/&lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt;:latest &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Tip: You can track the real-time progress of your build in the Cloud Build console.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Create and Execute the Cloud Run Job
&lt;/h2&gt;

&lt;p&gt;Create the job with GPU support and volume mounts for the GCS bucket holding the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud beta run &lt;span class="nb"&gt;jobs &lt;/span&gt;create gemma4-finetuning-job &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
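  &lt;span class="nt"&gt;--image&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="nt"&gt;-docker&lt;/span&gt;.pkg.dev/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$AR_REPO&lt;/span&gt;/&lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt;:latest &lt;span class="se"&gt;\&lt;/span&gt;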
  &lt;span class="nt"&gt;--gpu&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-type&lt;/span&gt; nvidia-rtx-pro-6000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpu&lt;/span&gt; 30.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--memory&lt;/span&gt; 120Gi &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--labels&lt;/span&gt; dev-tutorial&lt;span class="o"&gt;=&lt;/span&gt;finetune-gemma &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-volume&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model-volume,type&lt;span class="o"&gt;=&lt;/span&gt;cloud-storage,bucket&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-volume-mount&lt;/span&gt; &lt;span class="nv"&gt;volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model-volume,mount-path&lt;span class="o"&gt;=&lt;/span&gt;/mnt/gcs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--model-id"&lt;/span&gt;,&lt;span class="s2"&gt;"/mnt/gcs/google/gemma-4-31b-it/"&lt;/span&gt;,&lt;span class="s2"&gt;"--output-dir"&lt;/span&gt;,&lt;span class="s2"&gt;"/mnt/gcs/gemma4-finetuned"&lt;/span&gt;,&lt;span class="s2"&gt;"--train-size"&lt;/span&gt;,&lt;span class="s2"&gt;"700"&lt;/span&gt;,&lt;span class="s2"&gt;"--eval-size"&lt;/span&gt;,&lt;span class="s2"&gt;"200"&lt;/span&gt;,&lt;span class="s2"&gt;"--merge"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then execute it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud beta run &lt;span class="nb"&gt;jobs &lt;/span&gt;execute gemma4-finetuning-job &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="nt"&gt;--async&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Migrating to Gemma 4 requires handling its new architecture and response formats, but the effort pays off with its superior reasoning and adherence to instructions. By leveraging &lt;a href="https://docs.cloud.google.com/run/docs/create-jobs" rel="noopener noreferrer"&gt;Cloud Run Jobs&lt;/a&gt; and &lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads" rel="noopener noreferrer"&gt;Serverless Blackwell GPUs&lt;/a&gt;, you can train these massive models efficiently without managing servers.&lt;/p&gt;

&lt;p&gt;To get started with inference, explore this codelab: &lt;a href="https://codelabs.developers.google.com/codelabs/cloud-run/cloud-run-gpu-rtx-pro-6000-gemma4-vllm#0" rel="noopener noreferrer"&gt;Run inference of Gemma 4 model on Cloud Run with RTX 6000 Pro GPU with vLLM&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To learn more about production serving, refer to the &lt;a href="https://docs.cloud.google.com/run/docs/run-gemma-on-cloud-run" rel="noopener noreferrer"&gt;Cloud Run Gemma 4 documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy fine-tuning! 🎉&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to Ryan Mullins, Juyeong Ji and Gus Martins from the Gemma 4 team for the helpful review and feedback on this blog.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gemma</category>
      <category>machinelearning</category>
      <category>googlecloud</category>
      <category>cloudrun</category>
    </item>
    <item>
      <title>Fine-Tuning Gemma 3 with Cloud Run Jobs: Serverless GPUs (NVIDIA RTX 6000 Pro) for pet breed classification 🐈🐕</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:07:00 +0000</pubDate>
      <link>https://forem.com/googleai/fine-tuning-gemma-3-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-248b</link>
      <guid>https://forem.com/googleai/fine-tuning-gemma-3-with-cloud-run-jobs-serverless-gpus-nvidia-rtx-6000-pro-for-pet-breed-248b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr33mdn056bnbis88u9kj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr33mdn056bnbis88u9kj.png" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Architectural workflow: fine-tuning Gemma 3 27B on Cloud Run Jobs&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;Recently, I was inspired by a major new release on Google Cloud: the availability of &lt;strong&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads?utm_campaign=CDR_0x91b1edb5_default_b488149523&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs&lt;/a&gt;&lt;/strong&gt; on &lt;a href="https://docs.cloud.google.com/run/docs/create-jobs" rel="noopener noreferrer"&gt;Cloud Run Jobs&lt;/a&gt;. This launch is important because it unlocks the ability to tackle fine-tuning workloads for open models with the simplicity of a serverless batch job. To put this new hardware to the test in a fun way, I fine-tuned a multimodal model to identify a pet’s breed from a photo using &lt;a href="https://www.robots.ox.ac.uk/~vgg/data/pets/" rel="noopener noreferrer"&gt;The Oxford-IIIT Pet Dataset&lt;/a&gt;. This model could power a “smart pet care” app: an AI application that identifies a pet’s breed from a photo and provides tailored health and nutrition advice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p12qsqvyysokppob26f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p12qsqvyysokppob26f.png" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Images taken from &lt;a href="https://www.robots.ox.ac.uk/~vgg/data/pets/" rel="noopener noreferrer"&gt;The Oxford-IIIT Pet Dataset&lt;/a&gt;, showing cats and dogs with their corresponding breed (the classification label)&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Why Fine-Tuning?
&lt;/h3&gt;

&lt;p&gt;In a recent &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4" rel="noopener noreferrer"&gt;Agent Factory episode&lt;/a&gt;, we discussed that while foundational models are a powerful ‘one-size-fits-all’ starting point, they essentially remain generalists. You should consider fine-tuning when you have a problem that requires &lt;strong&gt;high specialization&lt;/strong&gt; that a generalist model might not excel in on its own, or when you need more &lt;strong&gt;control&lt;/strong&gt; and &lt;strong&gt;cost-efficiency&lt;/strong&gt; over your own hosting.&lt;/p&gt;

&lt;p&gt;For this pet-care use case, distinguishing between 37 different breeds isn’t just about ‘knowledge’; it’s about taking that foundational reasoning and adding a specific capability based on a unique dataset. As we explored in the episode and as mentioned in this &lt;a href="https://arxiv.org/pdf/2506.02153" rel="noopener noreferrer"&gt;NVIDIA paper&lt;/a&gt;, this kind of specialization is what allows smaller, focused models to become &lt;strong&gt;sufficiently powerful&lt;/strong&gt; and &lt;strong&gt;economical&lt;/strong&gt; for production agentic systems. Fine-tuning acts as the necessary bridge, transforming a broad reasoner into a high-precision classification expert.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bridging Reasoning and Precision
&lt;/h3&gt;

&lt;p&gt;For this project, I chose the multimodal breadth of &lt;a href="https://huggingface.co/google/gemma-3-27b-it" rel="noopener noreferrer"&gt;Gemma 3 27B&lt;/a&gt;. While specialized vision models often provide superior accuracy for narrow identification tasks, I wanted to use a model capable of both identifying breeds and reasoning about the specific health and dietary needs associated with them. By leveraging the power of the new &lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads?e=48754805" rel="noopener noreferrer"&gt;Blackwell GPUs&lt;/a&gt;, I was able to fine-tune this model to bridge the performance gap, all while keeping the setup &lt;strong&gt;reproducible, cost-effective,&lt;/strong&gt; and entirely &lt;strong&gt;container-native.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From Batch to Production: Economically Efficient Hosting
&lt;/h3&gt;

&lt;p&gt;The true ‘deploy and forget’ magic happens after the weights are saved. With high-performance inference &lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads?e=48754805&amp;amp;utm_campaign=CDR_0x91b1edb5_default_b488149523&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;now supported&lt;/a&gt; on Cloud Run, you can host your fine-tuned Gemma 3 27B model on the same NVIDIA RTX PRO 6000 Blackwell GPU without managing any underlying infrastructure. This setup delivers a highly economical production environment: Cloud Run automatically &lt;strong&gt;scales your GPU instances to zero&lt;/strong&gt; when they aren’t in use, ensuring you only pay for the exact minutes your model is active.&lt;/p&gt;

&lt;p&gt;In this guide, I’m excited to show you how this new hardware release transforms complex fine-tuning into a scalable, serverless experience without the need to manage complex clusters or maintain idle instances.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simplifying 27B Fine-Tuning on Cloud Run
&lt;/h2&gt;

&lt;p&gt;Fine-tuning an open model can seem like a daunting task that requires complex orchestration, from provisioning high-capacity VMs and manually installing CUDA drivers to managing tedious data transfers and scaling down manually to control costs. &lt;a href="https://docs.cloud.google.com/run/docs/create-jobs" rel="noopener noreferrer"&gt;Cloud Run Jobs&lt;/a&gt; elegantly solves this by allowing you to package your training logic as a container, now backed by the fully managed environment of &lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads" rel="noopener noreferrer"&gt;&lt;strong&gt;NVIDIA RTX PRO 6000 Blackwell GPUs&lt;/strong&gt;&lt;/a&gt; and their 96GB of VRAM.&lt;/p&gt;

&lt;p&gt;This setup delivers on-demand availability without the need for reservations, rapid 5-second startup times with drivers pre-installed, and automatic scale-to-zero efficiency that ensures you only pay for the minutes your model is training. By leveraging built-in GCS volume mounting for high-speed access to model weights, we can now move past infrastructure hurdles and focus on the core task: fine-tuning Gemma 3 27B to achieve high-precision results for &lt;strong&gt;Pet Breed Classification&lt;/strong&gt; on the &lt;a href="https://www.robots.ox.ac.uk/~vgg/data/pets/" rel="noopener noreferrer"&gt;Oxford-IIIT Pet Dataset&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’d like to dive straight into the code, you can clone the repository &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/finetune_gemma" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin the fine-tuning process, ensure you have the following software and environment configurations in place.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.12+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.astral.sh/uv/getting-started/installation/#standalone-installer" rel="noopener noreferrer"&gt;&lt;strong&gt;uv&lt;/strong&gt;&lt;/a&gt; (Python package manager): will be used to manage our local Python environment and speed up our Docker builds. Use curl to download the script and execute it with sh:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/sdk/docs/install" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud SDK&lt;/strong&gt;&lt;/a&gt; (gcloud CLI) installed and authenticated.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://docs.cloud.google.com/resource-manager/docs/creating-managing-projects" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Project&lt;/strong&gt;&lt;/a&gt; with billing enabled.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/endpoints/docs/openapi/enable-api" rel="noopener noreferrer"&gt;APIs enabled&lt;/a&gt;: Ensure the following APIs are active in your project: Cloud Run Admin API, Artifact Registry API, Cloud Build API, Secret Manager API, and Compute Engine API (for GPU provisioning).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/docs/hub/en/security-tokens" rel="noopener noreferrer"&gt;Hugging Face Token&lt;/a&gt;: A valid token with access to the &lt;a href="https://huggingface.co/google/gemma-3-27b-it" rel="noopener noreferrer"&gt;Gemma 3 27B-IT&lt;/a&gt; model weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Access to gated models:&lt;/strong&gt; &lt;a href="https://huggingface.co/google/gemma-3-27b-it" rel="noopener noreferrer"&gt;Gemma 3 27B-IT&lt;/a&gt; is a gated model, which means you must explicitly accept the terms of use before you can download or fine-tune the weights.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Accept the License:&lt;/strong&gt; Visit the &lt;a href="https://huggingface.co/google/gemma-3-27b-it" rel="noopener noreferrer"&gt;Gemma 3 27B-IT&lt;/a&gt; model page on Hugging Face and click the “Agree and access repository” button.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate a Token:&lt;/strong&gt; Once access is &lt;a href="https://huggingface.co/docs/hub/en/security-tokens" rel="noopener noreferrer"&gt;granted&lt;/a&gt;, ensure your Hugging Face Token has “read” permissions (or “write” if you plan to push your fine-tuned model back to the Hub) to authenticate your training job.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 1 — Setting the stage: Your environment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1.1 — Prepare your Google Cloud environment
&lt;/h3&gt;

&lt;p&gt;Set environment variables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important: Regional Alignment is Critical.&lt;/strong&gt; To use Cloud Storage volume mounting, your GCS bucket &lt;strong&gt;must&lt;/strong&gt; be in the same region as your Cloud Run job. We recommend using europe-west4 (Netherlands), as it supports the RTX PRO 6000 Blackwell GPU and keeps access to your model weights low-latency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_PROJECT_ID
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west4
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_HF_TOKEN
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"finetune-gemma-job-sa"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="nt"&gt;-gemma3-finetuning-eu&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AR_REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemma3-finetuning-repo
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SECRET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;HF_TOKEN
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IMAGE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemma3-finetune
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;JOB_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemma3-finetuning-job
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.2 — Get the code
&lt;/h3&gt;

&lt;p&gt;Whether you’re running locally or in the cloud, you’ll need the code. After you open Cloud Shell or install the Google Cloud CLI locally, clone the repository to your machine. The finetune_gemma directory contains the finetune_and_evaluate.py script, a Dockerfile, and the requirements.txt file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/GoogleCloudPlatform/devrel-demos
&lt;span class="nb"&gt;cd &lt;/span&gt;devrel-demos/ai-ml/finetune_gemma/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Log in to gcloud to authorize the CLI tool (this is required to run gcloud commands):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud auth login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your Project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the service account and grant storage permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud iam service-accounts create &lt;span class="nv"&gt;$SERVICE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Service Account for Gemma 3 fine-tuning"&lt;/span&gt;

gcloud storage buckets create gs://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$REGION&lt;/span&gt;

gcloud storage buckets add-iam-policy-binding gs://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;serviceAccount:&lt;span class="nv"&gt;$SERVICE_ACCOUNT&lt;/span&gt;@&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;.iam.gserviceaccount.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;roles/storage.objectAdmin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an Artifact Registry repository and store your HF Token in Secret Manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud artifacts repositories create &lt;span class="nv"&gt;$AR_REPO&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Gemma 3 finetuning repository"&lt;/span&gt;

&lt;span class="c"&gt;# Create the secret (ignore error if it already exists)&lt;/span&gt;
gcloud secrets create &lt;span class="nv"&gt;$SECRET_ID&lt;/span&gt; &lt;span class="nt"&gt;--replication-policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"automatic"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Pass the token as data, not as the printf format string&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HF_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gcloud secrets versions add &lt;span class="nv"&gt;$SECRET_ID&lt;/span&gt; &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

gcloud secrets add-iam-policy-binding &lt;span class="nv"&gt;$SECRET_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt; serviceAccount:&lt;span class="nv"&gt;$SERVICE_ACCOUNT&lt;/span&gt;@&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;.iam.gserviceaccount.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'roles/secretmanager.secretAccessor'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2 — Staging the Model with cr-infer (Recommended)
&lt;/h2&gt;

&lt;p&gt;To avoid downloading the model every time the job runs, we’ll stage the &lt;strong&gt;Gemma 3 27B&lt;/strong&gt; weights in Google Cloud Storage. We’ll use &lt;a href="https://github.com/oded996/cr-infer" rel="noopener noreferrer"&gt;&lt;strong&gt;cr-infer&lt;/strong&gt;&lt;/a&gt;, which allows you to run model transfers directly via uvx without needing a local installation.&lt;/p&gt;

&lt;p&gt;Before running the transfer, you must set up your Application Default Credentials. This is required for running scripts locally. In this case it allows the cr-infer tool to use your local identity to write the weights to your GCS bucket.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud auth application-default login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Download Gemma 3 27B to GCS&lt;/strong&gt;: Now, execute the transfer using uvx. This copies the model into gs://$BUCKET_NAME/google/gemma-3-27b-it/, allowing our Cloud Run job to mount the weights as a local volume instead of downloading gigabytes of weights at every container startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx &lt;span class="nt"&gt;--from&lt;/span&gt; git+https://github.com/oded996/cr-infer.git cr-infer model download &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; huggingface &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-id&lt;/span&gt; google/gemma-3-27b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--token&lt;/span&gt; &lt;span class="nv"&gt;$HF_TOKEN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3 — Build and push the container image
&lt;/h2&gt;

&lt;p&gt;Our Dockerfile leverages &lt;strong&gt;uv&lt;/strong&gt; for fast dependency installation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Use Google Cloud Build (Recommended — No local Docker needed)
&lt;/h3&gt;

&lt;p&gt;This is the easiest way to build your image directly in the cloud and push it to Artifact Registry. (The build typically takes &lt;strong&gt;10–15 minutes&lt;/strong&gt; as it downloads large ML dependencies like PyTorch).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/$AR_REPO/$IMAGE_NAME:latest .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Tip: You can track the real-time progress of your build in the &lt;a href="https://console.cloud.google.com/cloud-build/builds" rel="noopener noreferrer"&gt;Cloud Build console&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: Build locally with Docker
&lt;/h3&gt;

&lt;p&gt;If you have Docker Desktop installed locally:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install uv locally&lt;/strong&gt; (if you haven’t already):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Build the image:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Push to AR:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker tag &lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="nt"&gt;-docker&lt;/span&gt;.pkg.dev/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$AR_REPO&lt;/span&gt;/&lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt;
docker push &lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="nt"&gt;-docker&lt;/span&gt;.pkg.dev/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$AR_REPO&lt;/span&gt;/&lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.1 — Test locally (Optional)
&lt;/h3&gt;

&lt;p&gt;I like to start with a quick local test run to validate the setup. It serves as a sanity check for your environment and scripts before moving the workload to Cloud Run. For this test, we use parameters optimized for speed and a smaller model, google/gemma-3-4b-it, to ensure the model correctly learns the task format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 finetune_and_evaluate.py &lt;span class="se"&gt;\&lt;/span&gt;
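  &lt;span class="nt"&gt;--model-id&lt;/span&gt; google/gemma-3-4b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--train-size&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--eval-size&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gradient-accumulation-steps&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--learning-rate&lt;/span&gt; 2e-4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--batch-size&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--num-epochs&lt;/span&gt; 3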
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On my Apple M4 Pro, running this on the CPU took about &lt;strong&gt;20–30 minutes.&lt;/strong&gt; If you want to see early signs of progress locally, you can increase the sample size — I found that a one-hour run on my Mac with 50 training and testing samples already yielded a 4% improvement in accuracy and a 3% boost in F1-score.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmlpzzou35x4bwnh8wiv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmlpzzou35x4bwnh8wiv.png" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Results from a local run on my Mac with 50 train and 50 test samples&lt;/small&gt;&lt;/center&gt;
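&lt;p&gt;For reference, here is a minimal sketch of how accuracy and F1 can be computed from predicted and true breed labels. It assumes macro-averaged F1, since the post does not say which averaging the script uses, and the labels are hypothetical examples.&lt;/p&gt;

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
truth = ["pug", "pug", "beagle", "beagle"]
preds = ["pug", "beagle", "beagle", "beagle"]
print(accuracy(truth, preds), macro_f1(truth, preds))
```

&lt;p&gt;Running the same functions on the base model and on the fine-tuned model gives the before and after comparison.&lt;/p&gt;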

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Inside the Fine-Tuning Script: How it Works
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/blob/main/ai-ml/finetune_gemma/finetune_and_evaluate.py" rel="noopener noreferrer"&gt;finetune_and_evaluate.py&lt;/a&gt; script is designed to be a complete, self-contained pipeline, handling everything from data preparation to hardware-aware optimization and evaluation. Here is a look at the core logic that makes this possible:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Memory-Efficient Model Loading
&lt;/h3&gt;

&lt;p&gt;To fit a 27B parameter model into the 96GB VRAM of the Blackwell GPU, the script uses 4-bit quantization via the &lt;a href="https://github.com/bitsandbytes-foundation/bitsandbytes" rel="noopener noreferrer"&gt;bitsandbytes&lt;/a&gt; library. By setting low_cpu_mem_usage=True, it also ensures the model is loaded efficiently without exhausting the system RAM.&lt;/p&gt;
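&lt;p&gt;A rough back-of-the-envelope calculation shows why the lower precision helps. This sketch counts the model weights only; quantization constants, LoRA adapters, activations, and optimizer state are deliberately left out, so real usage will be higher.&lt;/p&gt;

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


PARAMS = 27e9  # Gemma 3 27B, rounded parameter count

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

&lt;p&gt;At 4 bits per parameter the weights need roughly a quarter of the bf16 footprint, which leaves more of the 96GB for activations and training state.&lt;/p&gt;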

&lt;h3&gt;
  
  
  2. Vision-Language LoRA Configuration
&lt;/h3&gt;

&lt;p&gt;Instead of updating all 27 billion parameters, we use LoRA (Low-Rank Adaptation). We target all the primary projection layers in the transformer blocks, allowing the model to adapt its internal representations to the visual nuances of the pet breeds while keeping the total trainable parameter count extremely low. More details on efficient GPU memory usage can be found in this &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/decoding-high-bandwidth-memory-a-practical-guide-to-gpu-memory-for-fine-tuning-ai-models/?e=48754805" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/p&gt;
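&lt;p&gt;To see why the trainable parameter count stays so low, consider a single projection layer. The sketch below uses a hypothetical layer shape and rank, not the values from the script or from Gemma 3.&lt;/p&gt;

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank),
    while the original weight stays frozen.
    """
    return rank * d_in + d_out * rank


# Hypothetical layer shape and rank, chosen only for illustration.
D_IN, D_OUT, RANK = 4096, 4096, 16

full = D_IN * D_OUT
lora = lora_trainable_params(D_IN, D_OUT, RANK)
print(f"frozen weights: {full:,}")
print(f"LoRA weights:   {lora:,} ({100 * lora / full:.2f}% of the layer)")
```

&lt;p&gt;The adapter grows linearly with the layer width, while the frozen weight grows quadratically, so the ratio shrinks as layers get larger.&lt;/p&gt;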

&lt;h3&gt;
  
  
  3. The Custom Data Collator
&lt;/h3&gt;

&lt;p&gt;This is a crucial part of fine-tuning vision-language models (VLMs). Because VLMs process a mix of image and text tokens, the data_collator must ensure that the model learns only from the breed label (the model’s response). The &lt;em&gt;turn marker&lt;/em&gt; is a structural boundary that signals the exact point where the user stops speaking and the model’s response begins. The script searches for the model’s &lt;em&gt;turn marker&lt;/em&gt; in the token sequence and masks out everything before the response (the user’s prompt and the image tokens), so those tokens don’t contribute to the training loss.&lt;/p&gt;
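&lt;p&gt;The masking idea can be sketched in plain Python. This is an illustration rather than the script’s actual collator: the token IDs are made up, the function names are hypothetical, and it assumes the common convention of using -100 as the label value that the loss ignores.&lt;/p&gt;

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def find_marker(token_ids, marker_ids):
    """Return the index where marker_ids first occurs, or -1 if absent."""
    n = len(marker_ids)
    for start in range(len(token_ids) - n + 1):
        if token_ids[start:start + n] == marker_ids:
            return start
    return -1


def mask_prompt(token_ids, marker_ids):
    """Build labels that keep only the tokens after the turn marker."""
    start = find_marker(token_ids, marker_ids)
    if start == -1:
        # No marker found: exclude the whole sequence from the loss.
        return [IGNORE_INDEX] * len(token_ids)
    response_start = start + len(marker_ids)
    return [IGNORE_INDEX] * response_start + list(token_ids[response_start:])


# Made-up IDs: prompt and image tokens, a two-token marker, then the answer.
tokens = [5, 6, 7, 8, 9, 10]
marker = [7, 8]
print(mask_prompt(tokens, marker))
```

&lt;p&gt;Only the tokens after the marker keep their original IDs as labels, so the loss is computed on the breed label alone.&lt;/p&gt;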

&lt;h3&gt;
  
  
  4. Breed Extraction
&lt;/h3&gt;

&lt;p&gt;Generative models often add conversational filler (e.g., “The animal in this image is a Samoyed”). Our evaluation logic includes a robust extraction heuristic that sorts class names by length. This ensures that if the model mentions “English Cocker Spaniel,” it correctly identifies the full breed rather than just matching “Cocker Spaniel”.&lt;/p&gt;
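
&lt;p&gt;A minimal sketch of this longest-match-first heuristic is shown below. The function name and the case-insensitive substring matching are assumptions for illustration; the script's actual extraction logic may differ in its details.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def extract_breed(generated_text, class_names):
    """Return the longest class name mentioned in generated_text, or None if none match."""
    text = generated_text.lower()
    # Try the longest names first so a full breed name wins over a shorter name it contains.
    for name in sorted(class_names, key=len, reverse=True):
        if name.lower() in text:
            return name
    return None


if __name__ == "__main__":
    breeds = ["Cocker Spaniel", "English Cocker Spaniel", "Samoyed"]
    print(extract_breed("The animal in this image is an English Cocker Spaniel.", breeds))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without the sort, the result would depend on the order of the class list, and the shorter name could match first.&lt;/p&gt;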

&lt;h3&gt;
  
  
  5. Automated GCS Archiving
&lt;/h3&gt;

&lt;p&gt;Once the training completes and the final evaluation is calculated, the script doesn’t just stop. It bundles the fine-tuned LoRA adapters with the original model processor and automatically uploads the entire directory to your Google Cloud Storage bucket. This ensures your model is immediately ready for deployment or serving.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 — Create and execute the Cloud Run job
&lt;/h2&gt;

&lt;p&gt;Now, we harness the power of the &lt;strong&gt;NVIDIA RTX PRO 6000 Blackwell GPU.&lt;/strong&gt; Our container is built with &lt;strong&gt;CUDA 12.8&lt;/strong&gt; for full Blackwell/PyTorch 2.7 compatibility and uses an ENTRYPOINT configuration, allowing you to pass script arguments directly via the --args flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip: if the job already exists&lt;/strong&gt;, use gcloud beta run jobs update instead of create.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud beta run &lt;span class="nb"&gt;jobs &lt;/span&gt;create &lt;span class="nv"&gt;$JOB_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --region &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --image &lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="nt"&gt;-docker&lt;/span&gt;.pkg.dev/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$AR_REPO&lt;/span&gt;/&lt;span class="nv"&gt;$IMAGE_NAME&lt;/span&gt;:latest &lt;span class="se"&gt;\&lt;/span&gt;
  --set-env-vars &lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --set-secrets &lt;span class="nv"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$SECRET_ID&lt;/span&gt;:latest &lt;span class="se"&gt;\&lt;/span&gt;
  --no-gpu-zonal-redundancy &lt;span class="se"&gt;\&lt;/span&gt;
  --cpu 20.0 &lt;span class="se"&gt;\&lt;/span&gt;
  --memory 80Gi &lt;span class="se"&gt;\&lt;/span&gt;
  --task-timeout 60m &lt;span class="se"&gt;\&lt;/span&gt;
  --gpu 1 &lt;span class="se"&gt;\&lt;/span&gt;
  --gpu-type nvidia-rtx-pro-6000 &lt;span class="se"&gt;\&lt;/span&gt;
  --service-account &lt;span class="nv"&gt;$SERVICE_ACCOUNT&lt;/span&gt;@&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;.iam.gserviceaccount.com &lt;span class="se"&gt;\&lt;/span&gt;
  --add-volume &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model-volume,type&lt;span class="o"&gt;=&lt;/span&gt;cloud-storage,bucket&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --add-volume-mount &lt;span class="nv"&gt;volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model-volume,mount-path&lt;span class="o"&gt;=&lt;/span&gt;/mnt/gcs &lt;span class="se"&gt;\&lt;/span&gt;
  --&lt;span class="nv"&gt;network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default &lt;span class="se"&gt;\&lt;/span&gt;
  --&lt;span class="nv"&gt;subnet&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default &lt;span class="se"&gt;\&lt;/span&gt;
  --vpc-egress&lt;span class="o"&gt;=&lt;/span&gt;private-ranges-only &lt;span class="se"&gt;\&lt;/span&gt;
  --&lt;span class="nv"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--model-id"&lt;/span&gt;,&lt;span class="s2"&gt;"/mnt/gcs/google/gemma-3-27b-it/"&lt;/span&gt;,&lt;span class="s2"&gt;"--output-dir"&lt;/span&gt;,&lt;span class="s2"&gt;"/tmp/gemma3-finetuned"&lt;/span&gt;,&lt;span class="s2"&gt;"--gcs-output-path"&lt;/span&gt;,&lt;span class="s2"&gt;"gs://&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s2"&gt;/gemma3-finetuned"&lt;/span&gt;,&lt;span class="s2"&gt;"--train-size"&lt;/span&gt;,&lt;span class="s2"&gt;"800"&lt;/span&gt;,&lt;span class="s2"&gt;"--eval-size"&lt;/span&gt;,&lt;span class="s2"&gt;"200"&lt;/span&gt;,&lt;span class="s2"&gt;"--learning-rate"&lt;/span&gt;,&lt;span class="s2"&gt;"5e-5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note on Execution Limits:&lt;/strong&gt; Tasks using GPUs on Cloud Run Jobs currently have a maximum execution time of &lt;strong&gt;60 minutes&lt;/strong&gt;. To ensure this training job completes within the standard public limit, we set --num_epochs to 3 and restricted --train-size to 800 samples. If your fine-tuning workload needs more time, you can split your training dataset into segments that each finish in under 60 minutes (like the 800 samples in our case) and process them as a sequence of independent tasks, using checkpointing to carry the model state from one task to the next.&lt;/p&gt;
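
&lt;p&gt;One way to plan such a sequence is to compute the index ranges up front and hand one range to each task. The sketch below is an assumption about how you might organize this; the helper name and the idea of passing a start and end index to each task are hypothetical and are not flags of the script.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def split_into_segments(num_samples, segment_size):
    """Return (start, end) index pairs that cover num_samples in chunks of segment_size."""
    return [
        (start, min(start + segment_size, num_samples))
        for start in range(0, num_samples, segment_size)
    ]


if __name__ == "__main__":
    # Each pair would map to one task that resumes from the previous task's checkpoint.
    for start, end in split_into_segments(2500, 800):
        print("train on samples", start, "to", end)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For 2,500 samples in segments of 800, this yields four ranges, with the last one covering the remaining 100 samples.&lt;/p&gt;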

&lt;h3&gt;
  
  
  Understanding the Deployment Flags
&lt;/h3&gt;

&lt;p&gt;To ensure a stable and production-ready environment, we use several specialized flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;--gpu-type nvidia-rtx-pro-6000:&lt;/strong&gt; Targets the NVIDIA RTX PRO 6000 Blackwell GPU. With &lt;strong&gt;96GB of GPU memory (VRAM), 1.6 TB/s bandwidth,&lt;/strong&gt; and support for &lt;strong&gt;FP4/FP6 precision,&lt;/strong&gt; it provides the ample overhead and high-speed throughput needed for multimodal fine-tuning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;--memory 80Gi:&lt;/strong&gt; We allocate high system RAM (scalable up to 176GB) to handle the low_cpu_mem_usage model loading and our memory-efficient streaming data generator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;--cpu 20.0:&lt;/strong&gt; Cloud Run Jobs allows scaling up to &lt;strong&gt;44 vCPUs&lt;/strong&gt; per instance, ensuring that preprocessing and data loading never become a bottleneck for the GPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;--add-volume &amp;amp; --add-volume-mount:&lt;/strong&gt; This mounts your GCS bucket as a local directory at /mnt/gcs. &lt;strong&gt;Note:&lt;/strong&gt; This requires the bucket and the job to be in the same region (europe-west4). It allows the script to read the base model weights at data-center speeds without copying them into the container’s writable layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;--network &amp;amp; --subnet:&lt;/strong&gt; Configures &lt;strong&gt;Direct VPC Egress&lt;/strong&gt;, allowing the job to communicate securely with other resources in your VPC. To make sure this works you need to enable &lt;a href="https://docs.cloud.google.com/vpc/docs/configure-private-google-access" rel="noopener noreferrer"&gt;“Private Google Access”&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;--vpc-egress=all-traffic:&lt;/strong&gt; Ensures all outgoing traffic, including requests to Hugging Face, is routed through your VPC for enhanced security and monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; If you skipped Step 2 and didn’t stage the model in your GCS bucket, you must change the --model-id in the --args to google/gemma-3-27b-it. This tells the script to download the weights directly from Hugging Face at runtime, though this will be significantly slower than using the GCS mount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execute the job:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud beta run &lt;span class="nb"&gt;jobs &lt;/span&gt;execute &lt;span class="nv"&gt;$JOB_NAME&lt;/span&gt; --region &lt;span class="nv"&gt;$REGION&lt;/span&gt; --async
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5 — Check Results and Evaluate Performance
&lt;/h2&gt;

&lt;p&gt;Once your job finishes, you can jump into the Google Cloud Console to inspect the detailed logs. You’ll find your newly fine-tuned model waiting for you in your Cloud Storage bucket at gs://$BUCKET_NAME/gemma3-finetuned.&lt;/p&gt;

&lt;p&gt;To rigorously quantify how well Gemma 3 learned to identify these breeds, we used Accuracy and Macro F1 Score as our primary metrics. While accuracy gives us a clear overall percentage, the F1 score ensures the model is accurate across all 37 breeds, not just the most common ones.&lt;/p&gt;
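
&lt;p&gt;To make the difference between the two metrics concrete, here is a small pure-Python sketch. It is an illustration rather than the script's evaluation code: the function names are assumptions, and this version averages F1 over the classes present in the true labels only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving every class in y_true equal weight."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: the model always answers with the common class.
    y_true = ["pug", "pug", "pug", "beagle"]
    y_pred = ["pug", "pug", "pug", "pug"]
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", round(macro_f1(y_true, y_pred), 4))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this toy case accuracy is 0.75 while macro F1 is about 0.43, because the rare class that the model never predicts pulls the average down.&lt;/p&gt;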

&lt;p&gt;In my testing, I saw a clear progression as we scaled our data and compute:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsv9ukl6ye7kuva89099k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsv9ukl6ye7kuva89099k.png" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Results with different sample size&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;79% Accuracy, 77% F1-score (1.1h run):&lt;/strong&gt; Trained on 1,000 samples and evaluated against 200 test samples, this was a significant jump from the zero-shot baseline of 66%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;93% Accuracy, 91% F1-score (2.3h run):&lt;/strong&gt; By scaling up to 2,500 training samples (and 1,500 test samples), the model reached nearly state-of-the-art performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;94% Accuracy &amp;amp; 91.5% F1 (3.3h run):&lt;/strong&gt; With a larger run on 3,600 training samples (evaluated against 3,500 test samples), the model effectively hit the state-of-the-art benchmark for this dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzldj4ngizd6okblrtry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzldj4ngizd6okblrtry.png" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Performance summary report for 3,600 training samples and 3,500 test samples: state of the art reached with &lt;strong&gt;94% accuracy!&lt;/strong&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;It is important to note that the standard &lt;strong&gt;public limit&lt;/strong&gt; for GPU jobs is currently 60 minutes. As mentioned in step 4, sampling and &lt;a href="https://huggingface.co/docs/trl/sft_trainer#trl.SFTTrainer.train.resume_from_checkpoint" rel="noopener noreferrer"&gt;checkpointing&lt;/a&gt; can help overcome this limitation.&lt;/p&gt;

&lt;p&gt;These results show that fine-tuning is the necessary bridge for generalist models. By leveraging serverless Blackwell GPUs, we’ve transformed a massive reasoner into a high-precision expert ready for production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps: Serving your fine-tuned model on Cloud Run
&lt;/h3&gt;

&lt;p&gt;Now that you’ve fine-tuned Gemma 3, the next challenge is serving it efficiently for production-grade inference.&lt;/p&gt;

&lt;p&gt;The true “deploy and forget” magic happens when you transition your saved weights into a serving environment. By hosting your fine-tuned model on Cloud Run with serverless Blackwell GPUs, you get a highly economical production environment where your GPU instances automatically scale to zero when they aren’t in use. This setup eliminates the operational toil of cluster management and manual maintenance, allowing you to serve massive models with no reservations: you only pay for the exact minutes your model is active.&lt;/p&gt;

&lt;p&gt;To get started with inference, explore this codelab: &lt;a href="https://codelabs.developers.google.com/codelabs/cloud-run/cloud-run-gpu-rtx-pro-6000" rel="noopener noreferrer"&gt;Run inference using a Gemma model on Cloud Run with RTX 6000 Pro GPU&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To learn more about production serving, refer to the official guide on &lt;a href="https://docs.cloud.google.com/run/docs/run-gemma-on-cloud-run" rel="noopener noreferrer"&gt;Running Gemma 3 on Cloud Run&lt;/a&gt;. The documentation provides a comprehensive roadmap for building a robust inference service, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimized Deployment:&lt;/strong&gt; Instructions for serving Gemma models using GPU accelerators and loading model weights via high-speed Cloud Storage volume mounts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure Interaction:&lt;/strong&gt; Guidance on using IAM authentication to securely call your deployed service with the Google Gen AI SDK.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Configuration:&lt;/strong&gt; Best practices for setting concurrency to achieve optimal request latency and high GPU utilization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Special thanks to Sara Ford and Oded Shahar from the Cloud Run team for the helpful review and feedback on this article.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>gemma</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Agent Factory Recap: Supercharging Agents on GKE with Agent Sandbox and Pod Snapshots</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Tue, 07 Apr 2026 13:04:00 +0000</pubDate>
      <link>https://forem.com/googleai/agent-factory-recap-supercharging-agents-on-gke-with-agent-sandbox-and-pod-snapshots-3a5e</link>
      <guid>https://forem.com/googleai/agent-factory-recap-supercharging-agents-on-gke-with-agent-sandbox-and-pod-snapshots-3a5e</guid>
      <description>&lt;p&gt;In the latest episode of the &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs" rel="noopener noreferrer"&gt;Agent Factory&lt;/a&gt;, Mofi Rahman and I had the pleasure of hosting, Brandon Royal, the PM working on agentic workloads on GKE. We dove deep into the critical questions around the nuances of choosing the right agent runtime, the power of GKE for agents, and the essential security measures needed for intelligent agents to run code.&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GKE for Agents?
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=109s" rel="noopener noreferrer"&gt;01:49&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We kicked off our discussion by tackling a fundamental question: why choose GKE as your agent runtime when serverless options like Cloud Run or fully managed solutions like Agent Engine exist?&lt;/p&gt;

&lt;p&gt;Brandon explained that the decision often boils down to control versus convenience. While serverless options are perfectly adequate for basic agents, the flexibility and governance capabilities of Kubernetes and GKE become indispensable in high-scale scenarios involving hundreds or thousands of agents. GKE truly shines when you need granular control over your agent deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl08gkxy41hseuy3fljpu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl08gkxy41hseuy3fljpu.png" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ADK on GKE
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=418s" rel="noopener noreferrer"&gt;06:58&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've discussed the &lt;a href="https://www.youtube.com/watch?v=aLYrV61rJG4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=17" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; in previous episodes, and Mofi highlighted to us how seamlessly it integrates with GKE and even showed a demo with the agent he built. ADK provides the framework for building the agent's logic, traces, and tools, while GKE provides the robust hosting environment. You can containerize your ADK agent, push it to Google Artifact Registry, and deploy it to GKE in minutes, transforming a local prototype into a globally accessible service.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sandbox problem
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=920s" rel="noopener noreferrer"&gt;15:20&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As agents become more sophisticated and capable of writing and executing code, a critical security concern emerges: the risk of untrusted, LLM-generated code. Brandon emphasized that while code execution is vital for high-performance agents and deterministic behavior, it also introduces significant risks in multi-tenant systems. This led us to the concept of a "sandbox."&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Sandbox?
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1158s" rel="noopener noreferrer"&gt;19:18&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those less familiar with security engineering, Brandon clarified that a sandbox provides kernel and network isolation. Mofi further elaborated, explaining that agents often need to execute scripts (e.g., Python for data analysis). Without a sandbox, a hallucinating or prompt-injected model could potentially delete databases or steal secrets if allowed to run code directly on the main server. A sandbox creates a safe, isolated environment where such code can run without harming other systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Sandbox on GKE Demo
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1225s" rel="noopener noreferrer"&gt;20:25&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, how do we build this "high fence" on Kubernetes? Brandon introduced the Agent Sandbox on Kubernetes, which leverages technologies like gVisor, an application kernel sandbox. When an agent needs to execute code, GKE dynamically provisions a completely isolated pod. This pod operates with its own kernel, network, and file system, effectively trapping any malicious code within the gVisor bubble. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexw6cndzjl0w1ybb8mz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexw6cndzjl0w1ybb8mz1.png" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mofi walked us through a compelling demo of the Agent Sandbox in action. We observed an ADK agent being given a task requiring code execution. As the agent initiated code execution, GKE dynamically provisioned a new pod, visibly labeled as "sandbox-executor," demonstrating the real-time isolation. Brandon highlighted that this pod is configured with strict network policies, further enhancing security.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feauxfwh9kazbqc32u7kz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feauxfwh9kazbqc32u7kz.png" width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future: Pod Snapshots
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=5_R_Ixk8ENQ&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1779s" rel="noopener noreferrer"&gt;29:39&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While the Agent Sandbox offers incredible security, the latency of spinning up a new pod for every task is a concern. Mofi demoed the game-changing solution: Pod Snapshots. This technology allows us to save the state of running sandboxes and then restore them near-instantly when an agent needs them. Brandon noted that this reduces startup times from minutes to seconds, revolutionizing real-time agentic workflows on GKE.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cfc4k9zczexdby59o0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cfc4k9zczexdby59o0z.png" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It's incredible to see how GKE isn't just hosting agents; it's actively protecting them and making them faster. &lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;Ready to put these concepts into practice? Dive into the full episode to see the demos in action and explore how GKE can supercharge your agentic workloads.&lt;/p&gt;

&lt;p&gt;Learn how to &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/agentic-adk-vertex?utm_campaign=CDR_0x036db2a4_default&amp;amp;utm_medium=external&amp;amp;utm_source=youtube" rel="noopener noreferrer"&gt;deploy an ADK agent to Google Kubernetes Engine&lt;/a&gt; and how to get your agent to run code safely using the &lt;a href="http://docs.cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox" rel="noopener noreferrer"&gt;GKE Agent Sandbox&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Shir Meir Lador → &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mofi Rahman → &lt;a href="https://www.linkedin.com/in/moficodes" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brandon Royal → &lt;a href="https://www.linkedin.com/in/brandonroyal/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Agent Factory Recap: Reinforcement Learning and Fine-Tuning on TPUs</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Tue, 31 Mar 2026 18:56:42 +0000</pubDate>
      <link>https://forem.com/googleai/agent-factory-recap-reinforcement-learning-and-fine-tuning-on-tpus-1o6j</link>
      <guid>https://forem.com/googleai/agent-factory-recap-reinforcement-learning-and-fine-tuning-on-tpus-1o6j</guid>
      <description>&lt;p&gt;In our agent factory holiday special, Don McCasland and I were joined by Kyle Meggs, Senior Product Manager on the TPU Training Team at Google, to dive deep into the world of model fine tuning. We focused specifically on reinforcement learning (RL), and how Google's own infrastructure of TPUs are designed to power these massive workloads at scale.&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Consider Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=2&amp;amp;t=193s" rel="noopener noreferrer"&gt;3:13&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We started with a fundamental question: with foundational models like Gemini becoming so powerful out of the box, and with customization through the prompt often being good enough, when should you consider fine-tuning?&lt;/p&gt;

&lt;p&gt;Fine-tuning your own model is relevant when you need high specialization for unique datasets where a generalist model might not excel (such as in the medical domain), or when you have strict privacy restrictions that require hosting your own models trained on your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Lifecycle: Pre-training and Post-training (SFT and RL)
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=232s" rel="noopener noreferrer"&gt;3:52&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Kyle used a great analogy inspired by Andrej Karpathy to break down the stages of training. He described pre-training as "knowledge acquisition," similar to reading a chemistry textbook to learn how things work. Post-training is further split into Supervised Fine-Tuning (SFT), which is analogous to reading already-solved practice problems within the textbook chapter, and Reinforcement Learning (RL), which is like solving new practice problems without help and then checking your answers in the back of the book to measure yourself against an optimal approach and correct answers. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc192k921af4wed7698x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc192k921af4wed7698x.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Reinforcement Learning (RL) is Essential
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=350s" rel="noopener noreferrer"&gt;5:50&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We explored why RL is currently so important for building modern LLMs. Kyle explained that unlike SFT, which is about imitation, RL is about grading actions to drive "alignment." It’s crucial for teaching a model safety (penalizing what not to do), enabling the model to use tools like search and interact with the physical world through trial and error, and for performing verifiable tasks like math or coding by rewarding the entire chain of thought that leads to a correct answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Industry Pulse: Why 2025 is the year of RL
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=513s" rel="noopener noreferrer"&gt;8:33&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In this segment, we looked at the rapidly evolving landscape of RL. Kyle noted that it is fair to call 2025 the "year of RL," highlighting the massive increase in investment and launches across the industry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;January:&lt;/strong&gt; DeepSeek-R1 launched, making a huge splash with open-source GRPO.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summer:&lt;/strong&gt; xAI launched Grok 4, reportedly running a 200k GPU cluster for RL at "pre-training scale."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;October:&lt;/strong&gt; A slew of new tooling launches across Google, Meta, and TML.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;November:&lt;/strong&gt; Gemini 3 launched as a premier thinking model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recent:&lt;/strong&gt; Google launched MaxText 2.0 for fine-tuning on TPUs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ud8v71oa92vgbu4iz5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ud8v71oa92vgbu4iz5.png" alt="alt text" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hurdles of Implementing RL
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=646s" rel="noopener noreferrer"&gt;10:46&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following the industry trends, we discussed why RL is so difficult to implement. Kyle explained that RL combines the complexities of both training and inference into a single process. He outlined three primary challenges: managing infrastructure at the right balance and scale to avoid bottlenecks; choosing the right code, models, algorithms (like GRPO vs. DPO), and data; and finally, the difficulty of integrating disparate components for training, inference, orchestration, and weight synchronization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjca0lpcpo23s95mzv876.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjca0lpcpo23s95mzv876.png" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To address these dimensions of complexity, Google offers MaxText, a vertically integrated solution that helps you perform RL in a highly scalable and performant fashion. MaxText provides highly optimized models, the latest post-training algorithms, high-performance inference via vLLM, and powerful scalability and flexibility via Pathways.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rch212bej2n6eck8lq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rch212bej2n6eck8lq8.png" alt="alt text" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In contrast to DIY approaches where users assemble their own stack of disparate components from many different providers, Google’s approach offers a single integrated stack of co-designed components, from &lt;strong&gt;silicon&lt;/strong&gt; to &lt;strong&gt;software&lt;/strong&gt; to &lt;strong&gt;solutions&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctihvw4xt9q6ajs1dfdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctihvw4xt9q6ajs1dfdp.png" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Factory Floor
&lt;/h2&gt;

&lt;p&gt;The Factory Floor is our segment for getting hands-on. Here, we moved from high-level concepts to practical code with a live demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why TPUs Shine for RL
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=772s" rel="noopener noreferrer"&gt;12:52&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Before diving into the demo, Kyle explained why TPUs are uniquely suited for complex AI workloads like RL. Unlike other hardware, TPUs were designed system-first. A TPU Pod can connect up to 9,216 chips over low-latency interconnects, allowing for massive scale without relying on standard data center networks. This is a huge advantage for overcoming RL bottlenecks like weight synchronization. Furthermore, because they are purpose-built for AI, they offer superior price-performance and thermal efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitkt61wg3qhq2oobmryd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitkt61wg3qhq2oobmryd.png" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo: Reinforcement Learning (GRPO) with TPU
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=qBOvM7SiDa4&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=953s" rel="noopener noreferrer"&gt;15:53&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Don led a hands-on demonstration showing what RL looks like in action using Google's infrastructure. The demo showcased:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Using &lt;strong&gt;MaxText 2.0&lt;/strong&gt; as an integrated solution for the workload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leveraging models from MaxText and algorithms from Tunix.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handling inference using vLLM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Utilizing &lt;strong&gt;Pathways&lt;/strong&gt; for orchestration and scaling to run GRPO (Group Relative Policy Optimization).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
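
&lt;p&gt;To make the "group relative" part of GRPO concrete, here is a minimal sketch of one common formulation: several completions are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the &lt;code&gt;eps&lt;/code&gt; parameter, and the reward values are illustrative assumptions; this is not the code shown in the demo, nor the MaxText or Tunix API.&lt;/p&gt;

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for the SAME
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group receives an identical reward.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

&lt;p&gt;Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separately trained value model is needed to supply the baseline.&lt;/p&gt;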

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4tqmo8zv62i6oufqj8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4tqmo8zv62i6oufqj8q.png" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This holiday special was a great deep dive into the cutting edge of model fine-tuning. While foundational models are getting better every day, the future of highly specialized, capable agents relies on mastering post-training techniques like RL and on having the right vertically integrated infrastructure, like TPUs, to run them efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;We hope this episode gave you valuable tools and perspectives to think about fine-tuning your own specialized agents. Be sure to check out the resources below to explore MaxText 2.0 and start experimenting with TPUs for your workloads. We'll see you next year for a revamped season of The Agent Factory!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Post-Training Docs &lt;a href="https://maxtext.readthedocs.io/en/latest/tutorials/post_training_index.html" rel="noopener noreferrer"&gt;https://maxtext.readthedocs.io/en/latest/tutorials/post_training_index.html&lt;/a&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Google Cloud TPU (Ironwood) Documentation: &lt;a href="https://docs.cloud.google.com/tpu/docs/tpu7x" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/tpu/docs/tpu7x&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Google Cloud open source code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MaxText - &lt;a href="https://github.com/AI-Hypercomputer/maxtext" rel="noopener noreferrer"&gt;https://github.com/AI-Hypercomputer/maxtext&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GPU recipes - &lt;a href="https://github.com/AI-Hypercomputer/gpu-recipes" rel="noopener noreferrer"&gt;https://github.com/AI-Hypercomputer/gpu-recipes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;TPU recipes - &lt;a href="https://github.com/AI-Hypercomputer/tpu-recipes" rel="noopener noreferrer"&gt;https://github.com/AI-Hypercomputer/tpu-recipes&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Andrej Karpathy - Chemistry Analogy: &lt;a href="https://youtu.be/7xTGNNLPyMI?si=Bubrqz_dPpvuqc1M&amp;amp;t=8069" rel="noopener noreferrer"&gt;Deep Dive into LLMs like ChatGPT&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Paper: "Small Language Models are the Future of Agentic AI" (Nvidia): &lt;a href="https://arxiv.org/abs/2506.02153" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2506.02153&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Fine tuning blog: &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/a-step-by-step-guide-to-fine-tuning-medgemma-for-breast-tumor-classification?e=48754805" rel="noopener noreferrer"&gt;https://cloud.google.com/blog/topics/developers-practitioners/a-step-by-step-guide-to-fine-tuning-medgemma-for-breast-tumor-classification?e=48754805&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Shir Meir Lador →  &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/shirmeirlador/&lt;/a&gt;, &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Don McCasland →  &lt;a href="https://www.linkedin.com/in/donald-mccasland/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/donald-mccasland/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kyle Meggs → &lt;a href="https://www.linkedin.com/in/kyle-meggs/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/kyle-meggs/&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>gemini</category>
    </item>
    <item>
      <title>My First Experience Creating Antigravity Skills</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Fri, 20 Mar 2026 15:23:02 +0000</pubDate>
      <link>https://forem.com/googleai/my-first-experience-creating-antigravity-skills-524b</link>
      <guid>https://forem.com/googleai/my-first-experience-creating-antigravity-skills-524b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cvbil990snohnuztk9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cvbil990snohnuztk9w.png" width="700" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Experimenting with Agent skills for the first time, feeling empowered!&lt;/small&gt;&lt;/center&gt;

&lt;p&gt; &lt;br&gt;
Last week, I was at an event where we taught developers how to build &lt;a href="https://goo.gle/aaiwcr-1" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt; and &lt;a href="http://goo.gle/aaiwcr-2" rel="noopener noreferrer"&gt;agents&lt;/a&gt;, and how to &lt;a href="http://goo.gle/aaiwcr-3" rel="noopener noreferrer"&gt;deploy open models&lt;/a&gt; to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b491641592&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud Run&lt;/a&gt;. After the session, one of the developers shared something that really stuck with me: he was already using our content to create specialized &lt;a href="https://antigravity.google/docs/skills" rel="noopener noreferrer"&gt;&lt;strong&gt;Skills&lt;/strong&gt;&lt;/a&gt; to share with his entire team.&lt;/p&gt;

&lt;p&gt;I got inspired and decided it was time to dive into &lt;a href="https://antigravity.google/docs/skills" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt;. During my last project, the dev-signal agent, I learned a lot about bringing agents and AI applications to production in a robust and scalable manner. I thought, &lt;em&gt;this is a great opportunity to give my favorite coding agent, &lt;a href="https://www.antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; (Google’s “agent-first” IDE), those skills so that going forward, it will just do it for me!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk through how I built the 13 production skills in this &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal/.agent/skills" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and the patterns behind them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Agent Skills?
&lt;/h2&gt;

&lt;p&gt;As &lt;a href="https://www.linkedin.com/in/iromin/?originalSubdomain=in" rel="noopener noreferrer"&gt;Romin Irani&lt;/a&gt; explains in &lt;a href="https://medium.com/google-cloud/tutorial-getting-started-with-antigravity-skills-864041811e0d" rel="noopener noreferrer"&gt;“Getting Started with Google Antigravity Skills”&lt;/a&gt;, skills represent a shift from monolithic context loading to &lt;strong&gt;Progressive Disclosure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agents get “overwhelmed” when they are given too many tools all at once (a phenomenon known as “&lt;a href="https://www.linkedin.com/posts/smithakolan_your-ai-agent-is-not-bad-at-reasoning-activity-7422342915089178624-awR3?rcm=ACoAAAYeeDsBfJzKJQaDuSjRnUBmKV20OJV2olc" rel="noopener noreferrer"&gt;Tool Bloat&lt;/a&gt;”). To solve this, Skills allow the agent to “load” specialist knowledge only when needed. When you ask an agent to “evaluate a shadow revision,” it works out that it needs the &lt;strong&gt;Shadow Deployer&lt;/strong&gt; skill as context for this operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workspace vs. Global Scope
&lt;/h2&gt;

&lt;p&gt;In Antigravity, you can manage these skills in two distinct ways depending on how you want to use them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Scope:&lt;/strong&gt; Located in &lt;em&gt;.agent/skills/&lt;/em&gt; within your project root. These are specific to your project and can be committed to GitHub so your entire team can benefit from the same production patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global Scope:&lt;/strong&gt; Located in &lt;em&gt;~/.gemini/antigravity/skills/&lt;/em&gt;. These are your personal utilities that stay with you across every project you work on.&lt;/li&gt;
&lt;/ul&gt;
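
&lt;p&gt;As a rough illustration of the two scopes, here is a small sketch that lists the skills visible from each location. The two paths come from the list above; the layout of one subfolder per skill, each containing a SKILL.md file, is an assumption, and the helper names are hypothetical rather than part of Antigravity.&lt;/p&gt;

```python
from pathlib import Path

# Paths as described above; resolved relative to the project root and home.
WORKSPACE_SKILLS = Path(".agent/skills")
GLOBAL_SKILLS = Path.home() / ".gemini" / "antigravity" / "skills"


def find_skills(root):
    """Return the names of subfolders under `root` that contain a SKILL.md.

    Assumes one folder per skill; a missing root yields an empty list.
    """
    if not root.is_dir():
        return []
    return sorted(path.parent.name for path in root.glob("*/SKILL.md"))


def list_all_skills(workspace_root=WORKSPACE_SKILLS, global_root=GLOBAL_SKILLS):
    """Group discovered skill names by scope."""
    return {
        "workspace": find_skills(workspace_root),
        "global": find_skills(global_root),
    }


print(list_all_skills())
```

&lt;p&gt;Run from a project root, this prints the skills your team shares through the repository next to the ones that live only on your machine.&lt;/p&gt;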

&lt;h2&gt;
  
  
  How I built the skills
&lt;/h2&gt;

&lt;p&gt;Following the principles in &lt;a href="https://www.linkedin.com/in/petruzalek/" rel="noopener noreferrer"&gt;Daniela Petruzalek&lt;/a&gt;’s &lt;a href="https://medium.com/google-cloud/building-agent-skills-with-skill-creator-855f18e785cf" rel="noopener noreferrer"&gt;“Building Agent Skills with skill-creator”,&lt;/a&gt; I took a “methodology-first” approach. I used the existing dev-signal blog series I’ve been working on and the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;codebase&lt;/a&gt; itself as core context, asking Antigravity to identify and codify the unique skills needed to &lt;strong&gt;build a production agent on Google Cloud.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For some of the more specialized areas, I provided additional context with patterns I’d like to follow, such as the agent evaluation &lt;a href="https://codelabs.devsite.corp.google.com/codelabs/production-ready-ai-roadshow/2-evaluating-multi-agent-systems/evaluating-multi-agent-systems#0" rel="noopener noreferrer"&gt;codelab&lt;/a&gt; and &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/from-vibe-checks-to-continuous-evaluation-engineering-reliable-ai-agents?utm_campaign=CDR_0x91b1edb5_default_b491641592&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt; and the agent security &lt;a href="https://codelabs.developers.google.com/codelabs/production-ready-ai-roadshow/3-securing-a-multi-agent-system/securing-a-multi-agent-system#0?utm_campaign=CDR_0x91b1edb5_default_b491641592&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;codelab&lt;/a&gt;, both written by my awesome team.&lt;/p&gt;

&lt;p&gt;These 13 skills provide Antigravity (or any developer using them) the crucial toolkit of a Google Cloud Production Engineer. I’m currently finalizing a detailed, step-by-step walkthrough of the dev-signal agent which will be published on the &lt;a href="https://cloud.google.com/blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Blog&lt;/strong&gt;&lt;/a&gt; very soon! (follow me for future updates)&lt;/p&gt;

&lt;p&gt;In the meantime, you don’t have to wait — the full &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal/.agent/skills" rel="noopener noreferrer"&gt;skills&lt;/a&gt; are available for you to explore and leverage in your own projects today.&lt;/p&gt;

&lt;p&gt;Here is the full inventory of the skills:&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ Production Agent
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;adk-memory-bank-initializer:&lt;/strong&gt; Long-term state logic with Vertex AI Memory Bank.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;agent-containerizer:&lt;/strong&gt; Mixed-runtime Dockerfiles (Python + Node.js).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cloud-run-agent-architect:&lt;/strong&gt; Least-privilege Terraform for Cloud Run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-production-secret-handler:&lt;/strong&gt; In-memory secret fetching pattern (Secret Manager).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mcp-connector-generator:&lt;/strong&gt; Standardized MCP connection logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📊 Evaluation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-eval-engine-runner:&lt;/strong&gt; Parallel inference and reasoning trace capture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-eval-metric-configurator:&lt;/strong&gt; Setup for Grounding and Tool Use rubrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-golden-dataset-builder:&lt;/strong&gt; Tools for building datasets with reference trajectories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-shadow-deployer:&lt;/strong&gt; “Dark Canary” deployment scripts with revision tagging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-tool-trajectory-evaluator:&lt;/strong&gt; Custom Python metrics for Precision and Recall.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🛡️ Security
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-model-armor-shield:&lt;/strong&gt; Intelligent firewall (Prompt Injection, RAI, Malicious URL filters).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-safety-gatekeeper:&lt;/strong&gt; Python integration pattern (safety_util.py) for sanitizing user inputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gcp-agent-sdp-template-factory:&lt;/strong&gt; Terraform for Sensitive Data Protection (PII/Secret redaction).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that these patterns are codified as production skills, Antigravity can apply them automatically in my day-to-day development. I hope you find these as helpful as I do!&lt;/p&gt;

&lt;h2&gt;
  
  
  Pro tip - self improving skills!
&lt;/h2&gt;

&lt;p&gt;Because these skills were AI-generated, they might not work perfectly for your specific environment on the first try. But that’s actually the best part of working with an agentic IDE. If a skill doesn’t work well for you, don’t just fix the code manually; let the coding agent figure it out. Once it finds the solution, ask it to update the corresponding SKILL.md with the learned workflow. This captures the corrected workflow for the future, so the agent doesn’t repeat the mistake, and it saves you tokens and time on the next run. Think of these as living documents that actively improve as you build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to get started?&lt;/strong&gt; Clone the &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/ai-ml/dev-signal" rel="noopener noreferrer"&gt;repository&lt;/a&gt; and add these skills to your Workspace or Global Scope to start building your own production-ready agents. Learn more about &lt;a href="https://antigravity.google/docs/skills" rel="noopener noreferrer"&gt;Agent skills.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/shirmeirlador/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://x.com/shirmeir86?lang=en" rel="noopener noreferrer"&gt;X&lt;/a&gt; for updates on my next blogs and videos.&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>ai</category>
      <category>googlecloud</category>
      <category>agents</category>
    </item>
    <item>
      <title>How I Turned an Ugly Spreadsheet into an AI Assisted App with Antigravity</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Wed, 18 Feb 2026 17:39:12 +0000</pubDate>
      <link>https://forem.com/googleai/how-i-turned-an-ugly-spreadsheet-into-an-ai-assisted-app-with-antigravity-3j52</link>
      <guid>https://forem.com/googleai/how-i-turned-an-ugly-spreadsheet-into-an-ai-assisted-app-with-antigravity-3j52</guid>
      <description>&lt;p&gt;&lt;strong&gt;I have a confession to make.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Up until now, I wasn’t that much into “vibe coding.” I used AI all the time for Python coding, but I never really built a whole app from scratch in a language I knew nothing about.&lt;/p&gt;

&lt;p&gt;That changed today. I encountered a really annoying problem: I had to review a massive amount of talk submissions for a conference. We’re talking about a massive spreadsheet. Staring at those tiny cells was literally making my eyes hurt.&lt;/p&gt;

&lt;p&gt;My initial thought was, “Hey, let’s create a really sharp UI for the submission review.” But then I thought, why stop there? Why not let AI provide me valuable inputs from social media to help me with the review itself?&lt;/p&gt;

&lt;p&gt;So, I decided to build &lt;strong&gt;TalkScout&lt;/strong&gt;. And since I wanted to test drive &lt;a href="https://antigravity.google/docs/home" rel="noopener noreferrer"&gt;Google Antigravity&lt;/a&gt; (Google’s new AI-powered coding agent), I figured this was the perfect opportunity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftvcnagk5wbmvmw2dxxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftvcnagk5wbmvmw2dxxt.png" alt="talkscout dashboard" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;Talkscout Dashboard (synthetic data)&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;Here is how I went from a painful CSV to a fully deployed &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; app, without writing a single line of React code myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: The “Meta-Prompt” (Asking Gemini to Talk to Antigravity)
&lt;/h2&gt;

&lt;p&gt;I didn’t start by coding; I started by chatting. I used &lt;strong&gt;meta-prompting&lt;/strong&gt; to get started.&lt;/p&gt;

&lt;p&gt;So, what is meta-prompting, you may ask? It means asking one model (here, Gemini 3) to write the prompt for the coding agent.&lt;/p&gt;

&lt;p&gt;I explained my problem to &lt;strong&gt;Gemini 3&lt;/strong&gt; in simple words. Gemini 3 acted as my architect. It turned my “brain dump” requirements into a technical spec, defining the component structure and data model. I didn’t have to guess the right words; I just pasted that polished spec into Antigravity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Ditching the Spreadsheet for a Dashboard
&lt;/h2&gt;

&lt;p&gt;With that prompt, Antigravity built the app of my dreams. It allowed me to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload the CSV with all the conference talks.&lt;/li&gt;
&lt;li&gt;Get a dashboard showing the status of each talk.&lt;/li&gt;
&lt;li&gt;See a beautiful, high-contrast UI to review abstracts and demo plans without squinting at cells.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuv05d3jhgbptocmquud4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuv05d3jhgbptocmquud4.png" alt="TalkScout submission review page with high contrast UI" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;TalkScout submission review page with high contrast UI&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;&lt;strong&gt;The “Vibe” Fix:&lt;/strong&gt; It wasn’t all smooth sailing — I actually hit a nasty React hydration error. This can take hours to debug, especially if you’re not a frontend developer… But I simply provided the error message to Antigravity and the coding agent pinpointed the mismatch in the DOM and fixed it in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Integrating Grounded Intelligence
&lt;/h2&gt;

&lt;p&gt;I didn’t just want a UI; I wanted to overcome my own bias. How do I know if a niche topic is actually hot?&lt;/p&gt;

&lt;p&gt;I added a button to get an &lt;strong&gt;AI Assessment&lt;/strong&gt;. But I didn’t want hallucinations. I used &lt;strong&gt;Google Search Grounding&lt;/strong&gt; so the AI could search through Reddit, X (Twitter), and LinkedIn for real-world developer signals. That provided me inputs based on the current developer audience mindshare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftck7aytx9ecgnlrifq08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftck7aytx9ecgnlrifq08.png" alt="TalkScout submission review page with AI social media analysis" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;TalkScout submission review page with AI social media analysis&lt;/small&gt;&lt;/center&gt;

&lt;h2&gt;
  
  
  Step 4: Calibrating the “Strict” Reviewer
&lt;/h2&gt;

&lt;p&gt;Initially, the AI was way too nice. It was giving high scores to anything with trendy keywords.&lt;/p&gt;

&lt;p&gt;I used what’s called &lt;strong&gt;few-shot prompting&lt;/strong&gt; to calibrate it. I gave examples of my scores vs. its scores and introduced what I call the &lt;strong&gt;“Marketing Fluff Penalty”&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a submission reads like a documentation/marketing page? Points docked.&lt;/li&gt;
&lt;li&gt;If the submission was way too short? We capped the score at a hard 2.&lt;/li&gt;
&lt;li&gt;If it includes war stories and actual learnings? Rating increased.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a few examples, it became more calibrated to my taste.&lt;/p&gt;
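
&lt;p&gt;Here is a minimal sketch of how such a calibration prompt could be assembled. The three rules paraphrase the list above; the function name, the example fields, and the sample abstracts are hypothetical and are not taken from TalkScout's code.&lt;/p&gt;

```python
# Rubric rules paraphrased from the list above.
RULES = (
    "Dock points when a submission reads like documentation or a marketing page.",
    "Cap the score at 2 when a submission is far too short.",
    "Raise the score for war stories and concrete learnings.",
)


def build_few_shot_prompt(examples, submission):
    """Combine the rubric, scored examples, and the new abstract into one prompt.

    Each example is a dict with hypothetical keys: 'abstract', 'ai_score',
    'my_score', and 'reason'.
    """
    lines = ["You are a strict conference reviewer. Follow these rules:"]
    lines += [f"- {rule}" for rule in RULES]
    lines += ["", "Calibration examples:"]
    for ex in examples:
        lines.append(f"Abstract: {ex['abstract']}")
        lines.append(f"AI score: {ex['ai_score']} | Reviewer score: {ex['my_score']}")
        lines.append(f"Why: {ex['reason']}")
        lines.append("")
    lines.append(f"Now score this abstract: {submission}")
    return "\n".join(lines)


# Made-up calibration data, for illustration only.
examples = [
    {
        "abstract": "Our platform revolutionizes AI at scale.",
        "ai_score": 4,
        "my_score": 2,
        "reason": "Reads like a marketing page.",
    },
]
prompt = build_few_shot_prompt(examples, "How we debugged a memory leak in production")
print(prompt)
```

&lt;p&gt;Showing the model's earlier score next to the reviewer's score, with a short reason, is what lets it see where it was too generous.&lt;/p&gt;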

&lt;h2&gt;
  
  
  Step 5: The Pivot to Batch Mode
&lt;/h2&gt;

&lt;p&gt;I realized it was taking me too long to ask the AI to evaluate each talk individually while I reviewed it.&lt;/p&gt;

&lt;p&gt;So, I asked Antigravity to refactor the backend for &lt;strong&gt;Batch Mode&lt;/strong&gt;. Now, TalkScout processes the entire submission pool in the background. By the time I grab a coffee, the “AI Draft” column is full of insights, allowing me to focus only on the final decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Sharing the Goodness (Deploy to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;)
&lt;/h2&gt;

&lt;p&gt;TalkScout was working great for me, but I thought, “It would be great to share this with the other reviewers.”&lt;/p&gt;

&lt;p&gt;This is where Antigravity really showed off. I simply asked it to deploy the app. It automatically recognized my Google Cloud Project ID, handled the containerization, generated the exact deployment commands, and deployed it to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One simple ask, and minutes later, I had a URL to share with the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  It Was Pretty Fun!
&lt;/h2&gt;

&lt;p&gt;It was pretty fun to actually solve a real problem I had using Antigravity and vibe coding. I built a tool that handles ingestion, provides a distraction-free rating interface, and provides valuable inputs for my reviews.&lt;/p&gt;

&lt;p&gt;I would love to hear from you all - have you recently solved a problem using vibe coding?&lt;/p&gt;

&lt;p&gt;If you haven’t already - try playing around with &lt;a href="https://antigravity.google/docs/home" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; and easily deploy your apps to &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x91b1edb5_default_b473111509&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Decoding high-bandwidth memory: A practical guide to GPU memory for fine-tuning AI models</title>
      <dc:creator>Shir Meir Lador</dc:creator>
      <pubDate>Thu, 15 Jan 2026 15:27:00 +0000</pubDate>
      <link>https://forem.com/googleai/decoding-high-bandwidth-memory-a-practical-guide-to-gpu-memory-for-fine-tuning-ai-models-56af</link>
      <guid>https://forem.com/googleai/decoding-high-bandwidth-memory-a-practical-guide-to-gpu-memory-for-fine-tuning-ai-models-56af</guid>
      <description>&lt;p&gt;We've all been there. You've meticulously prepared your dataset and written your training script. You hit &lt;strong&gt;run&lt;/strong&gt;, and your excitement builds, only to be crushed by the infamous error: CUDA out of memory.&lt;/p&gt;

&lt;p&gt;This is one of the most common roadblocks in AI development. Your GPU's &lt;a href="https://en.wikipedia.org/wiki/High_Bandwidth_Memory" rel="noopener noreferrer"&gt;High Bandwidth Memory (HBM)&lt;/a&gt; is the high-speed memory that holds everything that's needed for computation, and running out of it is a hard stop. But how do you know how much you need?&lt;/p&gt;

&lt;p&gt;To build a clear foundation, we'll start by breaking down the HBM consumers on a single GPU and then present key strategies to reduce that consumption. Later, we'll explore advanced multi-GPU strategies like data and &lt;a href="https://huggingface.co/docs/transformers/v4.13.0/en/parallelism" rel="noopener noreferrer"&gt;model parallelism&lt;/a&gt; that can help relieve memory pressure and scale your training in the cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding HBM: What's using all the memory?
&lt;/h2&gt;

&lt;p&gt;When you fine-tune a model, your HBM is primarily consumed by three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.webopedia.com/technology/llm-tokens-weights-parameters/#:~:text=in%20various%20contexts.-,What%20are%20LLM%20Weights?,or%20generate%20coherent%2C%20meaningful%20responses." rel="noopener noreferrer"&gt;Model Weights&lt;/a&gt;:&lt;/strong&gt; This is the most straightforward. It's the storage space required for the model's parameters—the "brain" that it uses to make predictions. A 7-billion parameter model loaded in 16-bit precision will take up roughly 14 GB before you even process a single piece of data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://eureka.patsnap.com/article/what-is-the-optimizer-state-in-deep-learning-training" rel="noopener noreferrer"&gt;Optimizer States&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Gradient_descent" rel="noopener noreferrer"&gt;Gradients&lt;/a&gt;:&lt;/strong&gt; This is the overhead that's required for learning. To update the model's weights, the training process needs to calculate gradients (the direction of learning) and the &lt;a href="https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/#Adam_Deep_Learning_Optimizer" rel="noopener noreferrer"&gt;optimizer&lt;/a&gt; (like the popular &lt;a href="https://docs.pytorch.org/docs/stable/generated/torch.optim.AdamW.html" rel="noopener noreferrer"&gt;AdamW&lt;/a&gt;) needs to store its own data to guide the training. In full fine-tuning, this can be the largest consumer of HBM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Activation_function" rel="noopener noreferrer"&gt;Activations&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Online_machine_learning#Batch_learning" rel="noopener noreferrer"&gt;Batch Data&lt;/a&gt;:&lt;/strong&gt; This is the most dynamic part. When your data (images, text, etc.) flows through the model's layers, the intermediate calculations, or activations, are stored in HBM. The memory needed here is directly proportional to your batch size. A larger batch size means more activations are stored simultaneously, which leads to faster training but much higher memory usage.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; These calculations are theoretical minimums. Real-world frameworks add up to 30% overhead due to &lt;a href="https://arxiv.org/abs/1910.02054" rel="noopener noreferrer"&gt;temporary buffers, kernel launches, and memory fragmentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although it's impossible to get a perfect number without experimentation, you can estimate your HBM needs with this general formula:&lt;br&gt;
&lt;em&gt;&lt;center&gt;Total HBM ≈ (Model Size) + (Optimizer States) + (Gradients) + (Activations)&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
 &lt;br&gt;
&lt;strong&gt;Further reading:&lt;/strong&gt; See this excellent JAX e-book that covers &lt;a href="https://jax-ml.github.io/scaling-book/gpus/" rel="noopener noreferrer"&gt;these topics&lt;/a&gt; in great detail and even has some &lt;a href="https://jax-ml.github.io/scaling-book/gpus/#quiz-5-llm-rooflines" rel="noopener noreferrer"&gt;"try it out yourself" test questions&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example: Why full fine-tuning is so demanding
&lt;/h2&gt;

&lt;p&gt;To see why running out of memory is such a common problem, let's walk through a real-world example that I recently worked on: fine-tuning the &lt;a href="https://deepmind.google/models/gemma/medgemma/" rel="noopener noreferrer"&gt;medgemma-4b-it model&lt;/a&gt;, which has 4 billion parameters. Our &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/a-step-by-step-guide-to-fine-tuning-medgemma-for-breast-tumor-classification" rel="noopener noreferrer"&gt;script&lt;/a&gt; loads it in bfloat16 precision (2 bytes per parameter).&lt;/p&gt;

&lt;p&gt;First, let's calculate the static HBM footprint. This is the memory that's required just to load the model and prepare it for training, before you've even processed a single piece of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Model Size:&lt;/strong&gt; The memory that's needed to simply hold the model on the GPU.&lt;/p&gt;

&lt;center&gt;4 billion parameters × 2 bytes/parameter = 8 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Gradients and Optimizer States:&lt;/strong&gt; The overhead for training every parameter with the AdamW optimizer.&lt;/p&gt;

&lt;center&gt;Gradients: 4 billion parameters × 2 bytes/parameter = 8 GB&lt;/center&gt;

&lt;center&gt;Optimizer States (AdamW): 2 × 4 billion parameters × 2 bytes/parameter = 16 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

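&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This estimate assumes that the AdamW states are stored in bfloat16, like the weights. If they're kept in float32 (4 bytes per value), the optimizer states alone grow to 32 GB. Other optimizers, such as Adafactor and Lion, have different memory footprints.&lt;/p&gt;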

&lt;p&gt;Adding these together gives us our baseline HBM cost for a full fine-tuning attempt:&lt;/p&gt;

&lt;center&gt;8 GB (Model) + 8 GB (Gradients) + 16 GB (Optimizer) = 32 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;This 32 GB is the baseline just to start the training process. On top of this, the GPU needs &lt;strong&gt;additional memory for activations&lt;/strong&gt;, which is a &lt;em&gt;dynamic&lt;/em&gt; cost that grows with your batch size and input data size. This is why full fine-tuning of large models is so demanding and often reserved for the most powerful hardware.&lt;/p&gt;
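&lt;p&gt;The arithmetic above can be written as a small Python sketch. The function name &lt;code&gt;static_hbm_gb&lt;/code&gt; and its parameters are hypothetical, and the estimate assumes decimal gigabytes, bfloat16 for weights, gradients, and optimizer states, and AdamW's two states per parameter. Activations are deliberately left out.&lt;/p&gt;

```python
# Rough static HBM estimate for full fine-tuning (activations excluded).
# Assumes decimal gigabytes (1 GB = 1e9 bytes), matching the figures above.
GB = 1e9


def static_hbm_gb(num_params, bytes_per_param=2, optimizer_states_per_param=2):
    """Return (weights, gradients, optimizer, total) in GB."""
    weights = num_params * bytes_per_param / GB
    # One gradient value is kept for every trainable parameter.
    gradients = num_params * bytes_per_param / GB
    # AdamW keeps two extra values per parameter (first and second moments).
    optimizer = optimizer_states_per_param * num_params * bytes_per_param / GB
    return weights, gradients, optimizer, weights + gradients + optimizer


weights, gradients, optimizer, total = static_hbm_gb(4e9)
print(f"weights={weights:.0f} GB, gradients={gradients:.0f} GB, "
      f"optimizer={optimizer:.0f} GB, total={total:.0f} GB")
```

&lt;p&gt;Multiplying the total by about 1.3 gives a more conservative figure that allows for the framework overhead mentioned earlier.&lt;/p&gt;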
&lt;h2&gt;
  
  
  Key strategies to reduce HBM consumption
&lt;/h2&gt;

&lt;p&gt;The HBM requirement for a full fine-tune can seem impossibly high. But several powerful techniques can reduce memory consumption, making it feasible to train large models on consumer-grade or entry-level professional GPUs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Parameter-Efficient Fine-Tuning (PEFT) with LoRA
&lt;/h3&gt;

&lt;p&gt;Instead of training all the billions of parameters in a model, &lt;a href="https://huggingface.co/docs/peft/en/index" rel="noopener noreferrer"&gt;Parameter-Efficient Fine-Tuning (PEFT)&lt;/a&gt; methods focus on training only a small subset of parameters. The most popular of these is &lt;a href="https://arxiv.org/abs/2106.09685" rel="noopener noreferrer"&gt;LoRA (Low-Rank Adaptation)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/lora-qlora?utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;LoRA&lt;/a&gt; works by freezing &lt;strong&gt;the original model's weights and injecting a tiny number of new, trainable &lt;em&gt;adapter&lt;/em&gt; layers&lt;/strong&gt; into the model architecture. This means the memory-hungry gradients and optimizer states are only needed for these few million new parameters, not the full 4 billion.&lt;/p&gt;
&lt;h4&gt;
  
  
  The math behind LoRA's memory savings
&lt;/h4&gt;

&lt;p&gt;LoRA doesn't remove the base model from your GPU. The full 8 GB of the original model's weights are still loaded and taking up HBM. They're just frozen, which means that the GPU isn't training them. All of the memory savings come from the fact that you no longer need to store the huge gradients and optimizer states for that massive, frozen part of the model.&lt;/p&gt;

&lt;p&gt;Let's recalculate the static HBM footprint with LoRA, assuming it adds 20 million trainable parameters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Model Size (unchanged):&lt;/strong&gt; The base model is still loaded.&lt;/p&gt;

&lt;center&gt;4 billion parameters × 2 bytes/parameter = 8 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. LoRA Gradients &amp;amp; Optimizer States:&lt;/strong&gt; We now only need overhead for the tiny set of new parameters.&lt;/p&gt;

&lt;center&gt;Gradients: 20 million parameters × 2 bytes/parameter = 40 MB&lt;/center&gt;

&lt;center&gt;
Optimizer States: 2 × 20 million parameters × 2 bytes/parameter = 80 MB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;The new static HBM footprint is now:&lt;/p&gt;

&lt;center&gt;8 GB (Model) + 40 MB (Gradients) + 80 MB (Optimizer) ≈ 8.12 GB&lt;/center&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;The training overhead has shrunk from 24 GB to just 120 MB, so the baseline memory requirement is now just over 8 GB. That leaves much more room for the dynamic memory that's needed for activations, which lets you use a reasonable batch size on a common 16 GB or 24 GB GPU without running out of memory.&lt;/p&gt;
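&lt;p&gt;Here is the same calculation as a minimal Python sketch. The helper &lt;code&gt;lora_static_hbm_gb&lt;/code&gt; is a hypothetical name, and the 20 million trainable parameters are the assumption used in this example, not a fixed property of LoRA. The key difference from full fine-tuning is that gradients and optimizer states scale with the trainable parameters only.&lt;/p&gt;

```python
# Static HBM with LoRA: the frozen base model is still loaded in full,
# but gradients and AdamW states exist only for the adapter parameters.
GB = 1e9


def lora_static_hbm_gb(base_params, trainable_params, bytes_per_param=2):
    """Return (weights, gradients, optimizer) in GB, assuming bfloat16."""
    weights = base_params * bytes_per_param / GB             # frozen base model
    gradients = trainable_params * bytes_per_param / GB      # adapters only
    optimizer = 2 * trainable_params * bytes_per_param / GB  # two AdamW moments
    return weights, gradients, optimizer


weights, gradients, optimizer = lora_static_hbm_gb(4e9, 20e6)
total = weights + gradients + optimizer
print(f"weights={weights:.2f} GB, gradients={gradients * 1000:.0f} MB, "
      f"optimizer={optimizer * 1000:.0f} MB, total={total:.2f} GB")
```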
&lt;h3&gt;
  
  
  Model quantization
&lt;/h3&gt;

&lt;p&gt;Besides training fewer parameters, we can also shrink the ones that we have by using &lt;a href="https://huggingface.co/docs/optimum/en/concept_guides/quantization" rel="noopener noreferrer"&gt;quantization&lt;/a&gt;, which involves reducing the &lt;a href="https://arxiv.org/html/2410.13857v1" rel="noopener noreferrer"&gt;numerical precision&lt;/a&gt; of the model's weights. The standard precision for modern training is &lt;a href="https://en.wikipedia.org/wiki/Bfloat16_floating-point_format" rel="noopener noreferrer"&gt;bfloat16&lt;/a&gt; because it offers the dynamic range of float32 with half the memory footprint. But we can reduce HBM usage further by converting weights to lower-precision integer formats like int8 or int4.&lt;/p&gt;

&lt;p&gt;Using lower-precision integer formats has a significant impact on HBM when compared to the standard bfloat16 baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;bfloat16 (standard):&lt;/strong&gt; The baseline size (e.g., a 7B model requires &lt;strong&gt;~14 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8-bit precision:&lt;/strong&gt; Halves the model size (e.g., 14 GB becomes &lt;strong&gt;~7 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4-bit precision:&lt;/strong&gt; Reduces the model size by a factor of 4 (e.g., 14 GB becomes &lt;strong&gt;~3.5 GB&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reduction in size lets you fit much larger models into memory, usually with only a small loss in model quality.&lt;/p&gt;
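&lt;p&gt;The sizes in the list above follow directly from the number of bits per parameter. The sketch below reproduces them for a 7B model. The names are hypothetical, and the calculation ignores the small amount of extra metadata (such as scaling factors) that real quantized formats store alongside the weights.&lt;/p&gt;

```python
# Approximate weight storage for a model at different precisions.
BITS_PER_PARAM = {"bfloat16": 16, "int8": 8, "int4": 4}


def weight_size_gb(num_params, bits_per_param):
    """Weight storage in decimal GB; quantization metadata is ignored."""
    return num_params * bits_per_param / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: {weight_size_gb(7e9, bits):.1f} GB")
```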


&lt;div class="crayons-card c-embed"&gt;

  

&lt;p&gt;&lt;strong&gt;A word of warning from experience:&lt;/strong&gt;&lt;br&gt;
When I started experimenting in this area, my first attempt to load the model using the common float16 data type failed spectacularly. The model's outputs were NaN (Not a Number), and a quick check revealed that every internal value had collapsed into NaN as well.&lt;/p&gt;

&lt;p&gt;The culprit was a classic &lt;a href="https://en.wikipedia.org/wiki/Integer_overflow" rel="noopener noreferrer"&gt;numerical overflow&lt;/a&gt;. The float16 data type has a narrow numerical range: it can't represent any number larger than 65,504. During training, intermediate values can easily exceed this limit and overflow to infinity, and later operations on those infinities produce NaN. The fix was a simple one-line change to bfloat16, whose much wider numerical range prevents these overflows and keeps training stable. For fine-tuning large models, always prefer bfloat16 for stability.&lt;/p&gt;


&lt;/div&gt;
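&lt;p&gt;The range difference is easy to see without a GPU. This sketch uses only Python's standard &lt;code&gt;struct&lt;/code&gt; module. The helper names are hypothetical, and the bfloat16 conversion is an emulation that truncates a float32 to its top 16 bits, whereas real hardware typically rounds instead.&lt;/p&gt;

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite float16 value


def fits_in_float16(value):
    """Return True if value can be stored as a finite float16."""
    try:
        struct.pack("!e", value)  # "e" is IEEE 754 half precision
        return True
    except (OverflowError, struct.error):
        return False


def to_bfloat16(value):
    """Emulate bfloat16 by keeping the top 16 bits of a float32."""
    top_two_bytes = struct.pack("!f", value)[:2]
    return struct.unpack("!f", top_two_bytes + b"\x00\x00")[0]


print(fits_in_float16(FLOAT16_MAX))  # right at the float16 limit
print(fits_in_float16(70000.0))      # just past the float16 limit
print(to_bfloat16(70000.0))          # coarse, but still a finite number
```

&lt;p&gt;The bfloat16 result is less precise than the input, which is the trade-off: bfloat16 gives up mantissa bits to keep the exponent range of float32.&lt;/p&gt;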


&lt;p&gt;&lt;strong&gt;&lt;a href="https://arxiv.org/abs/2305.14314" rel="noopener noreferrer"&gt;Combining LoRA and Quantization:&lt;/a&gt;&lt;/strong&gt; These techniques work best together. Quantized LoRA (QLoRA) stores the massive base model in a highly efficient 4-bit format (specifically NF4, or 4-bit NormalFloat) while adding small, trainable LoRA adapters in bfloat16. During training, the 4-bit weights are dequantized to bfloat16 on the fly for computation. This lets you fine-tune very large models on a single GPU with the memory savings of 4-bit storage and the numerical stability of 16-bit training.&lt;/p&gt;
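&lt;p&gt;To make the "store in 4 bits, compute in higher precision" idea concrete, here is a toy round trip in plain Python. It uses simple absmax integer quantization as a stand-in and is not the NF4 data type that QLoRA uses. The function names and sample weights are hypothetical.&lt;/p&gt;

```python
# Toy 4-bit quantization: each block of weights is stored as small integer
# codes plus one shared scale, and converted back to floats only when needed.
def quantize_block(weights, max_code=7):
    """Map floats to signed 4-bit codes in [-7, 7] plus one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / max_code if largest else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate float weights right before they are used."""
    return [code * scale for code in codes]


block = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
print(codes)
print([round(w, 3) for w in restored])
```

&lt;p&gt;The restored values are close to the originals but not identical, which is why the trainable adapters are kept in bfloat16 rather than being quantized.&lt;/p&gt;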

&lt;h3&gt;
  
  
  FlashAttention: An algorithmic speed boost
&lt;/h3&gt;

&lt;p&gt;Finally, &lt;a href="https://arxiv.org/abs/2205.14135" rel="noopener noreferrer"&gt;FlashAttention&lt;/a&gt; is a foundational algorithmic optimization that significantly reduces HBM usage and speeds up training on both single and multi-GPU setups. The attention mechanism in transformers is a primary memory bottleneck because it requires storing a large, intermediate &lt;a href="https://en.wikipedia.org/wiki/Attention_%28machine_learning%29" rel="noopener noreferrer"&gt;attention matrix&lt;/a&gt; whose size grows with the square of the sequence length. FlashAttention reorders the computation to avoid storing this full matrix in HBM, leading to substantial memory savings and faster execution.&lt;/p&gt;

&lt;p&gt;Best of all, enabling FlashAttention is often as simple as a one-line change. In the MedGemma fine-tuning script, this was done by setting the value &lt;code&gt;attn_implementation="sdpa"&lt;/code&gt;, which can automatically use more efficient backends like FlashAttention if the hardware supports it.&lt;/p&gt;
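&lt;p&gt;To get a feel for the size of the attention matrix that FlashAttention avoids storing, the sketch below estimates it for a single layer. The head count of 8 and the sequence lengths are hypothetical values chosen for illustration, not MedGemma's actual configuration.&lt;/p&gt;

```python
# Size of the full attention score matrix for one layer when it is
# materialized in memory: one seq_len x seq_len matrix per head per sample.
def attention_matrix_bytes(batch_size, num_heads, seq_len, bytes_per_value=2):
    """Bytes needed to hold one layer's full score matrix in bfloat16."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


for seq_len in (2048, 4096, 8192):
    size_gb = attention_matrix_bytes(1, 8, seq_len) / 1e9
    print(f"seq_len={seq_len}: {size_gb:.2f} GB per layer")
```

&lt;p&gt;Doubling the sequence length quadruples the result, which is why long inputs run out of memory so quickly without an optimized attention kernel.&lt;/p&gt;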

&lt;h2&gt;
  
  
  Scaling beyond a single GPU: Advanced strategies
&lt;/h2&gt;

&lt;p&gt;Techniques like LoRA and quantization are useful for lowering HBM needs on a single GPU. But to train truly massive models or to really speed up the process, you'll eventually need to scale out to multiple GPUs. Here are some of the key strategies that can be used to distribute the load and overcome memory limitations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data parallelism
&lt;/h3&gt;

&lt;p&gt;Data parallelism is the most common and intuitive approach to scaling. In a Distributed Data Parallel (DDP) setup, the entire model is replicated on each GPU. The key is that the global batch of training data is split, with each GPU processing its own mini-batch concurrently. After each forward and backward pass, the gradients from each GPU are averaged together to ensure that all of the model replicas learn from the entire dataset and they stay in sync. This method is excellent for &lt;strong&gt;speeding up training&lt;/strong&gt; but it &lt;strong&gt;doesn't reduce the HBM&lt;/strong&gt; that's required to hold the model itself, because every GPU needs a full copy.&lt;/p&gt;
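&lt;p&gt;The gradient averaging step can be illustrated with a toy example in plain Python. This is a sketch of the idea only, not of how PyTorch DDP is implemented: the "GPUs" are just list slices, and the one-weight model, data, and learning rate are all hypothetical.&lt;/p&gt;

```python
# Toy data parallelism: every replica holds the same weight, computes a
# gradient on its own shard, and the gradients are averaged before the update.
def gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


global_batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
num_gpus = 2
shard_size = len(global_batch) // num_gpus
shards = [global_batch[i * shard_size:(i + 1) * shard_size]
          for i in range(num_gpus)]

w = 0.0  # every replica starts from the same weight
local_grads = [gradient(w, shard) for shard in shards]
averaged = sum(local_grads) / num_gpus  # stands in for the all-reduce step
w = w - 0.01 * averaged                 # identical update on every replica

print(local_grads, averaged, gradient(0.0, global_batch))
```

&lt;p&gt;With equally sized shards, the averaged gradient matches the gradient computed on the whole batch, which is what keeps the replicas in sync.&lt;/p&gt;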

&lt;h3&gt;
  
  
  Model parallelism
&lt;/h3&gt;

&lt;p&gt;When a model is too large to fit into the memory of a single GPU, you must use &lt;a href="https://en.wikipedia.org/wiki/Data_parallelism#Data_parallelism_vs._model_parallelism" rel="noopener noreferrer"&gt;model parallelism&lt;/a&gt;. Instead of replicating the model, this strategy &lt;strong&gt;splits the model&lt;/strong&gt; across multiple GPUs. There are two primary ways to do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://huggingface.co/docs/text-generation-inference/en/conceptual/tensor_parallelism" rel="noopener noreferrer"&gt;Tensor parallelism&lt;/a&gt;:&lt;/strong&gt; This method splits a single large operation (like a massive weight matrix in a transformer layer) across several GPUs. Each GPU computes its part of the operation, and the results are combined.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.pytorch.org/docs/stable/distributed.pipelining.html" rel="noopener noreferrer"&gt;Pipeline parallelism&lt;/a&gt;:&lt;/strong&gt; This technique places different layers of the model onto different GPUs in a sequence. The data flows through the first set of layers on GPU 1, then the output is passed to GPU 2 for the next set of layers, and so on, like an assembly line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These strategies are more complex to implement than data parallelism, but they're essential for models that are simply too big for one device.&lt;/p&gt;
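&lt;p&gt;As a rough illustration of tensor parallelism, the sketch below splits the columns of one small weight matrix across two simulated devices and then concatenates the partial outputs. Everything here is hypothetical and runs on the CPU. Real implementations also have to handle communication between devices, which this example skips.&lt;/p&gt;

```python
# Toy tensor parallelism: one matrix multiply is split column-wise so that
# each device only ever holds (and multiplies by) half of the weight matrix.
def matvec(x, columns):
    """Multiply row vector x by a matrix given as a list of columns."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in columns]


x = [1.0, 2.0]
weight_columns = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0]]  # 2 x 4

gpu0_columns = weight_columns[:2]  # first half lives on "GPU 0"
gpu1_columns = weight_columns[2:]  # second half lives on "GPU 1"

partial0 = matvec(x, gpu0_columns)
partial1 = matvec(x, gpu1_columns)
combined = partial0 + partial1     # concatenate the partial results

print(combined)
print(matvec(x, weight_columns))   # same result from the unsplit matrix
```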

&lt;h3&gt;
  
  
  Fully Sharded Data Parallelism (FSDP)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html" rel="noopener noreferrer"&gt;FSDP&lt;/a&gt; is a powerful and efficient hybrid strategy that combines the ideas of &lt;strong&gt;data parallelism&lt;/strong&gt; and &lt;strong&gt;model parallelism&lt;/strong&gt;. Unlike standard data parallelism where each GPU holds a full copy of the model, optimizer states, and gradients, FSDP shards (or splits) all of these components across the GPUs. Each GPU only materializes the full parameters for the &lt;strong&gt;specific layer&lt;/strong&gt; that it's computing at that moment, &lt;strong&gt;dramatically reducing the peak HBM&lt;/strong&gt; usage per device. FSDP makes it possible to train enormous models on a cluster of smaller GPUs.&lt;/p&gt;
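&lt;p&gt;A back-of-the-envelope comparison shows why sharding matters. The sketch below reuses the 32 GB static footprint from the full fine-tuning example and assumes perfectly even sharding. It ignores activations and the temporary memory FSDP needs while a layer's full parameters are gathered, so real numbers will be higher. The function name is hypothetical.&lt;/p&gt;

```python
# Static HBM per GPU: plain data parallelism replicates everything,
# while FSDP splits weights, gradients, and optimizer states across GPUs.
def per_gpu_static_gb(static_total_gb, num_gpus, sharded):
    """Full copy per GPU under DDP, an even shard per GPU under FSDP."""
    return static_total_gb / num_gpus if sharded else static_total_gb


for num_gpus in (1, 2, 4, 8):
    ddp = per_gpu_static_gb(32, num_gpus, sharded=False)
    fsdp = per_gpu_static_gb(32, num_gpus, sharded=True)
    print(f"{num_gpus} GPU(s): DDP {ddp:.0f} GB each, FSDP {fsdp:.0f} GB each")
```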

&lt;p&gt;By combining these hardware and software strategies, you can &lt;strong&gt;scale your fine-tuning jobs&lt;/strong&gt; from a single GPU to a &lt;strong&gt;powerful, distributed cluster&lt;/strong&gt; capable of handling even the most demanding AI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  HBM sizing guide
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HBM&lt;/th&gt;
&lt;th&gt;Use case and explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Sufficient for basic inference or fine-tuning with techniques like LoRA using a very small batch size (e.g., 1-2). Expect slower training times at this level.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;The recommended starting point for a good experience with 4B to 7B parameter models. This capacity allows for a more effective batch size (e.g., 8-16) when using LoRA, providing a great balance of training speed and cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40+ GB&lt;/td&gt;
&lt;td&gt;Necessary for maximizing training speed with large batch sizes or for working with larger models (20B parameters and up) now or in the future.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Encountering the CUDA out of memory error provides an important lesson in the trade-offs between model size, training techniques, and batch size. By understanding what consumes your HBM, you can make smarter decisions and keep your projects running smoothly.&lt;/p&gt;

&lt;p&gt;I hope that this guide has demystified the CUDA out of memory error. When you're ready to take the next step, Google Cloud has the tools to accelerate your AI development.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore &lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu?utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GPU configurations for your Cloud Run services&lt;/a&gt; and best practices for running &lt;a href="https://cloud.google.com/run/docs/configuring/jobs/gpu-best-practices?hl=en&amp;amp;utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run jobs with GPU&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For maximum control: Spin up a &lt;a href="https://cloud.google.com/products/compute" rel="noopener noreferrer"&gt;Compute Engine&lt;/a&gt; instance with the latest NVIDIA H100 or A100 Tensor Core GPUs and take full control of your environment.&lt;/li&gt;
&lt;li&gt;Looking to optimize your model hosting infrastructure? Take a look at &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/vllm-performance-tuning-the-ultimate-guide-to-xpu-inference-configuration?utm_campaign=CDR_0x91b1edb5_default_b451009911&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;The Ultimate Guide to xPU Inference Configuration&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For a deeper dive into scaling your model, check out &lt;a href="https://jax-ml.github.io/scaling-book" rel="noopener noreferrer"&gt;How to Scale Your Model&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;New to Google Cloud? Get started with the $300 free credit to find the perfect solution for your next project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to Jason Monden and Sayce Falk from the AI compute team for their helpful review and feedback on this post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpu</category>
      <category>performance</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
