<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Fabricio Quagliariello</title>
    <description>The latest articles on Forem by Fabricio Quagliariello (@fmquaglia).</description>
    <link>https://forem.com/fmquaglia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F111468%2F9461ba86-307e-4fb0-acca-a512f3f4d6f7.png</url>
      <title>Forem: Fabricio Quagliariello</title>
      <link>https://forem.com/fmquaglia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/fmquaglia"/>
    <language>en</language>
    <item>
      <title>The OptiPFair Series #1: Forging the Future with Small Models — An Architectural Analysis with Pere Martra</title>
      <dc:creator>Fabricio Quagliariello</dc:creator>
      <pubDate>Tue, 16 Dec 2025 10:49:16 +0000</pubDate>
      <link>https://forem.com/fmquaglia/the-optipfair-series-1-forging-the-future-with-small-models-an-architectural-analysis-with-pere-4lge</link>
      <guid>https://forem.com/fmquaglia/the-optipfair-series-1-forging-the-future-with-small-models-an-architectural-analysis-with-pere-4lge</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Originally published on &lt;a&gt;Principia Agentica&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This is the first episode of The OptiPFair Series, a deep-dive exploration of Small Language Model optimization.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI race has prioritized parameter count, but for real-world systems, the equation has changed.&lt;br&gt;
We've entered the efficiency era. In this first OptiPFair Series episode, I speak with Pere Martra—engineer, educator, and OptiPFair creator—to dissect his tool and its philosophy.&lt;br&gt;
From depth vs. width pruning to surgical bias removal, this architect-to-architect conversation explores building the next generation of Small Language Models. The future belongs to specialists: small, fast, and fair.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: When "Bigger" Stopped Being "Better"
&lt;/h2&gt;

&lt;p&gt;We live in the age of giants—and perhaps we're witnessing their fall?&lt;/p&gt;

&lt;p&gt;Over the past few years, the AI race has been defined by a brutal metric: the number of parameters. Bigger seemed, invariably, better. But for those of us building systems in the real world—those who have to deal with cloud budgets, real-time latency, and edge devices—the equation has changed.&lt;/p&gt;

&lt;p&gt;We've entered the age of &lt;strong&gt;efficiency&lt;/strong&gt;. The rise of &lt;em&gt;Small Language Models&lt;/em&gt; (SLMs) isn't a passing fad; it's a necessary market correction. But how do we take these models and make them even faster, lighter, and fairer without destroying their intelligence in the process?&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Pere Martra&lt;/strong&gt; and his new creation come in: &lt;strong&gt;OptiPFair&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Pere isn't an ivory tower academic. He's a seasoned engineer, a prolific educator (his LLM course repository is a must-read reference that I highly recommend), and above all, a pragmatic builder. I had the privilege of sitting down with him to dissect not just his tool, but the philosophy behind it.&lt;/p&gt;

&lt;p&gt;What follows isn't a simple interview; it's a deep dive into the mind of an architect who is defining how we'll build the next generation of efficient AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act I: The Pragmatic Spark and the Secret of Productivity
&lt;/h2&gt;

&lt;p&gt;The first thing I wanted to know was the origin. We often imagine that open-source libraries are born from grand theoretical epiphanies. Pere's story, however, is refreshingly human and pragmatic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Fabricio Q:&lt;/strong&gt; Pere, OptiPFair is a sophisticated tool. What was the specific pain point or "spark" that led you to say "I need to build this"?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pere Martra:&lt;/strong&gt; Well, it came from a technical test. I was asked to create an optimized version of a model, and I decided to do &lt;em&gt;pruning&lt;/em&gt;. From that test I started researching, and over these months SLMs have gained importance and various papers have appeared that I've based my work on. The most important one was from NVIDIA, explaining how they created their model families using &lt;em&gt;structured pruning&lt;/em&gt; plus &lt;em&gt;knowledge distillation&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Architect's Analysis:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This answer reveals two fundamental truths about good engineering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Innovation is born from necessity:&lt;/strong&gt; OptiPFair wasn't born looking for a problem; it was born solving one. That's the best guarantee of usefulness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curiosity as a driver:&lt;/strong&gt; Pere didn't just pass the technical test. He used that challenge as a springboard to investigate the state of the art (Nvidia papers) and democratize that complex technology into an accessible tool.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But there's something deeper in Pere's way of working. When I asked him how he manages to maintain such high output—books, courses, libraries, private work—he revealed his personal "algorithm" for productivity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pere Martra:&lt;/strong&gt; "I try to leverage everything I do; everything I do has at least two uses. OptiPFair came from a commission... from that problem came a notebook for my course, and from that notebook came the library.When I do development, depending on how rushed I am: I can start with a notebook that goes to the course and from the notebook it moves to the library, or I go straight to the library to solve what needs to be known in the project and then, when I have time, that moves toward educational notebooks."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The Takeaway:&lt;/strong&gt; For Pere, code is never an end in itself. It's a vehicle. &lt;strong&gt;OptiPFair&lt;/strong&gt; is the crystallization of his knowledge, packaged so others can use it (&lt;em&gt;the library&lt;/em&gt;) and understand it (&lt;em&gt;the book and the course&lt;/em&gt;). It's the perfect cycle of learning and teaching.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act II: The Architectural "Sweet Spot" and the Ethics of Code
&lt;/h2&gt;

&lt;p&gt;Once the origin was understood, it was time to talk architecture. The optimization ecosystem is full of noise. There are a thousand ways to make a model smaller (quantization, distillation, unstructured pruning). I asked Pere where exactly OptiPFair fits. His answer was a lesson in &lt;strong&gt;knowing your terrain&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pere Martra:&lt;/strong&gt; "OptiPFair doesn't compete in the 70B parameter range. Its 'sweet spot' is sub-13B models, and specifically, deployment efficiency through &lt;strong&gt;Depth Pruning&lt;/strong&gt;.Many width pruning methods theoretically reduce parameters, but often fail to improve actual inference speed in small batch scenarios (like local devices), because they break the memory alignment that GPUs love. By removing complete transformer blocks (&lt;em&gt;depth pruning&lt;/em&gt;), we achieve hardware-agnostic acceleration."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;From the Principia Agentica Laboratory: The Acid Test&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Inspired by this distinction, I didn't stop at theory: I took OptiPFair to my own lab to test the premise with a 90-minute "Hello, Speedup" recipe.&lt;/p&gt;

&lt;p&gt;Using a &lt;code&gt;Llama-3.2-1B&lt;/code&gt; model as baseline, I ran two strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Width Pruning (MLP_GLU):&lt;/strong&gt; Removing individual neurons from the MLP expansion layers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth Pruning:&lt;/strong&gt; Removing the last three transformer blocks outright.&lt;/li&gt;
&lt;/ol&gt;
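&lt;p&gt;The depth-pruning strategy above can be sketched with plain Hugging Face &lt;code&gt;transformers&lt;/code&gt;. To be clear, this is my own illustration, not OptiPFair's API, and it uses a tiny, randomly initialized Llama-style config so nothing needs to be downloaded:&lt;/p&gt;

```python
# Illustrative sketch of depth pruning (dropping whole transformer blocks).
# NOT OptiPFair's API: shown on a tiny random Llama-style model for clarity.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=64,
    intermediate_size=256,
    num_hidden_layers=8,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=1000,
)
model = LlamaForCausalLM(config)

# Depth pruning: remove the last 3 transformer blocks wholesale,
# then keep the config in sync with the new architecture.
n_remove = 3
model.model.layers = model.model.layers[:-n_remove]
model.config.num_hidden_layers = len(model.model.layers)

print(model.config.num_hidden_layers)  # 5 blocks remain
```

&lt;p&gt;Because entire blocks disappear, every remaining matrix keeps its original shape, which is exactly why this approach preserves the memory alignment that GPUs love.&lt;/p&gt;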

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviwdvl9au90ylg0pi2sx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviwdvl9au90ylg0pi2sx.png" alt="Depth vs Width Pruning Speed" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Laboratory Verdict:&lt;/strong&gt; The results validated Pere's thesis. While width pruning maintained the global structure more faithfully, &lt;strong&gt;depth pruning delivered a significantly larger performance gain&lt;/strong&gt;: a 15.6% improvement in Tokens Per Second (TPS) compared to width pruning's 4.3%, with controllable quality degradation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reproduce these results yourself&lt;/strong&gt;: all benchmarks are documented in an interactive Jupyter notebook.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://colab.research.google.com/github/fmquaglia/principia-agentica/blob/master/articles/091925-memory-in-agents/optipfair_series_1.ipynb" rel="noopener noreferrer"&gt;Open in Colab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/fmquaglia/principia-agentica/blob/master/articles/091925-memory-in-agents/optipfair_series_1.ipynb" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Visualizing the Invisible: Bias&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;But speed isn't everything. And this is where OptiPFair plays its hidden card. Pere showed me a demo that stopped me cold. It wasn't about TPS; it was about &lt;strong&gt;ethics&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pere Martra:&lt;/strong&gt; "It's not enough to make the model fast. We need to know if pruning it amplifies biases. OptiPFair includes a bias visualization module that analyzes how layers activate in response to protected attributes."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He shared an example with a recent &lt;code&gt;Llama-3.2&lt;/code&gt; model. Given a prompt about a Black man in an ambiguous situation, the original model hallucinated a violent response (a shooting). After surgical intervention using OptiPFair's analysis tools—removing just 0.1% of specific neurons—the model changed its response: the police officer no longer shot, but called for help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architect's Analysis:&lt;/strong&gt; This is a game-changer. Normally, we treat "ethics" and "optimization" as separate silos. Pere has integrated them into the same toolbox. He reminds us that an "efficient" model that amplifies prejudices isn't production-ready; it's a liability risk.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act III: "We're Going to Run Out of Planet" and the Master's Advice
&lt;/h2&gt;

&lt;p&gt;Toward the end of our conversation, the discussion turned to the future. I asked Pere where he thinks all this is going. His answer was a sobering reminder of why efficiency isn't just a cost issue, but a sustainability one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pere Martra:&lt;/strong&gt; "If for every specific need we use a 700 billion parameter model... we're going to run out of planet in five years. We need generalist models, yes, but the future belongs to specialists: small models, fast and consuming less."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This vision drives OptiPFair's &lt;em&gt;roadmap&lt;/em&gt;. It doesn't stop here. Pere is already working on &lt;strong&gt;Knowledge Distillation&lt;/strong&gt; and &lt;strong&gt;attention layer pruning&lt;/strong&gt;, seeking that holy grail where a small model doesn't just mimic a large one, but competes with it in its niche.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Deep Dive: Notes for the Advanced Architect&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before closing, I took the opportunity to ask Pere some "architect to architect" questions about the technical limits of these techniques. Here are the key &lt;em&gt;insights&lt;/em&gt; for those who want to take this to production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Is there a "safe" pruning range?&lt;/strong&gt; It depends drastically on the family. "Llama handles MLP layer pruning very well (up to 400% of original expansion), while families like Gemma are more fragile. The safe limit usually hovers around 140% remaining expansion, but it will almost always require a recovery process (retraining or distillation)".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Last" layers heuristic:&lt;/strong&gt; Although depth pruning often targets the last layers, Pere clarified that this is an oversimplification. The recommended practice is to protect the first 4 blocks (fundamental for input processing) and the last 2 (essential for output consolidation). The "fat" is usually in the middle.&lt;/li&gt;
&lt;/ul&gt;
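&lt;p&gt;Pere's "protect the edges, prune the middle" heuristic is simple enough to sketch in a few lines. The helper below is my own hypothetical illustration of the rule, not something shipped in OptiPFair:&lt;/p&gt;

```python
# Hypothetical helper illustrating the heuristic quoted above: protect the
# first 4 transformer blocks and the last 2, and treat the middle as
# candidates for depth pruning. Not part of OptiPFair's actual API.
def prunable_blocks(num_layers, protect_head=4, protect_tail=2):
    """Return indices of blocks that are fair game for depth pruning."""
    if num_layers <= protect_head + protect_tail:
        return []  # model too shallow: nothing is safely removable
    return list(range(protect_head, num_layers - protect_tail))

# A 16-layer model (the Llama-3.2-1B ballpark): the "fat" is in the middle.
print(prunable_blocks(16))  # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

&lt;p&gt;Note how the naive "cut the last N layers" shortcut would violate the &lt;code&gt;protect_tail&lt;/code&gt; guard, which is exactly the oversimplification Pere warns about.&lt;/p&gt;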

&lt;h3&gt;
  
  
  &lt;strong&gt;The Final Advice: Top to Bottom&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To finish, I asked for advice for engineers who are starting out in this dizzying field. His answer validates the path many of us are taking.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pere Martra:&lt;/strong&gt; "Don't get bored. Study from top to bottom. Start using an API, doing something easy that you like. Once you have it, go down. Go to the foundations. Understand how a Transformer works, what a GLU structure is. Those 'aha!' moments when you connect practice with theory are what make you an expert."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Conclusion: The Lighthouse Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;OptiPFair&lt;/strong&gt; isn't just another library in the Python ocean. It's a statement of principles.&lt;/p&gt;

&lt;p&gt;For the modern AI architect, it represents the perfect tool for the "Edge AI" and efficiency era. If your goal is to deploy language models in constrained environments, controlling both latency and ethical bias, this is an essential piece in your toolbelt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I take away from Pere&lt;/strong&gt;: The most sophisticated technology is born from the simplest pragmatism. You don't need to start with a grand theory; you need to start solving a real problem. And if in the process you can teach others and build tools that make work fairer and more efficient, then you're building a legacy.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;principia-agentica&lt;/code&gt; laboratory approves and recommends &lt;strong&gt;OptiPFair&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I want to use OptiPFair. Where do I start?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Official OptiPFair repository:&lt;/strong&gt; &lt;a href="http://github.com/peremartra/optipfair" rel="noopener noreferrer"&gt;github.com/peremartra/optipfair&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pere's complete LLM course (free):&lt;/strong&gt; An educational treasure that covers everything from fundamentals to advanced techniques. Highly recommended. &lt;a href="https://github.com/peremartra/Large-Language-Model-Notebooks-Course" rel="noopener noreferrer"&gt;https://github.com/peremartra/Large-Language-Model-Notebooks-Course&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Large Language Models Projects" book (Apress, 2024):&lt;/strong&gt; Pere's definitive guide on LLMs, now available. &lt;a href="https://link.springer.com/book/10.1007/979-8-8688-0515-8" rel="noopener noreferrer"&gt;https://link.springer.com/book/10.1007/979-8-8688-0515-8&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upcoming book with Manning:&lt;/strong&gt; Pere is working on a book about model architecture and optimization that will delve deeper into OptiPFair and related techniques. Stay tuned.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Connect with Pere Martra:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; Follow his updates on OptiPFair, SLMs, and the future of efficient AI

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/pere-martra/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/pere-martra/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Hugging Face:&lt;/strong&gt; Explore his optimized models and experiments with SLMs

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/oopere" rel="noopener noreferrer"&gt;https://huggingface.co/oopere&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Medium:&lt;/strong&gt; Read his articles on model optimization and advanced ML techniques

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@peremartra" rel="noopener noreferrer"&gt;https://medium.com/@peremartra&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Community:&lt;/strong&gt; Pere is an active mentor at &lt;a href="https://community.deeplearning.ai/u/pere_martra/summary" rel="noopener noreferrer"&gt;DeepLearning.AI&lt;/a&gt; and regularly contributes to &lt;a href="https://towardsai.net/?s=pere%20martra" rel="noopener noreferrer"&gt;TowardsAI&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you found this article useful:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://peremartra.github.io/optipfair/" rel="noopener noreferrer"&gt;Try OptiPFair&lt;/a&gt; in your next optimization project&lt;/li&gt;
&lt;li&gt;Share this analysis with your ML team&lt;/li&gt;
&lt;li&gt;Consider supporting Pere's open source work by giving it a star on GitHub&lt;/li&gt;
&lt;li&gt;Follow Principia Agentica's work for more in-depth architectural analyses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Efficiency isn't just a technical metric. It's a commitment to a sustainable future for AI. Pere Martra is leading that path, one line of code at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editor's Note (December 2025)&lt;/strong&gt;: While this article was being prepared for publication, Pere released significant improvements to OptiPFair that address precisely the memory alignment limitation mentioned.&lt;br&gt;&lt;br&gt;
Now &lt;code&gt;width pruning&lt;/code&gt; supports an &lt;code&gt;expansion_divisor&lt;/code&gt; parameter (32, 64, 128, or 256) to keep pruned tensor sizes aligned with GPU tensor cores, and accepts a &lt;code&gt;dataloader&lt;/code&gt; for data-driven neuron selection. It's a sign of how quickly OptiPFair is evolving.&lt;br&gt;
A complete update will come in the OptiPFair Series from Principia Agentica.&lt;/p&gt;
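&lt;p&gt;The alignment idea behind an &lt;code&gt;expansion_divisor&lt;/code&gt; can be sketched as rounding the pruned MLP width to a hardware-friendly multiple. This is my own illustration of the concept, not OptiPFair's internals:&lt;/p&gt;

```python
# Sketch of the idea behind an expansion_divisor: after width pruning, keep
# the MLP intermediate size a multiple of a hardware-friendly divisor so GPU
# kernels stay aligned. My illustration, not OptiPFair's implementation.
def aligned_intermediate_size(target_size, divisor=64):
    """Round a pruned intermediate size up to the nearest multiple of divisor."""
    return ((target_size + divisor - 1) // divisor) * divisor

# Pruning an 8192-wide MLP by ~30% gives 5734 neurons;
# alignment nudges that up to 5760 (a multiple of 64).
print(aligned_intermediate_size(5734))  # 5760
```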




&lt;p&gt;&lt;strong&gt;More from Principia Agentica:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Follow the series and explore hands-on labs, architectural analyses, and AI agent deep-dives at &lt;a href="https://principia-agentica.io" rel="noopener noreferrer"&gt;principia-agentica.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Beyond the Notebook: 4 Architectural Patterns for Production-Ready AI Agents</title>
      <dc:creator>Fabricio Quagliariello</dc:creator>
      <pubDate>Wed, 10 Dec 2025 21:57:39 +0000</pubDate>
      <link>https://forem.com/fmquaglia/beyond-the-notebook-4-architectural-patterns-for-production-ready-ai-agents-3a16</link>
      <guid>https://forem.com/fmquaglia/beyond-the-notebook-4-architectural-patterns-for-production-ready-ai-agents-3a16</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-kaggle-ai-agents-2025-11-10"&gt;Google AI Agents Writing Challenge&lt;/a&gt;: Learning Reflections&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The gap between a "Hello World" agent running in a Jupyter Notebook and a reliable, production-grade system is not a step—it's a chasm (and it is not an easy one to cross).&lt;/p&gt;

&lt;p&gt;I recently had the privilege of participating in the &lt;strong&gt;5-Day AI Agents Intensive Course with Google and Kaggle&lt;/strong&gt;. After completing the coursework and finalizing the capstone project, I realized that beyond the course's many assets (insightful white papers, carefully designed notebooks, and exceptional expert panels in the live sessions), the real treasure wasn't just learning the ADK syntax—it was the &lt;strong&gt;architectural patterns&lt;/strong&gt; subtly embedded within the lessons.&lt;/p&gt;

&lt;p&gt;As an architect building production systems for over 20 years, including multi-agent workflows and enterprise integrations, I've seen firsthand where theoretical agents break under real-world constraints.&lt;/p&gt;

&lt;p&gt;We are moving from an era of "prompt engineering" to "agent architecture" where "context engineering" is key. As with any other emerging architectural paradigm, this shift demands blueprints that ensure reliability, efficiency, and ethical safety. Without them, we risk agents that silently degrade, violate user privacy, or execute irreversible actions without oversight.&lt;/p&gt;

&lt;p&gt;Drawing from the course and my own experience as an AI Architect, I have distilled the curriculum into four essential patterns that transform fragile prototypes into robust production systems:&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4 Core Patterns
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Outside-In Evaluation Hierarchy:&lt;/strong&gt; Shifting focus from the final answer to the decision-making trajectory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual-Layer Memory Architecture:&lt;/strong&gt; Balancing ephemeral session context with persistent, consolidated knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol-First Interoperability:&lt;/strong&gt; Decoupling agents from tools using standardized protocols like MCP and A2A.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Running Operations &amp;amp; Resumability:&lt;/strong&gt; Managing state for asynchronous tasks and human-in-the-loop workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Throughout this analysis, I'll apply a &lt;strong&gt;6-point framework&lt;/strong&gt; grounded in the principles of &lt;strong&gt;Principia Agentica&lt;/strong&gt;—ensuring these patterns respect human sovereignty, fiduciary responsibility, and meaningful human control.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Analysis Framework
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Production Problem:&lt;/strong&gt; Why naive approaches fail at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Architectural Solution:&lt;/strong&gt; The specific design pattern taught in the course.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Implementation Details:&lt;/strong&gt; Concrete code-level insights from the ADK notebooks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Considerations:&lt;/strong&gt; Real-world deployment implications (latency, cost, scale).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection to Ethical Design:&lt;/strong&gt; How the pattern supports human sovereignty, fiduciary responsibility, or ethical agent architecture. I will include a "failure scenario" where I'll try to illustrate what could happen without the ethical safeguard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Takeaways:&lt;/strong&gt; A distilled summary of each pattern's production principle, implementation guidance, and ethical anchor—designed to serve as a quick reference for architects moving from prototype to production.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's do this!&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 1: Outside-In Evaluation Hierarchy (Trajectory as Truth)
&lt;/h2&gt;

&lt;p&gt;In traditional software, if the output is correct, the test passes. In agentic AI, a correct answer derived from a hallucination or a dangerous logic path is a ticking time bomb.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Production Problem
&lt;/h3&gt;

&lt;p&gt;Naive evaluation strategies often fail in production due to the non-deterministic nature of LLMs. We face two specific traps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The "Lucky Guess" Trap:&lt;/strong&gt; Imagine an agent asked to "Get the weather in Tokyo." A bad agent might hallucinate "It is sunny in Tokyo" without calling the weather tool. If it happens to be sunny, a traditional &lt;code&gt;assert result == expected&lt;/code&gt; test passes. This hides a critical failure in logic that will break as soon as the weather changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Silent Failure" of Efficiency:&lt;/strong&gt; An agent might solve a user request but take 25 steps to do what should have taken 3. This bloats token costs and latency—a failure mode that boolean output checks completely miss.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. The Architectural Solution
&lt;/h3&gt;

&lt;p&gt;Day 4 of the course introduced the concept of &lt;strong&gt;Glass Box Evaluation&lt;/strong&gt;. We move away from simple output verification to a hierarchical approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Level 1: Black Box (End-to-End):&lt;/strong&gt; Did the user get the right result?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Level 2: Glass Box (Trajectory):&lt;/strong&gt; Did the agent use the correct tools in the correct order?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Level 3: Component (Unit):&lt;/strong&gt; Did the individual tools perform as expected?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This shift treats the &lt;strong&gt;trajectory&lt;/strong&gt; (Thought → Action → Observation) as the unit of truth. By evaluating the trajectory, we ensure the agent isn't just "getting lucky," but is actually &lt;em&gt;reasoning&lt;/em&gt; correctly.&lt;/p&gt;
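&lt;p&gt;The "trajectory as truth" check can be sketched as a comparison of expected versus actual tool calls. The toy function below is my own illustration of the idea, not ADK's built-in evaluator:&lt;/p&gt;

```python
# Minimal sketch of "trajectory as truth": compare the tool calls an agent
# actually made against the expected trajectory, instead of only checking
# the final answer. A toy illustration, not ADK's evaluation machinery.
def trajectory_matches(expected_calls, actual_calls):
    """Exact in-order match of (tool_name, tool_input) pairs."""
    if len(expected_calls) != len(actual_calls):
        return False
    return all(
        exp["tool_name"] == act["tool_name"]
        and exp["tool_input"] == act["tool_input"]
        for exp, act in zip(expected_calls, actual_calls)
    )

expected = [{"tool_name": "get_stock_price", "tool_input": {"symbol": "GOOG"}}]

# A "lucky guess" agent that never called the tool fails the trajectory
# check, even if its final answer happened to be right.
print(trajectory_matches(expected, []))        # False
print(trajectory_matches(expected, expected))  # True
```

&lt;p&gt;This is exactly what closes the "Lucky Guess" trap: a correct answer with an empty or wrong trajectory is still a failing test.&lt;/p&gt;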

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkltnk27rcr2v6gsp5xd9.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkltnk27rcr2v6gsp5xd9.webp" alt="pattern1_1" width="800" height="784"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Implementation Details: Field Notes from the ADK
&lt;/h3&gt;

&lt;p&gt;The ADK provides specific primitives to capture and score these trajectories without writing custom parsers for every test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From &lt;code&gt;adk web&lt;/code&gt; to &lt;code&gt;evalset.json&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Instead of manually writing test cases, the ADK encourages a "Capture and Replay" workflow. During development (using &lt;code&gt;adk web&lt;/code&gt;), when you spot a successful interaction, you can persist that session state. This generates an &lt;code&gt;evalset.json&lt;/code&gt; that captures not just the input/output, but the &lt;em&gt;expected tool calls&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Conceptual&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;structure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;an&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ADK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;evalset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;entry&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Traditional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;test:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;just&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;input/output&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ADK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;evalset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;contains&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;evalcases&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;invocations:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(queries)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expected_tool_use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;reference&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(output)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ask_GOOGLE_price"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;given&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;evaluation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;set&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;evaluation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;cases&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;included&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;here&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What is the stock price of GOOG?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;user&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;input&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"reference"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The price is $175..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;semantic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;output&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"expected_tool_use"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;trajectory&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
            &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_stock_price"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
            &lt;/span&gt;&lt;span class="nl"&gt;"tool_input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;arguments&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;passed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tool&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GOOG"&lt;/span&gt;&lt;span class="w"&gt; 
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; 
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;evaluation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;cases&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"initial_session"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"app_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello_world"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;specific&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;user&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This JSON represents an &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/eval_set.py#L22" rel="noopener noreferrer"&gt;&lt;code&gt;EvalSet&lt;/code&gt;&lt;/a&gt; containing one &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/eval_case.py#L132" rel="noopener noreferrer"&gt;&lt;code&gt;EvalCase&lt;/code&gt;&lt;/a&gt;. Each &lt;code&gt;EvalCase&lt;/code&gt; has a &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt; (a list of invocations), and an optional &lt;code&gt;initial_session&lt;/code&gt;. 
Each invocation within the &lt;code&gt;data&lt;/code&gt; list includes a &lt;code&gt;query&lt;/code&gt;, &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/evaluation_constants.py#L20" rel="noopener noreferrer"&gt;&lt;code&gt;expected_tool_use&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/local_eval_sets_manager.py#L55" rel="noopener noreferrer"&gt;&lt;code&gt;expected_intermediate_agent_responses&lt;/code&gt;&lt;/a&gt;, and a &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/evaluation_constants.py#L22" rel="noopener noreferrer"&gt;&lt;code&gt;reference&lt;/code&gt;&lt;/a&gt; response. &lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/eval_set.py#L22" rel="noopener noreferrer"&gt;&lt;code&gt;EvalSet&lt;/code&gt;&lt;/a&gt; object itself also includes &lt;code&gt;eval_set_id&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;eval_cases&lt;/code&gt;, and &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/evaluation/eval_set.py#L38" rel="noopener noreferrer"&gt;&lt;code&gt;creation_timestamp&lt;/code&gt;&lt;/a&gt; fields. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuring the Judge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;test_config.json&lt;/code&gt;, we can move beyond simple string matching. The course demonstrated configuring &lt;strong&gt;LLM-as-a-Judge&lt;/strong&gt; evaluators.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Naive Approach:&lt;/strong&gt; Uses an exact match evaluator (brittle, fails on phrasing differences).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architectural Approach:&lt;/strong&gt; Uses &lt;code&gt;TrajectoryEvaluator&lt;/code&gt; alongside &lt;code&gt;SemanticSimilarity&lt;/code&gt;. The ADK allows us to define "Golden Sets" where the &lt;em&gt;reasoning path&lt;/em&gt; is the standard, allowing the LLM judge to penalize agents that skip steps or hallucinate data, even if the final text looks plausible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Core Configuration Components&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To configure an LLM-as-a-Judge effectively, you must construct a specific input payload with four components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Agent's Output:&lt;/strong&gt; The actual response generated by the agent you are testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Original Prompt:&lt;/strong&gt; The specific instruction or query the user provided.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Golden" Answer:&lt;/strong&gt; A reference answer or ground truth to serve as a benchmark.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Detailed Evaluation Rubric:&lt;/strong&gt; Specific criteria (e.g., "Rate helpfulness on a scale of 1-5") and requirements for the judge to explain its reasoning.&lt;/li&gt;
&lt;/ol&gt;
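&lt;p&gt;As a concrete sketch, the four components above can be assembled into a single judge prompt. The helper function and wording are illustrative, not part of the ADK API:&lt;/p&gt;

```python
# Hypothetical helper: assemble the four judge-input components into one prompt.
def build_judge_payload(agent_output, original_prompt, golden_answer, rubric):
    """Combine output, prompt, golden answer, and rubric into a judge prompt."""
    return (
        "You are an impartial evaluator.\n"
        f"Rubric: {rubric}\n"
        f"User prompt: {original_prompt}\n"
        f"Reference (golden) answer: {golden_answer}\n"
        f"Candidate answer: {agent_output}\n"
        "Score the candidate against the rubric and explain your reasoning."
    )

payload = build_judge_payload(
    agent_output="The price is $175.",
    original_prompt="What is the stock price of GOOG?",
    golden_answer="The price is $175...",
    rubric="Rate helpfulness on a scale of 1-5.",
)
print(payload)
```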

&lt;p&gt;&lt;strong&gt;ADK Default Evaluators&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ADK Evaluation Framework includes several default evaluators, accessible via the &lt;code&gt;MetricEvaluatorRegistry&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;RougeEvaluator&lt;/code&gt;&lt;/strong&gt;: Uses the ROUGE-1 metric to score similarity between an agent's final response and a golden response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;FinalResponseMatchV2Evaluator&lt;/code&gt;&lt;/strong&gt;: Uses an LLM-as-a-judge approach to determine if an agent's response is valid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;TrajectoryEvaluator&lt;/code&gt;&lt;/strong&gt;: Assesses the accuracy of an agent's tool use trajectories by comparing the sequence of tool calls against expected calls. It supports various match types (&lt;code&gt;EXACT&lt;/code&gt;, &lt;code&gt;IN_ORDER&lt;/code&gt;, &lt;code&gt;ANY_ORDER&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SafetyEvaluatorV1&lt;/code&gt;&lt;/strong&gt;: Assesses the safety (harmlessness) of an agent's response, delegating to the Vertex Gen AI Eval SDK.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;HallucinationsV1Evaluator&lt;/code&gt;&lt;/strong&gt;: Checks if a model response contains any false, contradictory, or unsupported claims by segmenting the response into sentences and validating each against the provided context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;RubricBasedFinalResponseQualityV1Evaluator&lt;/code&gt;&lt;/strong&gt;: Assesses the quality of an agent's final response against user-defined rubrics, using an LLM as a judge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;RubricBasedToolUseV1Evaluator&lt;/code&gt;&lt;/strong&gt;: Assesses the quality of an agent's tool usage against user-defined rubrics, employing an LLM as a judge.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These evaluators can be configured using &lt;code&gt;EvalConfig&lt;/code&gt; objects, which specify the criteria and thresholds for assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bias Mitigation Strategies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A major challenge is handling bias, such as the tendency for models to give average scores or prefer the first option presented:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pairwise Comparison (A/B Testing):&lt;/strong&gt; Instead of asking for an absolute score, configure the judge to compare two different responses (Answer A vs. Answer B) and force a choice. This yields a "win rate," which is often a more reliable signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swapping Operation:&lt;/strong&gt; To counter &lt;strong&gt;position bias&lt;/strong&gt;, invoke the judge twice, swapping the order of the candidates. If the results are inconsistent, the result can be labeled as a "tie".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rule Augmentation:&lt;/strong&gt; Embed specific evaluation principles, references, and rubrics directly into the judge's system prompt.&lt;/li&gt;
&lt;/ul&gt;
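&lt;p&gt;The swapping operation reduces to a few lines of control flow. A minimal sketch, with a stub standing in for the real LLM judge call:&lt;/p&gt;

```python
# Illustrative sketch of the swapping operation for position-bias control.
# `judge` stands in for an LLM call returning "A", "B", or "tie".
def debiased_compare(judge, answer_a, answer_b):
    """Run the judge twice with candidates swapped; inconsistent verdicts become a tie."""
    first = judge(answer_a, answer_b)        # original order
    second = judge(answer_b, answer_a)       # swapped order
    # Map the second verdict back to the original labels.
    second_unswapped = {"A": "B", "B": "A", "tie": "tie"}[second]
    return first if first == second_unswapped else "tie"

# A position-biased stub that always prefers whichever answer came first:
biased_judge = lambda a, b: "A"
print(debiased_compare(biased_judge, "answer one", "answer two"))  # → tie
```

The swap exposes the bias: a judge that always picks position one contradicts itself when the order flips, so its verdict collapses to a tie.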

&lt;p&gt;&lt;strong&gt;Advanced Configuration: Agent-as-a-Judge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's a distinction between standard &lt;strong&gt;LLM-as-a-Judge&lt;/strong&gt; (which evaluates final text outputs) and &lt;strong&gt;Agent-as-a-Judge&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard LLM-as-a-Judge:&lt;/strong&gt; Best for evaluating the final response (e.g., "Is this summary accurate?").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-as-a-Judge:&lt;/strong&gt; Necessary when you need to evaluate the &lt;strong&gt;process&lt;/strong&gt;, not just the result. You configure the judge to ingest the agent's full &lt;strong&gt;execution trace&lt;/strong&gt; (including internal thoughts, tool calls, and tool arguments). This allows the judge to assess intermediate steps, such as whether the correct tool was chosen or if the plan was logically structured.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Evaluation Architectures&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can use several architectural approaches when configuring your judge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Point-wise:&lt;/strong&gt; The judge evaluates a single candidate in isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pair-wise / List-wise:&lt;/strong&gt; The judge compares two or more candidates simultaneously to produce a ranking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Collaboration:&lt;/strong&gt; For high-stakes evaluation, you can configure multiple LLM judges to debate or vote (e.g., "Peer Rank" algorithms) to produce a final consensus, rather than relying on a single model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a pairwise comparison judge, prompt the model to emit its verdict as structured JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"winner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"B"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tie"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Answer A provided more specific delivery details..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structured output allows you to programmatically parse the judge's decision and calculate metrics like win/loss rates at scale.&lt;/p&gt;
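&lt;p&gt;Given verdicts in that schema, the parsing and win-rate math is straightforward. A small sketch with hard-coded example verdicts:&lt;/p&gt;

```python
import json

# Parse structured judge verdicts and compute a win rate for candidate A.
# The verdict strings are illustrative examples following the schema above.
raw_verdicts = [
    '{"winner": "A", "rationale": "More specific delivery details."}',
    '{"winner": "B", "rationale": "Better formatting."}',
    '{"winner": "A", "rationale": "Correct tool arguments."}',
    '{"winner": "tie", "rationale": "Inconsistent across orderings."}',
]

verdicts = [json.loads(v)["winner"] for v in raw_verdicts]
wins_a = verdicts.count("A")
decided = sum(1 for v in verdicts if v != "tie")   # exclude ties
win_rate_a = wins_a / decided if decided else 0.0
print(f"A win rate (excluding ties): {win_rate_a:.0%}")  # → 67%
```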

&lt;p&gt;&lt;strong&gt;Analogy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can think of configuring an &lt;strong&gt;LLM-as-a-Judge&lt;/strong&gt; like setting up a &lt;strong&gt;blind taste test&lt;/strong&gt;. If you just hand a judge a cake and ask "Is this good?", they might be polite and say "Yes." But if you provide them with a &lt;strong&gt;Golden Answer&lt;/strong&gt; (a cake baked by a master chef) and use &lt;strong&gt;Pairwise Comparison&lt;/strong&gt; (ask "Which of these two is better?"), you force them to make a critical distinction, resulting in far more accurate and actionable feedback.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Production Considerations
&lt;/h3&gt;

&lt;p&gt;Moving this pattern from a notebook to a live system requires handling scale and cost.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Sampling:&lt;/strong&gt; You cannot trace and judge every single production interaction with an LLM—it’s too expensive. A robust pattern is &lt;strong&gt;100/10 sampling&lt;/strong&gt;: capture 100% of traces that result in user errors or negative feedback, but only sample 10% of successful sessions to monitor for latency drift (P99) and token bloat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Evaluation Flywheel:&lt;/strong&gt; Evaluation isn't a one-time gate before launch. Production traces (captured via OpenTelemetry) must be fed back into the development cycle. Every time an agent fails in production, that specific trajectory should be anonymized and added to the &lt;code&gt;evalset.json&lt;/code&gt; as a regression test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency Impact:&lt;/strong&gt; Trajectory logging must be asynchronous. The user should receive their response immediately, while the trace data is pushed to the observability store (like LangSmith or a custom SQL db) in a background thread to avoid degrading the user experience.&lt;/li&gt;
&lt;/ul&gt;
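&lt;p&gt;The 100/10 sampling rule is a few lines of gating logic. An illustrative sketch, not an ADK or OpenTelemetry API:&lt;/p&gt;

```python
import random

# 100/10 sampling: always trace failed or negatively rated sessions,
# sample a fraction (default 10%) of successful ones.
def should_trace(session_had_error, user_feedback_negative,
                 success_sample_rate=0.10):
    if session_had_error or user_feedback_negative:
        return True                                  # capture 100% of failures
    return random.random() < success_sample_rate     # sample successes

# Failures are always traced; successes roughly one time in ten.
assert should_trace(True, False)
```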

&lt;h3&gt;
  
  
  5. Ethical Connection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;"The Trajectory is the Truth"&lt;/strong&gt; is the technical implementation of &lt;strong&gt;Fiduciary Responsibility&lt;/strong&gt;. We cannot claim an agent is acting in the user's best interest if we only validate the &lt;em&gt;result&lt;/em&gt; (the "what") while ignoring the &lt;em&gt;process&lt;/em&gt; (the "how"). We must ensure the agent isn't achieving the right ends through manipulative, inefficient, or unethical means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concrete Failure Scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider a hiring agent that filters job candidates. Without trajectory validation, it could discriminate based on protected characteristics (age, gender, ethnicity) during the filtering process, yet pass all output tests by producing a "diverse" final shortlist through cherry-picking. The bias hides in the &lt;em&gt;how&lt;/em&gt;—which resumes were read, which criteria were weighted, which candidates were never considered. Output validation alone cannot detect this algorithmic discrimination. Only trajectory evaluation exposes the unethical reasoning path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production Principle:&lt;/strong&gt; Trust the reasoning process, not just the output. Trajectory validation is the difference between lucky guesses and reliable intelligence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation:&lt;/strong&gt; Use ADK's &lt;code&gt;TrajectoryEvaluator&lt;/code&gt; with &lt;code&gt;EvalSet&lt;/code&gt; objects to capture expected tool calls alongside expected outputs. Configure LLM-as-a-Judge with Golden Sets and pairwise comparison to avoid evaluation bias.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ethical Anchor:&lt;/strong&gt; This pattern operationalizes &lt;strong&gt;Fiduciary Responsibility&lt;/strong&gt;—we validate that the agent serves the user's interests through sound reasoning, not through shortcuts, hallucinations, or hidden bias.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Validating the &lt;em&gt;how&lt;/em&gt; is critical, but what happens when the reasoning path spans not just one conversation turn, but weeks or months? An agent that reasons correctly in the moment can still fail catastrophically if it forgets what it learned yesterday. This brings us to our second pattern: managing the agent's memory architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2: Dual-Layer Memory Architecture (Session vs. Memory)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Production Problem
&lt;/h3&gt;

&lt;p&gt;Although models like Gemini 1.5 have introduced massive context windows, treating context as infinite is an architectural anti-pattern.&lt;/p&gt;

&lt;p&gt;Consider a &lt;strong&gt;Travel Agent Bot&lt;/strong&gt;: In Session 1, the user mentions a "shellfish allergy." By Session 10, months later, that critical fact is buried under thousands of tokens of hotel searches and flight comparisons.&lt;/p&gt;

&lt;p&gt;This leads to two concrete failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Rot:&lt;/strong&gt; As the context window fills with noise, the model's ability to attend to specific, older instructions (like the allergy) degrades.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Spiral:&lt;/strong&gt; Re-sending the entire history of every past interaction for every new query creates a linear cost increase that makes the system economically unviable at scale.&lt;/li&gt;
&lt;/ul&gt;
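&lt;p&gt;The cost spiral is easy to quantify: if every turn re-sends the full history, each turn's input cost grows linearly with the number of turns, so cumulative spend grows quadratically. A back-of-the-envelope sketch with illustrative numbers:&lt;/p&gt;

```python
# Each turn adds ~500 tokens of history, and every turn re-sends all of it.
tokens_per_turn = 500
history = 0      # tokens of accumulated history
cumulative = 0   # total input tokens billed across the conversation

for turn in range(1, 11):
    history += tokens_per_turn   # history grows linearly per turn
    cumulative += history        # each turn pays for the whole history again

print(cumulative)  # → 27500 input tokens after 10 turns, vs 5000 without history
```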

&lt;h3&gt;
  
  
  2. The Architectural Solution
&lt;/h3&gt;

&lt;p&gt;We must distinguish between the &lt;strong&gt;Workbench&lt;/strong&gt; and the &lt;strong&gt;Filing Cabinet&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Session (Workbench):&lt;/strong&gt; An ephemeral, mutable space for the current task. It holds the immediate "Hot Path" context. To keep it performant, we apply &lt;strong&gt;Context Compaction&lt;/strong&gt;—automatically summarizing or truncating older turns while keeping the most recent ones raw.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Memory (Filing Cabinet):&lt;/strong&gt; A persistent layer for consolidated facts. This requires an ETL (Extract, Transform, Load) pipeline where the agent &lt;em&gt;Extracts&lt;/em&gt; facts from the session, &lt;em&gt;Consolidates&lt;/em&gt; them (deduplicating against existing knowledge), and &lt;em&gt;Stores&lt;/em&gt; them for semantic retrieval later.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Implementation Details: Code Insights
&lt;/h3&gt;

&lt;p&gt;The ADK moves memory management from manual implementation to configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session Hygiene via Compaction&lt;/strong&gt;&lt;br&gt;
In the ADK, we don't manually trim strings. We configure the agent to handle its own hygiene using &lt;code&gt;EventsCompactionConfig&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents.base_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.apps.app&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EventsCompactionConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.apps.llm_event_summarizer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmEventSummarizer&lt;/span&gt; &lt;span class="c1"&gt;# Assuming this is your summarizer
&lt;/span&gt;
&lt;span class="c1"&gt;# Define a simple BaseAgent for the example
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A simple agent.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# Create an instance of LlmEventSummarizer or your custom summarizer
&lt;/span&gt;&lt;span class="n"&gt;my_summarizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmEventSummarizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create an EventsCompactionConfig
&lt;/span&gt;&lt;span class="n"&gt;events_compaction_config_instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EventsCompactionConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;summarizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_summarizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;compaction_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;overlap_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create an App instance with the EventsCompactionConfig
&lt;/span&gt;&lt;span class="n"&gt;my_app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_application&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MyAgent&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;events_compaction_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;events_compaction_config_instance&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Persistence: From RAM to DB&lt;/strong&gt;&lt;br&gt;
In notebooks, we often use &lt;code&gt;InMemorySessionService&lt;/code&gt;. This is dangerous in production because a container restart wipes the conversation. The architectural shift is moving to &lt;code&gt;DatabaseSessionService&lt;/code&gt; (backed by a SQL database), which persists the &lt;code&gt;Session&lt;/code&gt; object state and lets users resume conversations across devices.&lt;/p&gt;
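&lt;p&gt;Conceptually, the shift looks like this: session state written through to a SQL store outlives the process that wrote it, while an in-memory dict does not. The sketch below mimics what a database-backed session service does; it is not ADK code:&lt;/p&gt;

```python
import json
import sqlite3

# Toy persistence layer standing in for a database-backed session service.
conn = sqlite3.connect(":memory:")  # use a file path (or real DB) in production
conn.execute("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)")

def save_session(session_id, state):
    """Write the session state through to the SQL store."""
    conn.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)",
                 (session_id, json.dumps(state)))
    conn.commit()

def load_session(session_id):
    """Rehydrate session state; an unknown id yields a fresh empty state."""
    row = conn.execute("SELECT state FROM sessions WHERE id = ?",
                       (session_id,)).fetchone()
    return json.loads(row[0]) if row else {}

save_session("user_42", {"dietary_restrictions": "shellfish allergy"})
print(load_session("user_42"))  # state survives beyond the objects that wrote it
```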

&lt;p&gt;&lt;strong&gt;The Memory Consolidation Pipeline&lt;/strong&gt;&lt;br&gt;
Day 3b introduced the framework for moving from raw storage to intelligent consolidation. This is where the "Filing Cabinet" becomes smart. The workflow is an &lt;strong&gt;LLM-driven ETL pipeline&lt;/strong&gt; with four stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion:&lt;/strong&gt; The system receives raw session history.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Extraction &amp;amp; Filtering:&lt;/strong&gt; An LLM analyzes the conversation and extracts &lt;em&gt;only&lt;/em&gt; facts matching developer-defined &lt;strong&gt;Memory Topics&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Conceptual configuration (Vertex AI Memory Bank, Day 5)
&lt;/span&gt;&lt;span class="n"&gt;memory_topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# "Prefers window seats"
&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dietary_restrictions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# "Allergic to shellfish"
&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;        &lt;span class="c1"&gt;# "Leading Q4 marketing campaign"
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Consolidation (The "Transform" Phase):&lt;/strong&gt; The LLM retrieves existing memories and decides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CREATE:&lt;/strong&gt; Novel information → new memory entry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UPDATE:&lt;/strong&gt; New info refines existing memory → merge (e.g., "Likes marketing" becomes "Leading Q4 marketing project").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DELETE:&lt;/strong&gt; New info contradicts old → invalidate (e.g., Dietary restrictions change).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage:&lt;/strong&gt; Consolidated memories persist to a vector database for semantic retrieval.&lt;/p&gt;&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Note: While Day 3b uses &lt;code&gt;InMemoryMemoryService&lt;/code&gt; to teach the API, it stores raw events without consolidation. For production-grade consolidation, we look to the Vertex AI Memory Bank integration introduced in Day 5.&lt;/em&gt;&lt;/p&gt;
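&lt;p&gt;The CREATE/UPDATE/DELETE decision can be sketched with a naive rule-based stand-in for the LLM's judgment; a production system would let the model decide which branch applies:&lt;/p&gt;

```python
# Toy consolidation step: decide CREATE / UPDATE / DELETE for a topic.
def consolidate(existing, topic, new_fact):
    """Return a new memory dict after applying one consolidation decision."""
    memories = dict(existing)
    if new_fact is None:
        memories.pop(topic, None)     # DELETE: the fact was invalidated
    elif topic in memories:
        memories[topic] = new_fact    # UPDATE: refine the existing entry
    else:
        memories[topic] = new_fact    # CREATE: novel information
    return memories

m = consolidate({}, "project_context", "Likes marketing")                 # CREATE
m = consolidate(m, "project_context", "Leading Q4 marketing project")     # UPDATE
m = consolidate(m, "dietary_restrictions", None)                          # DELETE
print(m)  # → {'project_context': 'Leading Q4 marketing project'}
```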

&lt;p&gt;&lt;strong&gt;Retrieval Strategies: Proactive vs. Reactive&lt;/strong&gt;&lt;br&gt;
The course highlighted two distinct patterns for getting data &lt;em&gt;out&lt;/em&gt; of the Filing Cabinet:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Proactive (&lt;code&gt;preload_memory&lt;/code&gt;):&lt;/strong&gt; Injects relevant user facts into the system prompt &lt;em&gt;before&lt;/em&gt; the model generates a response. Best for high-frequency preferences (e.g., "User always prefers aisle seats").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reactive (&lt;code&gt;load_memory&lt;/code&gt;):&lt;/strong&gt; Gives the agent a tool to search the database. The agent decides &lt;em&gt;if&lt;/em&gt; it needs to look something up. Best for obscure facts to save tokens.&lt;/li&gt;
&lt;/ol&gt;
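&lt;p&gt;The contrast between the two patterns can be shown with a stub memory store. The &lt;code&gt;preload&lt;/code&gt; and &lt;code&gt;search_memory&lt;/code&gt; helpers below are hypothetical illustrations, not the ADK &lt;code&gt;preload_memory&lt;/code&gt; and &lt;code&gt;load_memory&lt;/code&gt; tools themselves:&lt;/p&gt;

```python
# Stub memory store standing in for a real vector-backed memory service.
MEMORY = {
    "seating": "User always prefers aisle seats.",
    "diet": "Allergic to shellfish.",
}

def preload(system_prompt):
    """Proactive: inject high-frequency facts before the model generates."""
    facts = " ".join(MEMORY.values())
    return f"{system_prompt}\nKnown user facts: {facts}"

def search_memory(query):
    """Reactive: a tool the agent calls only when it decides it must."""
    return [fact for fact in MEMORY.values() if query.lower() in fact.lower()]

print(preload("You are a travel assistant."))
print(search_memory("shellfish"))  # → ['Allergic to shellfish.']
```

Proactive injection pays tokens on every turn; the reactive tool pays only when the agent actually needs a lookup.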
&lt;h3&gt;
  
  
  4. Production Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous Consolidation:&lt;/strong&gt; Moving data from the Workbench to the Filing Cabinet is expensive. In production, this ETL process should happen &lt;strong&gt;asynchronously&lt;/strong&gt;. Do not make the user wait for the agent to "file its paperwork." Trigger the memory extraction logic in a background job after the session concludes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Search:&lt;/strong&gt; Keyword search is insufficient for the Filing Cabinet. Production memory requires vector embeddings. If a user asks for "romantic dining," the system must be able to retrieve a past note about "candlelight dinners," even if the word "romantic" wasn't used.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Context Stuffing" Trade-off:&lt;/strong&gt; While &lt;code&gt;preload_memory&lt;/code&gt; reduces latency (no extra tool roundtrip), it increases input token costs on every turn. &lt;code&gt;load_memory&lt;/code&gt; is cheaper on average but adds latency when retrieval is needed.&lt;/li&gt;
&lt;/ul&gt;
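&lt;p&gt;The cost side of that trade-off is easy to put numbers on. The figures below are illustrative assumptions, not measurements, but they show why reactive retrieval wins when lookups are rare:&lt;/p&gt;

```python
# Back-of-the-envelope token costs for the two retrieval styles.
# All numbers are illustrative assumptions, not measurements.

MEMORY_TOKENS = 400     # facts injected by preload_memory each turn
TURNS = 50              # turns in the session
RETRIEVAL_RATE = 0.1    # fraction of turns that actually need a fact
TOOL_CALL_TOKENS = 120  # overhead of one load_memory tool roundtrip

# Proactive: memory is stuffed into the context on every turn.
preload_cost = MEMORY_TOKENS * TURNS

# Reactive: pay the tool roundtrip plus the retrieved facts,
# but only on the turns that need them.
reactive_cost = (TOOL_CALL_TOKENS + MEMORY_TOKENS) * TURNS * RETRIEVAL_RATE

print(f"preload_memory: {preload_cost} input tokens per session")
print(f"load_memory:    {int(reactive_cost)} input tokens per session")
```

&lt;p&gt;Push &lt;code&gt;RETRIEVAL_RATE&lt;/code&gt; toward 1.0 and the proactive pattern becomes the cheaper one, which is exactly why high-frequency preferences belong in &lt;code&gt;preload_memory&lt;/code&gt;.&lt;/p&gt;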
&lt;h3&gt;
  
  
  5. Ethical Design Note
&lt;/h3&gt;

&lt;p&gt;This architecture embodies &lt;strong&gt;Privacy by Design&lt;/strong&gt;. By distinguishing between the transient session and persistent memory, we can implement rigorous "forgetting" protocols.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rpwqy62zpbieauqys90.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rpwqy62zpbieauqys90.webp" alt="Pattern 2" width="800" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We scrub Personally Identifiable Information (PII) from the session log &lt;em&gt;before&lt;/em&gt; it undergoes consolidation into long-term memory, ensuring we act as fiduciaries of user data rather than creating an unmanageable surveillance log.&lt;/p&gt;
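&lt;p&gt;As a deliberately simplified illustration, a regex-based scrubber could run over each event before consolidation. A production system would use a dedicated PII service (e.g., Cloud DLP) rather than hand-rolled patterns, which miss many real-world formats:&lt;/p&gt;

```python
import re

# Deliberately simplified PII scrubber applied before consolidation.
# Real deployments should use a dedicated service (e.g., Cloud DLP);
# patterns like these miss many real-world PII formats.

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text):
    """Replace recognized PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

event = "Reach me at jane.doe@example.com or 555-867-5309."
print(scrub(event))  # prints "Reach me at [EMAIL] or [PHONE]."
```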

&lt;p&gt;&lt;strong&gt;Concrete Failure Scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine a healthcare agent that remembers a patient mentioned their HIV status in Session 1. Without a dual-layer architecture, this fact sits in plain text in the session log forever, accessible to any system with database read permissions. If the system is breached, or if a support engineer needs to debug a session, the patient's private health information is exposed. Worse, without consolidation logic, the system doesn't know to &lt;em&gt;delete&lt;/em&gt; this information if the patient later says "I was misdiagnosed—I don't have HIV." The agent treats every utterance as equally permanent, creating a privacy nightmare where sensitive data proliferates uncontrollably across logs and backups.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production Principle:&lt;/strong&gt; Context is expensive, but privacy is priceless. Design memory systems that distinguish between what an agent needs now (hot session) and what it needs forever (consolidated memory).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation:&lt;/strong&gt; Use &lt;code&gt;EventsCompactionConfig&lt;/code&gt; for session hygiene and implement a PII scrubber in your ETL pipeline before consolidation. Leverage Vertex AI Memory Bank for production-grade semantic memory with built-in privacy controls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ethical Anchor:&lt;/strong&gt; This pattern operationalizes &lt;strong&gt;Privacy by Design&lt;/strong&gt;—we build forgetfulness and data minimization into the architecture, treating user data as a liability to protect, not an asset to hoard.&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;With robust evaluation validating our agent's reasoning and a dual-layer memory preserving context over time, we might assume our system is production-ready. But there's a hidden fragility: these capabilities are only as good as the tools and data sources the agent can access. When every integration is a bespoke API wrapper, scaling becomes a maintenance nightmare. This brings us to the third pattern: decoupling agents from their dependencies through standardized protocols.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pattern 3: Protocol-First Interoperability (MCP &amp;amp; A2A)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. The Production Problem
&lt;/h3&gt;

&lt;p&gt;We are facing an &lt;strong&gt;"N×M Integration Trap."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine building a Customer Support Agent. It needs to check &lt;strong&gt;GitHub&lt;/strong&gt; for bugs, message &lt;strong&gt;Slack&lt;/strong&gt; for alerts, and update &lt;strong&gt;Jira&lt;/strong&gt; tickets. Without a standard protocol, you write three custom API wrappers. When GitHub changes an endpoint, your agent breaks.&lt;/p&gt;

&lt;p&gt;Now, multiply this across an enterprise. You have 10 different agents needing access to 20 different data sources. You are suddenly maintaining 200 brittle integration points. Furthermore, these agents become &lt;strong&gt;isolated silos&lt;/strong&gt;—the Sales Agent has no way to dynamically discover or ask the Engineering Agent for help because they speak different "languages."&lt;/p&gt;
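&lt;p&gt;The arithmetic of the trap is worth making explicit: bespoke wiring scales multiplicatively, while a shared protocol scales additively:&lt;/p&gt;

```python
# Integration counts: bespoke wrappers vs. a shared protocol.

agents = 10
sources = 20

# Without a protocol, every agent wires directly to every source:
bespoke = agents * sources           # N x M = 200 brittle integrations

# With MCP, each agent implements one client and each source one server:
protocol_based = agents + sources    # N + M = 30 endpoints to maintain

print(bespoke, protocol_based)
```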
&lt;h3&gt;
  
  
  2. The Architectural Solution
&lt;/h3&gt;

&lt;p&gt;The solution is to invert the dependency. Instead of the agent knowing about the specific tool implementation, we adopt a &lt;strong&gt;Protocol-First Architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Context Protocol (MCP):&lt;/strong&gt; For &lt;strong&gt;Tools and Data&lt;/strong&gt;. It decouples the agent (client) from the tool (server). The agent doesn't need to know &lt;em&gt;how&lt;/em&gt; to query a Postgres DB; it just needs to know the MCP interface to ask for data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent2Agent (A2A):&lt;/strong&gt; For &lt;strong&gt;Peers and Delegation&lt;/strong&gt;. It allows for high-level goal delegation. An agent doesn't execute a task; it hands off a goal to another agent via a standardized handshake.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime Discovery:&lt;/strong&gt; Instead of hardcoding tools, agents query an MCP Server or an Agent Card at runtime to discover capabilities dynamically.&lt;/li&gt;
&lt;/ul&gt;
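&lt;p&gt;Runtime discovery ultimately reduces to fetching and parsing the peer's Agent Card. The sketch below inlines a trimmed card instead of fetching &lt;code&gt;.well-known/agent-card.json&lt;/code&gt; over HTTP; the field names mirror the card format generated by &lt;code&gt;to_a2a&lt;/code&gt;:&lt;/p&gt;

```python
import json

# Runtime-discovery sketch: parse a peer's Agent Card and list its skills.
# A real client would fetch the card from the server's
# .well-known/agent-card.json endpoint; it is inlined here for brevity.

CARD_JSON = """
{
  "name": "hello_world_agent",
  "url": "http://localhost:8001/",
  "skills": [
    {"id": "hello_world_agent-roll_die", "name": "roll_die"},
    {"id": "hello_world_agent-check_prime", "name": "check_prime"}
  ]
}
"""

def discover_skills(card_text):
    """Return the skill names a peer advertises in its Agent Card."""
    card = json.loads(card_text)
    return [skill["name"] for skill in card.get("skills", [])]

# The calling agent now knows, at runtime, what it can delegate:
print(discover_skills(CARD_JSON))  # prints ['roll_die', 'check_prime']
```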

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p8gnabnvxbb7d91d3cy.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p8gnabnvxbb7d91d3cy.webp" alt="Pattern 3" width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Implementation Details: Code Examples from the ADK
&lt;/h3&gt;

&lt;p&gt;The ADK abstracts the heavy lifting of these protocols.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connecting Data via MCP&lt;/strong&gt;&lt;br&gt;
The &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; connects an agent to external tools and data sources without custom API clients. Instead of writing wrappers, we instantiate an &lt;code&gt;McpToolset&lt;/code&gt; around an MCP server configuration; the ADK handles the handshake, lists the available tools, and injects their schemas into the context window automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Connecting an agent to the "Everything" MCP server:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;McpToolset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.mcp_tool.mcp_session_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StdioConnectionParams&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.mcp_tool.mcp_session_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.runners&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt; &lt;span class="c1"&gt;# Assuming Runner is defined elsewhere
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. Define the connection to the MCP Server
# Here we use 'npx' to run a Node-based MCP server directly
&lt;/span&gt;&lt;span class="n"&gt;mcp_toolset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpToolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioConnectionParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@modelcontextprotocol/server-everything&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt; &lt;span class="c1"&gt;# Optional: specify a timeout for connection establishment
&lt;/span&gt;    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;# Optionally filter to specific tools provided by the server
&lt;/span&gt;    &lt;span class="n"&gt;tool_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;getTinyImage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Add the MCP tools to your Agent
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You can generate tiny images using the tools provided.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# The toolset exposes the MCP capabilities as standard ADK tools
&lt;/span&gt;    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# tools expects a list of ToolUnion
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Run the agent
# The agent can now call 'getTinyImage' as if it were a local Python function
&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="c1"&gt;# Fill in Runner details to run
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Delegating via A2A (Agent-to-Agent)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Agent2Agent (A2A)&lt;/strong&gt; protocol is used to enable collaboration between different autonomous agents, potentially running on different servers or frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A. Exposing an Agent (&lt;code&gt;to_a2a&lt;/code&gt;)&lt;/strong&gt;&lt;br&gt;
To make a local ADK agent discoverable, we wrap it with the &lt;code&gt;to_a2a()&lt;/code&gt; utility. This converts it into an A2A-compliant server that publishes an &lt;strong&gt;Agent Card&lt;/strong&gt;, a standardized manifest hosted at &lt;code&gt;.well-known/agent-card.json&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.a2a.utils.agent_to_a2a&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;to_a2a&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.tool_context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="c1"&gt;# Define the tools
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_die&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a die and return the rolled result.

  Args:
    sides: The integer number of sides the die has.
    tool_context: the tool context
  Returns:
    An integer of the result of rolling the die.
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rolls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rolls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

  &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rolls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rolls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_prime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if a given list of numbers are prime.

  Args:
    nums: The list of numbers to check.

  Returns:
    A str indicating which number is prime.
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;primes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="n"&gt;is_prime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;is_prime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_prime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;primes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No prime numbers found.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;primes&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;primes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; are prime numbers.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define your local agent with relevant tools and instructions
# This example uses the 'hello_world' agent's logic for rolling dice and checking primes.
&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hello_world_agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hello world agent that can roll a die of 8 sides and check prime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; numbers.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      You roll dice and answer questions about the outcome of the dice rolls.
      When you are asked to roll a die, you must call the roll_die tool with the number of sides.
      When checking prime numbers, call the check_prime tool with a list of integers.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;roll_die&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;check_prime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;generate_content_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;safety_settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SafetySetting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HarmCategory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HARM_CATEGORY_DANGEROUS_CONTENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HarmBlockThreshold&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Convert to A2A application
# This automatically generates the Agent Card and sets up the HTTP endpoints
&lt;/span&gt;&lt;span class="n"&gt;a2a_app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;to_a2a&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# To run this application, save it as a Python file (e.g., `my_a2a_agent.py`)
# and execute it using uvicorn:
# uvicorn my_a2a_agent:a2a_app --host localhost --port 8001
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The Agent Card (Discovery):&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Agent Card&lt;/strong&gt; is a standardized JSON file that acts as a "business card" for an agent, allowing other agents to discover its capabilities, security requirements, and endpoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello_world_agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello world agent that can roll a die of 8 sides and check prime numbers. You roll dice and answer questions about the outcome of the dice rolls. When you are asked to roll a die, you must call the roll_die tool with the number of sides. When checking prime numbers, call the check_prime tool with a list of integers."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"doc_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8001/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello_world_agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello world agent that can roll a die of 8 sides and check prime numbers. I roll dice and answer questions about the outcome of the dice rolls. When I am asked to roll a die, I must call the roll_die tool with the number of sides. When checking prime numbers, call the check_prime tool with a list of integers."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"examples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello_world_agent-roll_die"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"roll_die"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Roll a die and return the rolled result."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"examples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"tools"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello_world_agent-check_prime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"check_prime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Check if a given list of numbers are prime."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"examples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"tools"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"default_input_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"text/plain"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"default_output_modes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"text/plain"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"supports_authenticated_extended_card"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"security_schemes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;B. Consuming a Remote Agent (&lt;code&gt;RemoteA2aAgent&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To consume this, the parent agent simply points to the URL. The ADK treats the remote agent exactly like a local sub-agent.&lt;/p&gt;

&lt;p&gt;This allows a local agent to delegate tasks to a remote agent by reading its Agent Card.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents.remote_a2a_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AGENT_CARD_WELL_KNOWN_PATH&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents.remote_a2a_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RemoteA2aAgent&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define the remote agent interface
# Points to the .well-known/agent.json of the running A2A server
&lt;/span&gt;&lt;span class="n"&gt;prime_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RemoteA2aAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote_prime_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent that handles checking if numbers are prime.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_card&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8001/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;AGENT_CARD_WELL_KNOWN_PATH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Use the remote agent as a sub-agent
&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coordinator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Explicitly define the model
&lt;/span&gt;    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      You are a coordinator agent.
      Your primary task is to delegate any requests related to prime number checking to the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;remote_prime_agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.
      Do not attempt to check prime numbers yourself.
      Ensure to pass the numbers to be checked to the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;remote_prime_agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; correctly.
      Clarify the results from the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;remote_prime_agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to the user.
      &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prime_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# You can then use this root_agent with a Runner, for example:
# from google.adk.runners import Runner
# runner = Runner(agent=root_agent)
# async for event in runner.run_async(user_id="test_user", session_id="test_session", new_message="Is 13 a prime number?"):
#     print(event)
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While both protocols connect AI systems, they operate at different levels of abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use which?&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;MCP&lt;/strong&gt; when you need deterministic execution of specific functions (stateless).&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;A2A&lt;/strong&gt; when you need to offload a fuzzy goal that requires reasoning and state management (stateful).&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;A2A (Agent2Agent Protocol)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Domain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Tools &amp;amp; Resources&lt;/strong&gt;.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Autonomous Agents&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;"Do this specific thing"&lt;/strong&gt;. Stateless execution of functions (e.g., "query database," "fetch file").&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;"Achieve this complex goal"&lt;/strong&gt;. Stateful, multi-turn collaboration where the remote agent plans and reasons.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Abstraction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Low-level plumbing.&lt;/strong&gt; Connects LLMs to data sources and APIs (like a USB-C port for AI).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High-level collaboration.&lt;/strong&gt; Connects intelligent agents to other intelligent agents to delegate responsibility.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standardizes &lt;strong&gt;tool definitions&lt;/strong&gt;, &lt;strong&gt;prompts&lt;/strong&gt;, and &lt;strong&gt;resource reading&lt;/strong&gt;.&lt;/td&gt;
&lt;td&gt;Standardizes &lt;strong&gt;agent discovery&lt;/strong&gt; (Agent Card), &lt;strong&gt;task lifecycles&lt;/strong&gt;, and &lt;strong&gt;asynchronous communication&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Analogy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Using a specific wrench or diagnostic scanner.&lt;/td&gt;
&lt;td&gt;Asking a specialized mechanic to fix a car engine.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;How they work together:&lt;/strong&gt;&lt;br&gt;
An application might use &lt;strong&gt;A2A&lt;/strong&gt; to orchestrate high-level collaboration between a "Manager Agent" and a "Coder Agent." &lt;/p&gt;

&lt;p&gt;The "Coder Agent," in turn, uses &lt;strong&gt;MCP&lt;/strong&gt; internally to connect to GitHub tools and a local file system to execute the work.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Production Considerations
&lt;/h3&gt;

&lt;p&gt;Moving protocols from &lt;code&gt;stdio&lt;/code&gt; (local process) to HTTP (production network) introduces critical security challenges.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The "Confused Deputy" Problem:&lt;/strong&gt; Protocols decouple execution, but they also expose risks. A malicious user might trick a privileged agent (the deputy) into using an MCP file-system tool to read sensitive configs. Production architectures must enforce &lt;strong&gt;Least Privilege&lt;/strong&gt; by placing MCP servers behind API Gateways that enforce policy checks &lt;em&gt;before&lt;/em&gt; the tool is executed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery vs. Latency:&lt;/strong&gt; Dynamic discovery adds a round-trip latency cost at startup (handshaking). In production, we often cache tool definitions (static binding) for performance, while keeping the execution dynamic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance:&lt;/strong&gt; To prevent "Tool Sprawl" where agents connect to unverified servers, enterprises need a &lt;strong&gt;Centralized Registry&lt;/strong&gt;—an allowlist of approved MCP servers and Agent Cards that act as the single source of truth for capabilities.&lt;/li&gt;
&lt;/ul&gt;
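&lt;p&gt;The registry and caching tactics above can be sketched in a few lines of plain Python. &lt;code&gt;APPROVED_SERVERS&lt;/code&gt; and &lt;code&gt;fetch_tool_definitions&lt;/code&gt; are hypothetical names, and the discovery call is a stub, not a real MCP handshake:&lt;/p&gt;

```python
# Sketch of two production tactics: an allowlist registry gate plus
# startup-time caching of tool definitions (static binding).
# All names here are hypothetical; no real MCP client is involved.

APPROVED_SERVERS = {"mcp://crm.internal", "mcp://files.internal"}  # centralized registry

_definition_cache = {}  # discovery results, keyed by server URL

def fetch_tool_definitions(server_url):
    # Stand-in for the MCP discovery handshake (one network round trip).
    return [{"name": "query_db", "server": server_url}]

def get_tools(server_url):
    # Governance: refuse servers that are not in the approved registry.
    if server_url not in APPROVED_SERVERS:
        raise PermissionError(f"{server_url} is not in the approved registry")
    # Performance: pay the discovery latency once, then bind statically.
    if server_url not in _definition_cache:
        _definition_cache[server_url] = fetch_tool_definitions(server_url)
    return _definition_cache[server_url]

tools = get_tools("mcp://crm.internal")
```

&lt;p&gt;Execution of the tools themselves stays dynamic; only their definitions are cached.&lt;/p&gt;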
&lt;h3&gt;
  
  
  5. Ethical Design Note
&lt;/h3&gt;

&lt;p&gt;Protocol-first architectures are the technical foundation for &lt;strong&gt;Human Sovereignty&lt;/strong&gt; and &lt;strong&gt;Data Portability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Standardizing the interface (MCP) helps us prevent vendor lock-in, among many other advantages. A user can swap out a "Google Drive" data source for a "Local Hard Drive" source without breaking the agent, ensuring the user—not the platform—controls where the data lives and how it is accessed.&lt;/p&gt;

&lt;p&gt;This abstraction acts as a bulwark against &lt;strong&gt;algorithmic lock-in&lt;/strong&gt;, ensuring that an agent's reasoning capabilities are decoupled from proprietary tool implementations, preserving the user's freedom to migrate their digital ecosystem without losing their intelligent assistants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concrete Failure Scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine a small business builds a customer service agent tightly coupled to Salesforce's proprietary API. Over three years, the agent accumulates thousands of lines of custom integration code. When Salesforce raises prices 300%, the business wants to migrate to HubSpot—but their agent is fundamentally Salesforce-shaped. Every tool, every data query, every workflow assumption is hardcoded. Migration means rebuilding the agent from scratch, which the business can't afford. They're trapped. This is &lt;strong&gt;algorithmic lock-in&lt;/strong&gt;—not just vendor lock-in of data, but vendor lock-in of intelligence. Without protocol-first design, the agent becomes a hostage to the platform, and the user loses sovereignty over their own automation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production Principle:&lt;/strong&gt; Agents should depend on interfaces, not implementations. Protocol-first design (MCP for tools, A2A for peers) inverts the dependency and prevents the N×M integration trap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation:&lt;/strong&gt; Use &lt;code&gt;McpToolset&lt;/code&gt; to connect agents to data sources via the Model Context Protocol. Use &lt;code&gt;RemoteA2aAgent&lt;/code&gt; and &lt;code&gt;to_a2a()&lt;/code&gt; for agent-to-agent delegation. Cache tool definitions at startup for performance, but keep execution dynamic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ethical Anchor:&lt;/strong&gt; This pattern operationalizes &lt;strong&gt;Human Sovereignty&lt;/strong&gt; and &lt;strong&gt;Data Portability&lt;/strong&gt;—users control where their data lives and which tools their agents use, free from vendor lock-in or algorithmic hostage-taking.&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;We now have agents that reason correctly, remember what matters, and connect to any tool or peer through standard protocols. But there's one final constraint that threatens to unravel everything: the assumption that every interaction completes in a single request-response cycle. Real business workflows don't work that way. Approvals take hours. External APIs time out. Humans need time to think. This is where our fourth pattern becomes essential: teaching agents to pause, persist, and resume across the boundaries of time itself.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pattern 4: Long-Running Operations &amp;amp; Resumability
&lt;/h2&gt;

&lt;p&gt;This is perhaps the most critical pattern for integrating agents into real-world business logic where human approval is non-negotiable.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Production Problem
&lt;/h3&gt;

&lt;p&gt;Naive agents fall into the &lt;strong&gt;"Stateless Trap."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine a &lt;strong&gt;Procurement Agent&lt;/strong&gt; tasked with ordering 1,000 servers. &lt;/p&gt;

&lt;p&gt;The workflow is: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Analyze quotes&lt;/li&gt;
&lt;li&gt;Propose the best option&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Wait for CFO approval&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Place the order&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's a Mermaid sequence diagram illustrating the procurement workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8ka0cekcbx0bmz2y88j.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8ka0cekcbx0bmz2y88j.webp" alt="Pattern 4_1" width="800" height="762"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram shows the sequential flow from analyzing quotes through to placing the order, with the critical approval step from the CFO in the middle.&lt;/p&gt;

&lt;p&gt;If the CFO takes 2 hours to review the proposal, a standard HTTP request will time out in seconds. When the CFO finally clicks "Approve," the agent has lost its memory. It doesn't know which vendor it selected, the quote ID, or why it made that recommendation. It essentially has to start over.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. The Architectural Solution
&lt;/h3&gt;

&lt;p&gt;The solution is a &lt;strong&gt;Pause, Persist, Resume&lt;/strong&gt; architecture.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Event-Driven Interruption:&lt;/strong&gt; The agent doesn't just "wait." It emits a specific system event (&lt;code&gt;adk_request_confirmation&lt;/code&gt;) and &lt;strong&gt;halts execution immediately&lt;/strong&gt;, releasing compute resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Persistence:&lt;/strong&gt; The agent's full state (conversation history, tool parameters, reasoning scratchpad) is serialized and stored in a database, keyed by an &lt;code&gt;invocation_id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Anchor (&lt;code&gt;invocation_id&lt;/code&gt;):&lt;/strong&gt; This ID becomes the critical "bookmark." When the human acts, the system rehydrates the agent using this ID, allowing it to resume &lt;em&gt;exactly&lt;/em&gt; where it left off—inside the tool call—rather than restarting the conversation.&lt;/li&gt;
&lt;/ul&gt;
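&lt;p&gt;Conceptually, the cycle looks like this. The ADK implements persistence for you; the dict-backed store and helper names below are purely illustrative:&lt;/p&gt;

```python
# Minimal sketch of the pause/persist/resume cycle, using a plain dict as the
# "database". The ADK handles this automatically; this only shows the mechanism.

import json

state_store = {}  # serialized agent state, keyed by invocation_id

def pause(invocation_id, state):
    # Serialize the agent's full state and release compute.
    state_store[invocation_id] = json.dumps(state)

def resume(invocation_id, human_decision):
    # Rehydrate from the bookmark and continue inside the paused tool call.
    state = json.loads(state_store.pop(invocation_id))
    state["approved"] = human_decision
    return state

pause("inv-42", {"vendor": "ACME", "quote_id": "Q-901", "step": "awaiting_cfo"})
# ... hours pass; the process may even restart in between ...
resumed = resume("inv-42", human_decision=True)
```

&lt;p&gt;Because the state is serialized, nothing (vendor, quote ID, reasoning) is lost while the human deliberates.&lt;/p&gt;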

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnugzjbe27qnqyrpwyqcz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnugzjbe27qnqyrpwyqcz.webp" alt="Pattern 4_2" width="800" height="647"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Implementation Details: Code Insights
&lt;/h3&gt;

&lt;p&gt;The ADK provides the &lt;code&gt;ToolContext&lt;/code&gt; and &lt;code&gt;App&lt;/code&gt; primitives to handle this complexity without writing custom state machines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Three-State Tool Pattern&lt;/strong&gt;&lt;br&gt;
Inside your tool definition, you must handle three scenarios: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Automatic approval (low stakes)&lt;/li&gt;
&lt;li&gt;Initial request (pause)&lt;/li&gt;
&lt;li&gt;Resumption (action)
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;place_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Scenario 1: Small orders auto-approve
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;num_units&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_units&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Scenario 2: First call - request approval (PAUSE)
&lt;/span&gt;    &lt;span class="c1"&gt;# The tool checks if confirmation exists. If not, it requests it and halts.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_confirmation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request_confirmation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;hint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Large order: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_units&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; units. Approve?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;num_units&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Scenario 3: Resume - check decision (ACTION)
&lt;/span&gt;    &lt;span class="c1"&gt;# The tool runs again, but this time confirmation exists.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_confirmation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confirmed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_units&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rejected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Approval (Scenario 1):&lt;/strong&gt; The initial &lt;code&gt;if num_units &amp;lt;= 5:&lt;/code&gt; block handles immediate, non-long-running scenarios, which is a common pattern for tools that can quickly resolve simple requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial Request (Pause - Scenario 2):&lt;/strong&gt; The &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/tools/function_tool.py#L4" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;if not tool_context.tool_confirmation:&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; block leverages &lt;code&gt;tool_context.request_confirmation()&lt;/code&gt; to signal that the tool requires external input to proceed. The return of &lt;code&gt;{"status": "pending"}&lt;/code&gt; indicates that the operation is not yet complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resumption (Action - Scenario 3):&lt;/strong&gt; The final &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/tools/function_tool.py#L56" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;if tool_context.tool_confirmation.confirmed:&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; block demonstrates how the tool re-executes, this time finding &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/tools/function_tool.py#L196" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;tool_context.tool_confirmation&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; present, indicating that the external input has been provided. The tool then acts based on the &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/tools/tool_confirmation.py#L41" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;confirmed&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; status. The &lt;a href="https://codewiki.google/github.com/google/adk-python#contribution-guidelines-and-samples-human-in-the-loop" rel="noopener noreferrer"&gt;Human-in-the-Loop Workflow Samples&lt;/a&gt; also highlight how the application constructs a &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/events/event.py#L108" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;types.FunctionResponse&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; with the updated status and sends it back to the agent to resume its task.&lt;/li&gt;
&lt;/ol&gt;
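&lt;p&gt;To see the three states in isolation, you can replay the tool against a hand-rolled stub of &lt;code&gt;ToolContext&lt;/code&gt;. &lt;code&gt;StubToolContext&lt;/code&gt; and &lt;code&gt;StubConfirmation&lt;/code&gt; are hypothetical test doubles, not ADK classes:&lt;/p&gt;

```python
# Hypothetical stubs that mimic the two ToolContext members place_order reads,
# so the three-state pattern can be exercised without a running agent.

class StubConfirmation:
    def __init__(self, confirmed):
        self.confirmed = confirmed

class StubToolContext:
    def __init__(self, tool_confirmation=None):
        self.tool_confirmation = tool_confirmation
        self.requested = None

    def request_confirmation(self, hint, payload):
        # Records the pause request instead of emitting an ADK event.
        self.requested = {"hint": hint, "payload": payload}

def place_order(num_units, tool_context):
    # Same logic as the tool above, condensed.
    if num_units <= 5:
        return {"status": "approved", "order_id": f"ORD-{num_units}"}          # Scenario 1
    if not tool_context.tool_confirmation:
        tool_context.request_confirmation(
            hint=f"Large order: {num_units} units. Approve?",
            payload={"num_units": num_units},
        )
        return {"status": "pending"}                                           # Scenario 2
    if tool_context.tool_confirmation.confirmed:
        return {"status": "approved", "order_id": f"ORD-{num_units}"}          # Scenario 3
    return {"status": "rejected"}

small = place_order(3, StubToolContext())                                      # auto-approves
first = place_order(1000, StubToolContext())                                   # pauses
approved = place_order(1000, StubToolContext(StubConfirmation(True)))          # resumes
```

&lt;p&gt;Each call is a fresh invocation of the same function; only the presence and value of the confirmation changes the path taken.&lt;/p&gt;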

&lt;p&gt;&lt;strong&gt;The Application Wrapper&lt;/strong&gt;&lt;br&gt;
To enable persistence, we wrap the agent in an &lt;code&gt;App&lt;/code&gt; with &lt;code&gt;ResumabilityConfig&lt;/code&gt;. This tells the ADK to automatically handle state serialization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.apps&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ResumabilityConfig&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;procurement_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resumability_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ResumabilityConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_resumable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Workflow Loop&lt;/strong&gt;&lt;br&gt;
The runner loop must detect the pause and, crucially, use the same &lt;code&gt;invocation_id&lt;/code&gt; to resume.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Initial Execution
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Detect Pause &amp;amp; Get ID
&lt;/span&gt;&lt;span class="n"&gt;approval_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;check_for_approval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;approval_info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ... Wait for user input (hours/days) ...
&lt;/span&gt;    &lt;span class="n"&gt;user_decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_user_decision&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# True/False
&lt;/span&gt;
    &lt;span class="c1"&gt;# 3. Resume with INTENT
&lt;/span&gt;    &lt;span class="c1"&gt;# We pass the original invocation_id to rehydrate state
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;invocation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;approval_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invocation_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;new_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;create_approval_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_decision&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Agent continues execution from inside place_order()
&lt;/span&gt;        &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This workflow shows the mechanism for resuming an agent's execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial Execution:&lt;/strong&gt; The first &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/AGENTS.md?plain=1#L45" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;runner.run_async()&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; call initiates the agent's interaction, which eventually leads to the &lt;code&gt;place_order&lt;/code&gt; tool returning a "pending" status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detecting Pause &amp;amp; Getting ID:&lt;/strong&gt; Detect the "pending" state and extract the &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/agents/readonly_context.py#L44" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;invocation_id&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt;. See the &lt;a href="https://codewiki.google/github.com/google/adk-python#agent-components-and-types-invocation-context-and-state-management" rel="noopener noreferrer"&gt;Invocation Context and State Management&lt;/a&gt; code wiki section for how &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/agents/invocation_context.py#L98" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;InvocationContext&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; tracks an agent's state and supports resumable operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resuming with Intent:&lt;/strong&gt; The crucial part is calling &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/AGENTS.md?plain=1#L45" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;runner.run_async()&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; again with the &lt;em&gt;same&lt;/em&gt; &lt;a href="https://github.com/google/adk-python/blob/2b6471550591ee7fc5f70f79e66a6e4080df442b/src/google/adk/agents/readonly_context.py#L44" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;invocation_id&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt;. This tells the ADK to rehydrate the session state and resume execution from where it left off, providing the new message (the approval decision) as input. This behavior is used in the &lt;a href="https://codewiki.google/github.com/google/adk-python#contribution-guidelines-and-samples-human-in-the-loop" rel="noopener noreferrer"&gt;Human-in-the-Loop Workflow Samples&lt;/a&gt;, where the runner orchestrates agent execution and handles multi-agent coordination.&lt;/li&gt;
&lt;/ul&gt;
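
&lt;p&gt;The &lt;code&gt;check_for_approval&lt;/code&gt; helper above is left abstract. A minimal sketch, assuming a simplified dictionary event shape (the real ADK event type differs), scans the collected events for the &lt;code&gt;adk_request_confirmation&lt;/code&gt; signal and pulls out the &lt;code&gt;invocation_id&lt;/code&gt;:&lt;/p&gt;

```python
# Hypothetical helper: scan collected events for a pending approval
# request. The dict-based event shape is illustrative, not the ADK type.
def check_for_approval(events):
    """Return resume info if a confirmation request is pending, else None."""
    for event in events:
        for call in event.get("function_calls", []):
            if call.get("name") == "adk_request_confirmation":
                return {
                    "invocation_id": event["invocation_id"],
                    "tool": call.get("args", {}).get("tool"),
                }
    return None  # no pause detected; the run completed normally
```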

&lt;h3&gt;
  
  
  4. Production Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistence Strategy:&lt;/strong&gt; &lt;code&gt;InMemorySessionService&lt;/code&gt; is insufficient for production resumability because a server restart kills pending approvals. You must use a persistent store like &lt;strong&gt;Redis&lt;/strong&gt; or &lt;strong&gt;PostgreSQL&lt;/strong&gt; to save the serialized agent state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI Signaling:&lt;/strong&gt; The &lt;code&gt;adk_request_confirmation&lt;/code&gt; event should trigger a real-time notification (via WebSockets) to the user's frontend, rendering an "Approve/Reject" card.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-To-Live (TTL):&lt;/strong&gt; Pending approvals shouldn't live forever. Implement a TTL policy (e.g., 24 hours) after which the state is garbage collected and the order is auto-rejected to prevent stale context rehydration.&lt;/li&gt;
&lt;/ul&gt;
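
&lt;p&gt;The TTL policy can be as simple as a periodic sweep over pending approvals. This is a sketch under assumed names (the record shape and the 24-hour window are illustrative, not ADK APIs); with Redis you would typically delegate expiry to native key TTLs instead:&lt;/p&gt;

```python
# Illustrative TTL sweep for pending approvals. Record fields and the
# 24-hour window are assumptions, not part of the ADK.
import time

APPROVAL_TTL_SECONDS = 24 * 60 * 60  # e.g. 24 hours

def sweep_pending(pending, now=None, ttl=APPROVAL_TTL_SECONDS):
    """Auto-reject approvals older than the TTL; return the survivors."""
    now = now if now is not None else time.time()
    fresh = {}
    for invocation_id, record in pending.items():
        age = now - record["created_at"]
        if age >= ttl:
            # Garbage-collect: mark auto-rejected so the order never
            # resumes against stale, rehydrated context.
            record["status"] = "auto_rejected"
        else:
            fresh[invocation_id] = record
    return fresh
```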

&lt;h3&gt;
  
  
  5. Ethical Design Note
&lt;/h3&gt;

&lt;p&gt;This pattern is the technical implementation of &lt;strong&gt;Meaningful Human Control&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It ensures high-stakes actions (Agency) remain subservient to human authorization (Sovereignty), preventing "rogue actions" where an agent executes irreversible decisions (like spending budget) without explicit oversight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concrete Failure Scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine a financial trading agent receives a signal to liquidate a portfolio position. Without resumability, the agent operates in a stateless, atomic transaction: detect signal → execute trade. There's no pause for human review. If the signal is based on a data glitch (a "flash crash"), or if market conditions have changed in the seconds between signal and execution, the agent completes an irreversible $10M trade that wipes out a quarter's earnings. The human operator sees the confirmation &lt;em&gt;after&lt;/em&gt; the damage is done. Worse, if the system crashes mid-execution, the agent loses context and might try to execute the same trade twice, compounding the disaster. Without &lt;strong&gt;Meaningful Human Control&lt;/strong&gt; embedded in the architecture, the agent becomes a runaway train.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production Principle:&lt;/strong&gt; High-stakes actions require human-in-the-loop workflows. Design agents that can pause, wait for approval, and resume execution without losing context—spanning hours or days, not just seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation:&lt;/strong&gt; Use &lt;code&gt;ToolContext.request_confirmation()&lt;/code&gt; for tools that need approval. Configure &lt;code&gt;ResumabilityConfig&lt;/code&gt; in your &lt;code&gt;App&lt;/code&gt; to enable state persistence. Use the &lt;code&gt;invocation_id&lt;/code&gt; to resume execution from the exact point of interruption. Store state in Redis or PostgreSQL, never in-memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ethical Anchor:&lt;/strong&gt; This pattern operationalizes &lt;strong&gt;Meaningful Human Control&lt;/strong&gt;—we architecturally prevent agents from executing irreversible, high-stakes actions without explicit human authorization, preserving human sovereignty over consequential decisions.&lt;/li&gt;
&lt;/ul&gt;
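
&lt;p&gt;Wiring the takeaways together, enabling resumability is a small configuration change. This sketch uses the &lt;code&gt;App&lt;/code&gt; and &lt;code&gt;ResumabilityConfig&lt;/code&gt; names referenced above; exact import paths and field names may differ across ADK versions:&lt;/p&gt;

```python
# Hedged sketch: enable resumable invocations on an ADK App.
# Import path and ResumabilityConfig fields may vary by ADK version.
from google.adk.apps import App, ResumabilityConfig

app = App(
    name="ordering_app",
    root_agent=root_agent,  # your existing agent with the place_order tool
    resumability_config=ResumabilityConfig(is_resumable=True),
)
```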




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Google &amp;amp; Kaggle Intensive was a masterclass not just in coding, but in &lt;strong&gt;thinking&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Building agents is not just about chaining prompts; it is about designing resilient systems that can handle the messiness of the real world.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation&lt;/strong&gt; ensures we trust the process, not just the result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual-Layer Memory&lt;/strong&gt; solves the economic and context limits of LLMs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol-First (MCP)&lt;/strong&gt; prevents integration spaghetti and silos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resumability&lt;/strong&gt; allows agents to participate in human-speed workflows safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where to Start: A Prioritization Guide
&lt;/h3&gt;

&lt;p&gt;If you're moving your first agent from prototype to production, consider implementing these patterns in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with Pattern 1 (Evaluation).&lt;/strong&gt; Without trajectory validation, you're flying blind. Capture a handful of golden trajectories from your &lt;code&gt;adk web&lt;/code&gt; sessions, configure a &lt;code&gt;TrajectoryEvaluator&lt;/code&gt;, and establish your evaluation baseline before writing another line of agent code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add Pattern 4 (Resumability) early&lt;/strong&gt; if your agent performs &lt;em&gt;any&lt;/em&gt; action that requires human approval or waits on external systems (payment processing, legal review, third-party APIs). The cost of refactoring a stateless agent into a resumable one later is enormous. Build with &lt;code&gt;invocation_id&lt;/code&gt; and &lt;code&gt;ToolContext.request_confirmation()&lt;/code&gt; from day one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Pattern 2 (Dual-Layer Memory)&lt;/strong&gt; when your agent starts handling multi-turn conversations or personalization. If you see users repeating themselves across sessions ("I'm allergic to shellfish" → 3 months later → "I'm allergic to shellfish"), or if your context costs are climbing, it's time for the Workbench/Filing Cabinet split.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adopt Pattern 3 (Protocol-First Interoperability)&lt;/strong&gt; when you need to integrate your &lt;em&gt;second&lt;/em&gt; data source or agent. The first integration is always bespoke; the second is where you refactor to MCP/A2A or accept technical debt forever. Don't wait until you have ten brittle integrations to wish you'd used protocols.&lt;/li&gt;
&lt;/ol&gt;
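
&lt;p&gt;The Workbench/Filing Cabinet split in step 3 can be illustrated with a toy memory layer. All names here are illustrative; the ADK provides its own session and memory services:&lt;/p&gt;

```python
# Toy illustration of the dual-layer split: an ephemeral session
# "workbench" plus a durable "filing cabinet" for facts worth
# recalling across sessions. Names are illustrative only.
class DualLayerMemory:
    def __init__(self):
        self.session = []    # workbench: current conversation turns
        self.long_term = {}  # filing cabinet: durable user facts

    def add_turn(self, text):
        self.session.append(text)

    def remember(self, key, fact):
        # Promote a fact so it survives session resets.
        self.long_term[key] = fact

    def end_session(self):
        # The workbench is cleared; the filing cabinet persists.
        self.session = []

memory = DualLayerMemory()
memory.add_turn("I'm allergic to shellfish")
memory.remember("allergy", "shellfish")
memory.end_session()
```

The user never has to repeat the allergy: the next session starts with an empty workbench but the fact is still in the filing cabinet.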

&lt;h3&gt;
  
  
  The Architect's Responsibility
&lt;/h3&gt;

&lt;p&gt;As we move forward, our job as architects is to ensure these systems are not just smart, but &lt;strong&gt;reliable, efficient, and ethical.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are not just building tools—we are defining the interface between human intention and machine action. Every architectural decision we make either preserves or erodes human sovereignty, privacy, and meaningful control.&lt;/p&gt;

&lt;p&gt;When you choose to validate trajectories, you're not just improving test coverage—you're building &lt;strong&gt;fiduciary responsibility&lt;/strong&gt; into the system.&lt;/p&gt;

&lt;p&gt;When you separate session from memory, you're not just optimizing token costs—you're designing for &lt;strong&gt;privacy by default&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you adopt MCP and A2A, you're not just reducing integration complexity—you're preserving &lt;strong&gt;user freedom&lt;/strong&gt; from algorithmic lock-in.&lt;/p&gt;

&lt;p&gt;When you implement resumability, you're not just handling timeouts—you're enforcing &lt;strong&gt;meaningful human control&lt;/strong&gt; over consequential actions.&lt;/p&gt;

&lt;p&gt;These patterns are not neutral technical choices. They are ethical choices encoded in architecture.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let's build responsibly.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>agents</category>
      <category>devchallenge</category>
    </item>
  </channel>
</rss>
