<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Médéric Hurier (Fmind)</title>
    <description>The latest articles on Forem by Médéric Hurier (Fmind) (@fmind).</description>
    <link>https://forem.com/fmind</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3755461%2F1606daec-23ba-429e-98db-baa19ef3634d.png</url>
      <title>Forem: Médéric Hurier (Fmind)</title>
      <link>https://forem.com/fmind</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/fmind"/>
    <language>en</language>
    <item>
      <title>How I Revamped My Portfolio Website in 5 Nights Using AI Agents</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Thu, 19 Feb 2026 20:43:26 +0000</pubDate>
      <link>https://forem.com/fmind/how-i-revamped-my-portfolio-website-in-5-nights-using-ai-agents-57d4</link>
      <guid>https://forem.com/fmind/how-i-revamped-my-portfolio-website-in-5-nights-using-ai-agents-57d4</guid>
      <description>&lt;p&gt;I see many friends and acquaintances generating amazing applications in mere weeks. We are in the midst of a craze of innovation, an era where inspired people can bring their ideas to life without being strictly bounded by the usual constraints of tool mastery.&lt;/p&gt;

&lt;p&gt;This means there is no time to slack off. The go-getters will not wait for you to catch up. One representative aspect of this fast-moving landscape is your digital presence — your portfolio.&lt;/p&gt;

&lt;p&gt;As a freelance AI/ML Architect, I realized I needed a space that truly reflected my expertise. In this article, I present the revamp of my own portfolio: &lt;a href="https://fmind.dev" rel="noopener noreferrer"&gt;fmind.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0r1suucdet9pejwokqst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0r1suucdet9pejwokqst.png" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;New Homepage of Fmind.dev&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Old and Dusty Website
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://sites.google.com/fmind.dev/fmind-dev/home" rel="noopener noreferrer"&gt;My previous website&lt;/a&gt; was built with Google Sites. It was cheap, highly functional, and effortless to maintain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m4rmi6mkeptahmcm0iv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m4rmi6mkeptahmcm0iv.png" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Previous version of Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But functional is no longer enough in 2026, especially for a freelancer who constantly needs to be on the bleeding edge of technology. Google Sites gave me very little control over styling, imposed strict limitations on SEO, and — crucially — offered no way to evolve the site with modern AI coding assistants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Something had to be done&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Preparing the Field: Setting Up the Agent
&lt;/h3&gt;

&lt;p&gt;The secret to a successful AI-assisted project isn’t just jumping in and prompting; it’s about preparing the groundwork.&lt;/p&gt;

&lt;p&gt;I started by defining several &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;agent skills&lt;/a&gt; tailored to my specific tech stack (an illustrative sketch follows the list). For this project, I created dedicated skills covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend Development&lt;/li&gt;
&lt;li&gt;Frontend Development&lt;/li&gt;
&lt;li&gt;GCP Deployment&lt;/li&gt;
&lt;li&gt;GCP Observability&lt;/li&gt;
&lt;li&gt;Mobile Optimization&lt;/li&gt;
&lt;li&gt;Project Tooling&lt;/li&gt;
&lt;li&gt;SEO Optimization&lt;/li&gt;
&lt;/ul&gt;
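
&lt;p&gt;To give a flavor, here is an illustrative sketch of what one of these skills can look like (the content below is a trimmed example of mine, not the actual file):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Skill - SEO Optimization

## Goal

Keep every page fast and indexable: semantic HTML, unique titles and
meta descriptions, Open Graph tags, sitemap.xml, and robots.txt.

## Instructions

1. Every page defines a unique title and meta description.
2. Register each new route in sitemap.xml.

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;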

&lt;p&gt;Then, instead of continuously reminding the agent who I am and what I want, I explicitly created two context files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PROFILE.md&lt;/strong&gt; : Outlining my professional identity, experience, and links.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Profile - Médéric Hurier (Fmind)

## Headline

Freelancer • AI/ML Architect &amp;amp; Engineer • AI Agents &amp;amp; MLOps • GCP Professional Cloud Architect • PhD in AI &amp;amp; Computer Security

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;DESIGN.md&lt;/strong&gt; : Defining my brand identity (e.g., “Space &amp;amp; Tech” aesthetic).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Website Design

## Brand Identity

A professional, advanced, and modern digital presence for an AI/ML Architect.

Blending clean and modern style with a "Space &amp;amp; Tech" aesthetic to reflect deep expertise in Artificial Intelligence and MLOps.

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By establishing these foundational documents, I set the ground rules. The AI agent had immediate access to my personality and brand identity, saving me from having to explain the context over and over again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the Website: Steering the AI
&lt;/h3&gt;

&lt;p&gt;Rather than “vibe coding” everything and blindly compiling the results, I made a strict rule: I would review every single file generated by the agent. As an engineer, I refuse to be responsible for a codebase I haven’t read or understood.&lt;/p&gt;

&lt;p&gt;Despite the agent’s impressive capabilities, I noted several weaknesses that you must manage when using AI agents to generate code:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Agents don’t care about your brand. They just want to get the task done.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;— A harsh truth of AI coding&lt;/p&gt;

&lt;p&gt;It’s entirely up to you to rigorously maintain your identity and ensure the output matches your needs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Code quickly degenerates. Without supervision, AI can generate piles and piles of code lacking underlying logic.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Agents are fantastic at refactoring, but &lt;em&gt;you&lt;/em&gt; are the architect who must point them in the right direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing the right tech stack is key.&lt;/strong&gt; Call me old school, but I demand mastery over what I produce. I decided to use &lt;strong&gt;Python&lt;/strong&gt; as the foundational layer for my programming logic. I am confident in my ability to maintain, debug, and evolve a Python-based architecture over the long term.&lt;/p&gt;

&lt;p&gt;Once you actively tackle these limitations, the results are breathtaking. While I generally don’t enjoy writing repetitive boilerplate or templates, the AI agent excels at it and never gets tired. You can direct its focus entirely to the frontend, backend, deployment, SEO, or mobile optimization. Watching the pieces fall into place gives you an incredible dopamine rush.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Result
&lt;/h3&gt;

&lt;p&gt;I was able to finalize the website from scratch in just &lt;strong&gt;5 nights&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This rapid timeline included everything: generating the base components, extensive optimizations, refactoring, and setting up proper deployment pipelines (including analytics and observability). While I could have let the agent run entirely autonomously, it is &lt;em&gt;my&lt;/em&gt; website; I wanted to be an active part of the process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0z6kogyrv13cpsjfsizk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0z6kogyrv13cpsjfsizk.png" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Result of PageSpeed Insights (LightHouse)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’m thrilled to report that the new site achieved a &lt;strong&gt;100% Lighthouse score&lt;/strong&gt; across every single category. The site is fast, modern, and beautifully aligned with my brand. I would have never been able to produce this level of polish by myself in such a short timeframe, and I’m incredibly proud of the result.&lt;/p&gt;

&lt;p&gt;For context on the effort, here is a quick look at the codebase generated and reviewed during those 5 nights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;github.com/AlDanial/cloc v 1.98 T=0.10 s (915.2 files/s, 30379.3 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
HTML 57 33 40 1257
Markdown 11 277 0 580
Python 12 138 79 368
TOML 1 8 1 84
YAML 3 0 3 81
CSS 2 9 0 41
Text 4 7 0 33
JSON 1 0 0 31
Dockerfile 1 1 2 27
-------------------------------------------------------------------------------
SUM: 92 473 125 2502
-------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusions
&lt;/h3&gt;

&lt;p&gt;AI Coding is an absolute game-changer. I highly encourage any developer or freelancer to revamp their own website as soon as possible. It is the perfect, tightly-scoped exercise to discover the true, practical capacity of these tools for yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with clear context files&lt;/strong&gt; to ground your agent’s understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not blindly trust generated code;&lt;/strong&gt; review everything to maintain architectural control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid “shiny object syndrome.”&lt;/strong&gt; Relying entirely on AI without architectural vision can lead to generic, unmaintainable results. Find the right balance between automation and engineering rigor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the final result at &lt;a href="https://fmind.dev" rel="noopener noreferrer"&gt;fmind.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5p7sgek0f7qlcck0zfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5p7sgek0f7qlcck0zfp.png" width="800" height="402"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;About me on Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you have tried ‘vibe coding’ on your own projects, let me know about your experience in the comments!&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>softwareengineering</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Chaigent: An affordable alternative to Gemini Enterprise on Google Cloud</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Fri, 06 Feb 2026 20:14:02 +0000</pubDate>
      <link>https://forem.com/fmind/chaigent-an-affordable-alternative-to-gemini-enterprise-on-google-cloud-32ah</link>
      <guid>https://forem.com/fmind/chaigent-an-affordable-alternative-to-gemini-enterprise-on-google-cloud-32ah</guid>
      <description>&lt;p&gt;The era of simple chatbots is over. Companies are now racing to build &lt;a href="https://fmind.medium.com/architecting-the-ai-agent-platform-a-definitive-guide-405750a3de44" rel="noopener noreferrer"&gt;&lt;strong&gt;AI Agent platforms&lt;/strong&gt;&lt;/a&gt; — systems that don’t just talk, but &lt;em&gt;act&lt;/em&gt;. Whether it’s a support bot resolving Jira tickets or a data analyst agent querying BigQuery, these new digital teammates need a platform that offers more than just text generation: they require reasoning, security, and enterprise-grade observability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/gemini/enterprise" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Enterprise&lt;/strong&gt;&lt;/a&gt; provides a great path to achieving this on &lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud&lt;/strong&gt;&lt;/a&gt;. It offers a comprehensive set of features including agent exposition, governance, integrated knowledge search, and a visual agent builder, connecting with backends like &lt;a href="https://cloud.google.com/products/agent-engine" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://docs.cloud.google.com/dialogflow/cx/docs" rel="noopener noreferrer"&gt;&lt;strong&gt;Conversational Agent&lt;/strong&gt;&lt;/a&gt;, or &lt;a href="https://a2aprotocol.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2A&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, for some organizations or specific use cases, the cost can be a friction point. The catalog price sits at &lt;strong&gt;~$7/user/month&lt;/strong&gt; for agent users and &lt;strong&gt;~$35/user/month&lt;/strong&gt; for visual agent builders. While this pricing is competitive for knowledge workers who gain significant productivity, it can be prohibitive for large audiences with lower usage frequency, such as field workers or occasional users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enter Chaigent:&lt;/strong&gt; &lt;a href="https://github.com/fmind/chaigent" rel="noopener noreferrer"&gt;&lt;strong&gt;https://github.com/fmind/chaigent&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5ybjuxkqco4k8yezmcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5ybjuxkqco4k8yezmcp.png" width="800" height="446"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Chaigent is an affordable alternative to Gemini Enterprise (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I present “Chaigent” (&lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chainlit&lt;/strong&gt;&lt;/a&gt; + Agent), a cost-effective, DIY alternative to Gemini Enterprise on Google Cloud. It leverages the same powerful underlying reasoning engine but replaces the managed frontend with an open-source framework, giving you control over features and costs.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;Chaigent enables you to build a private, secure AI agent platform by combining serverless infrastructure with open-source tooling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugnuds2ohznmtqzet2u4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugnuds2ohznmtqzet2u4.png" width="800" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of Chaigent (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The architecture consists of three main layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Frontend (&lt;/strong&gt;&lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chainlit&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;on&lt;/strong&gt; &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud Run&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;: A Python-based UI that handles user sessions, chat history, and authentication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend (&lt;/strong&gt;&lt;a href="https://cloud.google.com/products/agent-engine" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;: The “brain” of the operation, capable of reasoning and tool use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence &amp;amp; Auth&lt;/strong&gt; : &lt;a href="https://cloud.google.com/sql" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud SQL&lt;/strong&gt;&lt;/a&gt; for storing chat history and feedback, and &lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;&lt;strong&gt;OAuth&lt;/strong&gt;&lt;/a&gt; (Google, GitHub, etc.) for secure identity management.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach allows you to pay for &lt;strong&gt;consumption only&lt;/strong&gt; (Cloud Run CPU + Vertex AI tokens), significantly reducing costs for intermittent usage patterns compared to a flat per-seat license.&lt;/p&gt;
&lt;h3&gt;
  
  
  The “Do It Yourself” Trade-off
&lt;/h3&gt;

&lt;p&gt;Gemini Enterprise provides a managed, “batteries-included” platform with built-in governance and visual tools. Chaigent, in contrast, offers a code-first, developer-centric approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you gain:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency&lt;/strong&gt; : No monthly per-seat licensing fees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Customization&lt;/strong&gt; : You own the code. Want to add a custom feedback mechanism or a specific UI widget? You can.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform Independence&lt;/strong&gt; : Using &lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chainlit&lt;/strong&gt;&lt;/a&gt; (frontend) and &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;&lt;strong&gt;Google ADK&lt;/strong&gt;&lt;/a&gt; (backend) logic keeps you flexible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you lose (The “Subtext”):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Visual Builder&lt;/strong&gt; : You define agents in code, not a drag-and-drop UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual Governance&lt;/strong&gt; : You must implement your own permission logic per agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ops Overhead&lt;/strong&gt; : You are responsible for deploying, securing, and updating the application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Features&lt;/strong&gt; : Advanced features like Model Armor (&lt;a href="https://cloud.google.com/security/products/model-armor" rel="noopener noreferrer"&gt;&lt;strong&gt;Prompt Security&lt;/strong&gt;&lt;/a&gt;) and integrated Knowledge Search (&lt;a href="https://cloud.google.com/vertex-ai/docs/retrieval-augmented-generation" rel="noopener noreferrer"&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/a&gt;) require manual implementation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Implementation Highlights
&lt;/h3&gt;

&lt;p&gt;Chaigent is surprisingly simple to set up. Here is a glimpse of the code.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Defining the Agent
&lt;/h3&gt;

&lt;p&gt;The agent is defined declaratively using the &lt;a href="https://github.com/google/adk" rel="noopener noreferrer"&gt;&lt;strong&gt;Google ADK&lt;/strong&gt;&lt;/a&gt;. It’s just a Python object specifying the model and tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chaigent/agent.py
from google.adk.agents import Agent
from google.adk.tools import google_search

root_agent = Agent(
    name="chaigent",
    model="gemini-2.5-flash",
    description="Answer questions with Google Search.",
    instruction="You are an expert researcher. You always stick to the facts.",
    tools=[google_search],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
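
&lt;p&gt;For completeness, deploying such an agent to Vertex AI Agent Engine boils down to a short script. Here is a minimal sketch using the Vertex AI SDK’s agent_engines module (project, location, and bucket values are placeholders, and the exact arguments may vary by SDK version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# deploy.py (sketch, not the repository's deployment script)
import vertexai
from vertexai import agent_engines

from chaigent.agent import root_agent

vertexai.init(
    project="my-project",                     # placeholder
    location="us-central1",                   # placeholder
    staging_bucket="gs://my-staging-bucket",  # placeholder
)

# Package and deploy the ADK agent as a managed Agent Engine instance
remote_app = agent_engines.create(
    agent_engine=root_agent,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
)
print(remote_app.resource_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;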



&lt;h3&gt;
  
  
  2. The Bridge (Chainlit Adapter)
&lt;/h3&gt;

&lt;p&gt;The app.py acts as the bridge. It connects the user’s chat session to the Vertex AI Agent Engine, handling the streaming response seamlessly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# app.py

import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # Initialize an empty response message
    answer = cl.Message(content="")
    await answer.send()

    # Retrieve the session created at chat start
    session = cl.user_session.get("session")
    user_id, session_id = session["userId"], session["id"]

    # Stream the query to Vertex AI Agent Engine
    response_stream = engine.async_stream_query(
        user_id=user_id, message=message.content, session_id=session_id
    )

    # Stream the tokens back as they arrive
    async for chunk in response_stream:
        for part in chunk.get("content", {}).get("parts", []):
            text = part.get("text", "")
            if text:
                await answer.stream_token(text)

    # Finalize the message once the stream completes
    await answer.update()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
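
&lt;p&gt;The engine handle and session are prepared when the chat starts. The handler below is a hypothetical sketch of that wiring (the environment variable name is mine, and the async_create_session call mirrors the async_stream_query style used above), not a copy of the repository’s code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# app.py (sketch): wiring assumed by the handler above
import os

import chainlit as cl
from vertexai import agent_engines

# Handle to the deployed agent (resource name comes from the environment)
engine = agent_engines.get(os.environ["AGENT_ENGINE_RESOURCE_NAME"])

@cl.on_chat_start
async def on_chat_start():
    # Create a server-side session on Agent Engine and cache it
    user = cl.user_session.get("user")
    session = await engine.async_create_session(user_id=user.identifier)
    cl.user_session.set("session", session)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;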



&lt;h3&gt;
  
  
  User Experience
&lt;/h3&gt;

&lt;p&gt;Despite being a “DIY” solution, the user experience is premium. Chainlit provides features that users expect from modern chat apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rich Chat Interface&lt;/strong&gt; : Supports markdown, code highlighting, and streaming responses out of the box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t81nqszlcx0eo1ocvjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t81nqszlcx0eo1ocvjg.png" width="800" height="435"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Chat interface of Chaigent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication &amp;amp; Persistence&lt;/strong&gt; : Secure login screens and persisted chat history allow users to resume conversations across devices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi3tc6m8zpoqc8415wuk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi3tc6m8zpoqc8415wuk.png" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Login screen of Chaigent&lt;/em&gt;&lt;/p&gt;
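
&lt;p&gt;On the code side, Chainlit enables OAuth with little more than the provider’s credentials in the environment and a small callback. A minimal sketch (accepting every authenticated user; a real deployment would filter on the profile data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# app.py (sketch): accept any user authenticated by the OAuth provider
from typing import Optional

import chainlit as cl

@cl.oauth_callback
def oauth_callback(
    provider_id: str,      # e.g. "google" or "github"
    token: str,            # OAuth access token
    raw_user_data: dict,   # profile returned by the provider
    default_user: cl.User,
) -&amp;gt; Optional[cl.User]:
    # Returning the user accepts the login; returning None rejects it
    return default_user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;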

&lt;p&gt;&lt;strong&gt;Data Layer&lt;/strong&gt; : All interactions are stored in your own SQL database, giving you full ownership of the data for analytics or fine-tuning later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9istczsweoqlgl289qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9istczsweoqlgl289qd.png" width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Home screen of Chaigent&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Chaigent is an excellent solution when &lt;strong&gt;cost efficiency&lt;/strong&gt; is the primary driver, particularly for large audiences with low individual usage.&lt;/p&gt;

&lt;p&gt;The decision comes down to ROI. At ~$7/month/user for Gemini Enterprise, you need to save each user at least one hour of work per month to break even. For knowledge workers, this is a no-brainer. But for field workers or casual users, a consumption-based “Pay-as-you-go” model like Chaigent might be the smarter financial move.&lt;/p&gt;
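
&lt;p&gt;To make the break-even concrete, here is the back-of-the-envelope calculation (the value assigned to a saved hour is an assumption; plug in your own numbers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Break-even sketch with assumed numbers; adjust to your context
seat_cost = 7.0   # ~$7/user/month (agent user tier)
hour_value = 7.0  # assumed value assigned to one saved hour of work

break_even_hours = seat_cost / hour_value
print(f"Each user must save {break_even_hours:.1f} hour(s)/month to break even")
# With hour_value = 7.0, this is the "one hour per month" rule of thumb above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;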

&lt;p&gt;If you are ready to trade some convenience for control and cost savings, go build your own agents!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqtud4hao3w1r94gzykw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqtud4hao3w1r94gzykw.png" width="800" height="446"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googlecloudplatform</category>
      <category>generativeaitools</category>
      <category>datascience</category>
      <category>agents</category>
    </item>
    <item>
      <title>mAIdAI: Building a Personal Assistant with Google Cloud and Vertex AI</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Thu, 05 Feb 2026 06:05:23 +0000</pubDate>
      <link>https://forem.com/fmind/maidai-building-a-personal-assistant-with-google-cloud-and-vertex-ai-aof</link>
      <guid>https://forem.com/fmind/maidai-building-a-personal-assistant-with-google-cloud-and-vertex-ai-aof</guid>
      <description>&lt;p&gt;As an AI Architect, I spend my days designing AI systems and agents for others. I optimize workflows, fine-tune context windows, and architect serverless solutions to solve complex business problems.&lt;/p&gt;

&lt;p&gt;But recently, I caught myself in a classic “cobbler’s children” scenario. While helpful bots supported my teams, I navigated my own workflow manually — answering the same repetitive questions, digging for the same documentation links, and context-switching constantly.&lt;/p&gt;

&lt;p&gt;I realized I needed something different. Not another generic team bot, but a &lt;strong&gt;Personal AI Assistant&lt;/strong&gt;  — one that knows &lt;em&gt;my&lt;/em&gt; specific context, &lt;em&gt;my&lt;/em&gt; preferred shortcuts, and &lt;em&gt;my&lt;/em&gt; tone.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;mAIdAI&lt;/strong&gt; (My AI Aid): &lt;a href="https://github.com/fmind/maidai" rel="noopener noreferrer"&gt;https://github.com/fmind/maidai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AfokYASaQyFqSm0i4n-tPaQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AfokYASaQyFqSm0i4n-tPaQ.png" width="800" height="446"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;mAIdAI Avatar (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through how I architected this personal agent using &lt;strong&gt;Google Chat&lt;/strong&gt; , &lt;strong&gt;Cloud Run&lt;/strong&gt; , and &lt;strong&gt;Vertex AI&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem: The High Cost of “Quick” Tasks
&lt;/h3&gt;

&lt;p&gt;We often underestimate the micro-friction in our daily work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Where is the design doc for Project X?”&lt;/li&gt;
&lt;li&gt;“What’s the syntax for that specific gcloud command again?”&lt;/li&gt;
&lt;li&gt;“Can you review this snippet?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Standard team bots are great, but they are generic. They lack the &lt;em&gt;specific&lt;/em&gt; context of your personal role and responsibilities. I wanted an agent that acts as a “Second Brain” — grounded in my personal knowledge and capable of executing my specific workflows.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution: mAIdAI Pattern
&lt;/h3&gt;

&lt;p&gt;mAIdAI is designed around three core interaction types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context-Aware Chat&lt;/strong&gt; : A conversational flow grounded in a personal context.md file that serves as effective “system instructions”.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick Commands&lt;/strong&gt; : Instant helpers that return static values (like commonly used links or snippets) without invoking the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slash Commands&lt;/strong&gt; : Specialized triggers that wrap user input in a predefined prompt template (e.g., /fix to debug code).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzZIz8pnehINoHvixMcWeXA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzZIz8pnehINoHvixMcWeXA.png" width="800" height="327"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Demo of mAIdAI on Google Chat (Generic Version)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AndK1QlYRr_Wlr05q2zVUAw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AndK1QlYRr_Wlr05q2zVUAw.png" width="800" height="414"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;About mAIdAI&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;The system follows a lightweight, serverless event-driven architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A95OQreqPYlfHWmN_I2qGBw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A95OQreqPYlfHWmN_I2qGBw.png" width="800" height="259"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture Diagram of mAIdAI (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt; : The &lt;strong&gt;Google Chat&lt;/strong&gt; app interface. No custom UI to build or maintain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport&lt;/strong&gt; : Chat events are delivered via HTTP webhooks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt; : A &lt;strong&gt;Cloud Run&lt;/strong&gt; service hosting a FastAPI application processes the events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligence&lt;/strong&gt; : The backend connects to &lt;strong&gt;Vertex AI&lt;/strong&gt; (Gemini models) for reasoning, grounded by the personal context file.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Deep Dive: The Code
&lt;/h3&gt;

&lt;p&gt;The implementation is surprisingly minimal, thanks to the &lt;strong&gt;Google GenAI SDK&lt;/strong&gt; and &lt;strong&gt;FastAPI&lt;/strong&gt;. The entire core logic resides in a single main.py file.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Setup
&lt;/h3&gt;

&lt;p&gt;We initialize the GenAI client using standard environment variables. This keeps the code portable and secure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py
import os
import pathlib

from google import genai
from google.genai import types

# Assuming context.md sits next to main.py
ROOT_FOLDER = pathlib.Path(__file__).parent

client = genai.Client(
    project=os.environ["GOOGLE_CLOUD_PROJECT"],
    location=os.environ["GOOGLE_CLOUD_LOCATION"],
    vertexai=True,
)

# Loading the Second Brain
MODEL_CONTEXT = (ROOT_FOLDER / "context.md").read_text()
config = types.GenerateContentConfig(
    system_instruction=MODEL_CONTEXT,
    max_output_tokens=5000,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By reading context.md at startup and injecting it as the system_instruction, we ensure every interaction is grounded in my specific reality.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Handling Interaction Types
&lt;/h3&gt;

&lt;p&gt;The core router handles the distinction between simple commands and AI interactions. This is crucial for latency and cost — not every interaction needs a round-trip to an LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.post("/")
async def index(request: Request) -&amp;gt; dict:
    event = await request.json()
    # ... extraction logic ...

    if command_id := app_command_metadata.get("appCommandId"):
        # Handle Slash and Quick Commands
        if command_type == "QUICK_COMMAND":
            return respond(command_text)

        if command_type == "SLASH_COMMAND":
            # Contextualize the prompt
            prompt = f"{command_text}. USER INPUT: {user_input}"
            return respond(await chat(prompt))

    # Fallback to standard chat
    return respond(await chat(user_input))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern allows me to have a /links command that returns immediately (0 latency, 0 cost), while a /rewrite command leverages Gemini 2.0 Flash for creative work.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Asynchrony by Default
&lt;/h3&gt;

&lt;p&gt;Using async def and client.aio.models.generate_content ensures the Cloud Run container can handle multiple concurrent requests efficiently, even with a single instance.&lt;/p&gt;
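
&lt;p&gt;For illustration, the chat helper used by the router boils down to a single awaited call. A minimal sketch, assuming the client and config defined in the setup section (the MODEL_NAME variable and its default are mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py (sketch): non-blocking call to Vertex AI via the GenAI SDK
MODEL = os.environ.get("MODEL_NAME", "gemini-2.0-flash")  # placeholder default

async def chat(prompt: str) -&amp;gt; str:
    # The await keeps the event loop free to serve other webhooks
    response = await client.aio.models.generate_content(
        model=MODEL,
        contents=prompt,
        config=config,  # carries context.md as the system instruction
    )
    return response.text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;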

&lt;h3&gt;
  
  
  Deployment Strategy
&lt;/h3&gt;

&lt;p&gt;Simplicity was the primary constraint. I didn’t want to manage infrastructure for a personal tool.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt; : Cloud Run (fully managed, scales to zero, low-cost serving).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration&lt;/strong&gt; : Environment variables for model selection (gemini-3-flash) and project details.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; : IAM-based authentication ensures only verified chat events reach the service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Build This Locally?
&lt;/h3&gt;

&lt;p&gt;You might ask, “Why not use a standard consumer AI chat?”&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; : Data stays within my Google Cloud project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; : I control the system prompt (context.md) explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Integration&lt;/strong&gt; : It lives where I work — in Google Chat — not a separate browser tab.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;We often accept friction because “it’s just how things are.” But as engineers, we have the tools to change that. mAIdAI is a proof of concept that a highly personalized, context-aware agent doesn’t require a massive engineering team. It just requires a few hundred lines of Python and the right cloud primitives.&lt;/p&gt;

&lt;p&gt;If you find yourself copying the same text or answering the same questions repeatedly, maybe it’s time to build your own assistant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A7Uo4-6ug2-tX4An5lEz4qw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A7Uo4-6ug2-tX4An5lEz4qw.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>agents</category>
      <category>automation</category>
      <category>generativeaitools</category>
    </item>
    <item>
      <title>MLOps Coding Skills: Bridging the Gap Between Specs and Agents</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Wed, 28 Jan 2026 20:50:20 +0000</pubDate>
      <link>https://forem.com/fmind/mlops-coding-skills-bridging-the-gap-between-specs-and-agents-3mn1</link>
      <guid>https://forem.com/fmind/mlops-coding-skills-bridging-the-gap-between-specs-and-agents-3mn1</guid>
      <description>&lt;p&gt;We are entering the golden age of AI Coding. Every day, I see colleagues, both technical and non-technical, marveling at how agents are rewriting the rules of software construction. The promise is intoxicating: describe what you want, and let the machine handle the rest.&lt;/p&gt;

&lt;p&gt;However, when I see my colleagues try to apply these agents to strict engineering standards, they hit a wall. On one side, you have rigorous specification tools like &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;spec-kit&lt;/a&gt; or &lt;a href="https://github.com/gemini-cli-extensions/conductor" rel="noopener noreferrer"&gt;conductor&lt;/a&gt;. They are deterministic and thorough, but setting them up feels like writing a legal contract. On the other side, you have generic tools like the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;. They act as incredible “hands” for the AI — reading databases, calling APIs — but they lack the &lt;em&gt;brain&lt;/em&gt; for your specific context.&lt;/p&gt;

&lt;p&gt;They don’t know that your team enforces &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt; over &lt;a href="https://python-poetry.org/" rel="noopener noreferrer"&gt;poetry&lt;/a&gt;. They don’t know you prefer &lt;a href="https://github.com/casey/just" rel="noopener noreferrer"&gt;just&lt;/a&gt; files for automation. They don’t know your specific flavor of “clean code.”&lt;/p&gt;

&lt;p&gt;Then I discovered &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Skills&lt;/strong&gt;&lt;/a&gt;, and everything clicked.&lt;/p&gt;

&lt;p&gt;I was immediately hooked. They offer the specific trade-off I had been looking for: &lt;strong&gt;lightweight enough to be flexible, yet opinionated enough to be useful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AtAqN0WowISthT-ZFHGj4uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AtAqN0WowISthT-ZFHGj4uw.png" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I want to share how I used Agent Skills to turn the theoretical “MLOps Coding Course” into a practical, actionable library: the &lt;a href="https://github.com/MLOps-Courses/mlops-coding-skills" rel="noopener noreferrer"&gt;&lt;strong&gt;MLOps Coding Skills&lt;/strong&gt;&lt;/a&gt; project.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Challenge: Making References Actionable
&lt;/h3&gt;

&lt;p&gt;For the past few months, I’ve been deep in the trenches writing the &lt;a href="https://github.com/MLOps-Courses/mlops-coding-course" rel="noopener noreferrer"&gt;&lt;strong&gt;MLOps Coding Course&lt;/strong&gt;&lt;/a&gt;. It is a comprehensive curriculum teaching production-grade MLOps, from robust project initialization to advanced observability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2A5Q2LkQqqDJaO_l3f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2A5Q2LkQqqDJaO_l3f.png" width="800" height="825"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But as I wrote the documentation, I felt a friction point. Learning the standards is one thing; remembering to apply them in the heat of coding is another.&lt;/p&gt;

&lt;p&gt;I didn’t just want another wiki page. I wanted to make these best practices &lt;strong&gt;actionable&lt;/strong&gt; for AI agents. I wanted to move from “reading the docs” to “installing the capability.”&lt;/p&gt;
&lt;h3&gt;
  
  
  The Logic: How to “Skillify” Knowledge
&lt;/h3&gt;

&lt;p&gt;The beauty of an Agent Skill lies in its simplicity. It is essentially a markdown file (SKILL.md) that functions as a context injection module. It gives the agent “muscle memory” for a specific topic.&lt;/p&gt;
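
&lt;p&gt;Concretely, a SKILL.md typically opens with a small metadata header that tells the agent when to load it, followed by free-form instructions. A minimal sketch (the wording here is mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: mlops-automation
description: Apply MLOps task automation standards (just, docker, CI/CD) when setting up or reviewing a project.
---

# MLOps Automation

Instructions for the agent go here...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;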

&lt;p&gt;My methodology for building the &lt;strong&gt;MLOps Coding Skills&lt;/strong&gt; repo was straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolate a Chapter&lt;/strong&gt; : Take a specific section of the course (e.g., &lt;em&gt;Automation&lt;/em&gt; or &lt;em&gt;Observability&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract Patterns&lt;/strong&gt; : Use an LLM to distill the generic engineering standards from the educational content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardize&lt;/strong&gt; : Format it into a SKILL.md that an agent can ingest.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  A Concrete Example: Automating Ops
&lt;/h3&gt;

&lt;p&gt;Let’s look at the &lt;a href="https://github.com/MLOps-Courses/mlops-coding-skills/tree/main/mlops-automation" rel="noopener noreferrer"&gt;&lt;strong&gt;mlops-automation&lt;/strong&gt;&lt;/a&gt; skill.&lt;/p&gt;

&lt;p&gt;In our course, we have strong opinions: we use just for command running and &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;docker&lt;/a&gt; for containerization, with very specific layer caching strategies.&lt;/p&gt;

&lt;p&gt;Here is what the skill looks like “on the wire”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# MLOps Automation

## Goal

To elevate the codebase to production standards by adding Task Automation (just), Containerization ([docker](https://www.docker.com/)), CI/CD ([github-actions](https://github.com/features/actions)), and Experiment Tracking ([mlflow](https://mlflow.org/)).

## Instructions

### 1. Task Automation

Replace manual commands with a `justfile`.
1. **Tool** : `just` (modern alternative to Make).
2. **Organization** : Split tasks into `tasks/*.just` modules.
3. **Core Tasks** :
- `check`: Run all linters and tests.
- `package`: Build wheels.

### 2. Containerization

1. **Tool** : `docker`.
2. **Base Image** : Use `ghcr.io/astral-sh/uv:python3.1X-bookworm-slim` for minimal size.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When I load this skill, my agent stops guessing. It doesn’t offer me a Makefile. It doesn’t suggest a bloated Ubuntu image. It acts like a senior engineer who has been on the team for years.&lt;/p&gt;

&lt;h3&gt;
  
  
  The “Senior Engineer” Injection
&lt;/h3&gt;

&lt;p&gt;This is the killer value proposition.&lt;/p&gt;

&lt;p&gt;Most frustrations with AI coding come from a &lt;strong&gt;lack of context&lt;/strong&gt;. We blame the model for being “dumb,” but usually, we just haven’t told it the rules of the house.&lt;/p&gt;

&lt;p&gt;By using Agent Skills, you are effectively &lt;strong&gt;injecting a Senior Engineer into your chat context&lt;/strong&gt;. You are giving the agent a “cheat sheet” that forces it to align with your organization’s reality.&lt;/p&gt;

&lt;p&gt;I now use these skills for every new project I touch. I don’t spend an hour setting up boilerplate. I load or create a skill, and within minutes, I have a structure that matches my most rigorous standards.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Friction Points
&lt;/h3&gt;

&lt;p&gt;Of course, no solution is perfect. There are still rough edges in this workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local-First Friction&lt;/strong&gt; : Currently, skills often sit in a local .agent/skills folder. It works, but copying them around feels archaic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Context Stack&lt;/strong&gt; : We are seeing a fragmentation of context. We have MCP servers for tools, AGENTS.md for persona, and Skills for tasks. Managing this “Context Stack” is becoming a new engineering discipline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration gaps&lt;/strong&gt; : I love how the &lt;strong&gt;Gemini CLI&lt;/strong&gt; handles this via extensions, but I’m eager to see this standardized across VS Code Copilot, Cursor, and other IDEs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Despite the minor friction, Agent Skills are excellent “Low Hanging Fruit” for any engineering team.&lt;/p&gt;

&lt;p&gt;The productivity gain is massive. For a few minutes of setup — writing a markdown file — you save hours of correcting boilerplate code and enforcing standards down the line. It bridges the gap between the &lt;strong&gt;rigidity of a spec&lt;/strong&gt; and the &lt;strong&gt;chaos of a raw LLM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you are tired of fighting your AI to follow your style, stop arguing with it. Give it a Skill.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check out the full&lt;/em&gt; &lt;a href="https://github.com/MLOps-Courses/mlops-coding-skills" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;MLOps Coding Skills repository&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; &lt;em&gt;to see the library in action.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AsSNsSykKaOOfBIom-2bmMw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AsSNsSykKaOOfBIom-2bmMw.png" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>coding</category>
      <category>generativeaitools</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building with A2UI: Extending the Expressiveness of AI Agent Interfaces</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Wed, 28 Jan 2026 05:48:33 +0000</pubDate>
      <link>https://forem.com/fmind/building-with-a2ui-extending-the-expressiveness-of-ai-agent-interfaces-36h3</link>
      <guid>https://forem.com/fmind/building-with-a2ui-extending-the-expressiveness-of-ai-agent-interfaces-36h3</guid>
      <description>&lt;p&gt;In 2026, AI agents have become incredibly smart, yet they are often limited to simple chatbot interfaces. We have engines capable of reasoning, planning, and coding, but we force them to communicate results through text and basic markdown.&lt;/p&gt;

&lt;p&gt;To unlock the full potential of agents, we need a better language for them to express themselves. We need agents that can project rich, dynamic, and interactive user interfaces that adapt to the user’s intent.&lt;/p&gt;

&lt;p&gt;This is the promise of &lt;a href="https://a2ui.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2UI (Agent-to-User Interface)&lt;/strong&gt;&lt;/a&gt;: a protocol that allows agents to “speak” UI natively.&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://fmind.medium.com/finding-the-holy-grail-of-ai-agent-uis-from-ai-orchestrated-development-to-a2ui" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;, I explored the landscape of AI UI solutions and explained why A2UI stands out. Now, I wanted to put it to the test. I built &lt;a href="https://github.com/fmind/featest" rel="noopener noreferrer"&gt;Featest&lt;/a&gt;, a feature request application designed to be “AI-First.” Here is the story of how it was built, the strengths of the protocols I used, and the architectural patterns that emerged.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzIFY81xbX8ot9LukFPvVcQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzIFY81xbX8ot9LukFPvVcQ.png" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Project: Featest
&lt;/h3&gt;

&lt;p&gt;The origin of this project was a simple request from my Product Manager: &lt;em&gt;“We need a way for users to vote on features.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I could have built a standard CRUD app. But I saw this as an opportunity. A feature request board is dynamic. Users have vague intents: &lt;em&gt;“I want something like dark mode but for audio.”&lt;/em&gt; They want to merge duplicates. They want to see trends, and admins want to automatically tag requests.&lt;/p&gt;

&lt;p&gt;This was the perfect scenario to consider an &lt;strong&gt;AI-First experience&lt;/strong&gt;. I didn’t want a static form; I wanted an agent that users could talk to, which enhances the interactions, and uses the right UI tools when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github Repository&lt;/strong&gt; : &lt;a href="https://github.com/fmind/featest" rel="noopener noreferrer"&gt;Featest&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AGCAe3IiU7n5RhCafd2JBIg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AGCAe3IiU7n5RhCafd2JBIg.png" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Suggest new features with AI (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Power Couple: A2UI and A2A
&lt;/h3&gt;

&lt;p&gt;Before diving into the code, it’s critical to understand the two pillars of this architecture.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. A2UI: Agent-to-User Interface (The Content)
&lt;/h3&gt;

&lt;p&gt;A2UI is a &lt;strong&gt;declarative protocol&lt;/strong&gt;. Instead of an agent writing code (which is risky and error-prone), it streams a structured JSON description of a UI.&lt;/p&gt;

&lt;p&gt;Here is what it looks like on the wire. The agent sends SurfaceUpdate events to render components like a card with a button:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "surfaceUpdate": {
    "surfaceId": "main-surface",
    "components": [
      {
        "id": "welcome-card",
        "component": {
          "Card": {
            "child": "welcome-text"
          }
        }
      },
      {
        "id": "welcome-text",
        "component": {
          "Text": {
            "text": { "literalString": "Welcome to Featest" },
            "usageHint": "h1"
          }
        }
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. A2A: Agent-to-Agent Communication (The Transport)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/a2aproject/A2A" rel="noopener noreferrer"&gt;A2A&lt;/a&gt; is the &lt;strong&gt;transport protocol&lt;/strong&gt;. It standardizes how agents talk to each other and to clients over HTTP. It handles the handshake, the task lifecycle, and the message passing.&lt;/p&gt;

&lt;p&gt;In Featest, the client wraps the user’s intent in an A2A message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /api/agents/feature_request_agent/tasks
Content-Type: application/json

{
  "task_id": "12345",
  "input": {
    "text": "I want to vote for dark mode"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, they create a universal language. A2A carries the envelope, and A2UI ensures the letter inside contains rich, interactive content, not just text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;I designed the system to be modular, using the &lt;a href="https://google.github.io/adk-docs" rel="noopener noreferrer"&gt;Google Agent Development Kit (ADK)&lt;/a&gt; and the A2A protocol effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ANCJHxfYnDp8skDQ8iOzvBA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ANCJHxfYnDp8skDQ8iOzvBA.png" width="800" height="100"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture Diagram of the Featest App (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The flow is bidirectional and relies on robust open-source packages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt; interacts with the &lt;strong&gt;Lit Client&lt;/strong&gt; , which uses the official &lt;a href="https://github.com/a2ui/a2ui/tree/main/packages/lit" rel="noopener noreferrer"&gt;@a2ui/lit&lt;/a&gt; renderer. It acts as a state machine, processing SurfaceUpdate events to patch the DOM efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt; sends intent via &lt;strong&gt;A2A&lt;/strong&gt; to the &lt;strong&gt;Backend&lt;/strong&gt;. The A2UIClient wraps the user’s input (text or events) in a standard JSON-RPC envelope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend Agent&lt;/strong&gt; processes logic and streams back &lt;strong&gt;A2UI&lt;/strong&gt; JSON instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt; renders the UI components dynamically.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The power of this system comes from the &lt;strong&gt;Component Schema&lt;/strong&gt;. Featest supports a rich set of native components defined in schemas.py, ensuring the agent has high-level building blocks rather than raw HTML (a simplified sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layout&lt;/strong&gt; : Row, Column, List, Card, Tabs, Divider, Modal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt; : Button, CheckBox, TextField, DateTimeInput, MultipleChoice, Slider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media&lt;/strong&gt; : Text, Image, Icon, Video, AudioPlayer.&lt;/li&gt;
&lt;/ul&gt;
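
&lt;p&gt;In code, this catalog takes the form of Pydantic models that constrain the View Agent’s output. A simplified sketch of the idea (field names are illustrative, modeled on the SurfaceUpdate example above rather than copied from schemas.py):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# schemas.py (simplified sketch, not the actual definitions)
from pydantic import BaseModel

class Text(BaseModel):
    text: str
    usageHint: str | None = None  # e.g. "h1", "body"

class Button(BaseModel):
    label: str
    action: str  # event name reported back to the agent

class Component(BaseModel):
    id: str
    component: dict  # one catalog entry, keyed by its type name

class A2UI(BaseModel):
    surfaceId: str
    components: list[Component]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
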
&lt;h3&gt;
  
  
  The AVC Pattern: Agent-View-Controller
&lt;/h3&gt;

&lt;p&gt;The most significant discovery I made while building Featest was an architectural one.&lt;/p&gt;

&lt;p&gt;When you start building complex agent apps, you quickly realize that a single agent doing everything (reasoning, database access, UI formatting) is a mess. It’s hard to test and hard to control.&lt;/p&gt;

&lt;p&gt;I adopted what I call the &lt;strong&gt;Agent-View-Controller (AVC) Pattern&lt;/strong&gt; — an evolution of &lt;a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller" rel="noopener noreferrer"&gt;Model-View-Controller (MVC)&lt;/a&gt; for the agent era.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Controller Agent (The Brain)
&lt;/h3&gt;

&lt;p&gt;This agent handles the business logic. It doesn’t care about pixels. It takes a user request as input, decides which tool to use (e.g., vote_feature, add_comment), and outputs structured data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# agent/agent.py
from google.adk import agents  # Google Agent Development Kit

# configs, prompts, and tools are sibling modules of the project
controller_agent = agents.Agent(
    name="controller_agent",
    model=configs.AGENT_MODEL,
    description="Executes application logic.",
    instruction=prompts.CONTROLLER_INSTRUCTION,
    tools=[
        tools.list_features,
        tools.add_feature,
        tools.upvote_feature,
        tools.add_comment,
        tools.get_feature,
        tools.update_feature,
        tools.delete_feature,
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. The View Agent (The Renderer)
&lt;/h3&gt;

&lt;p&gt;This agent is the designer. It takes the data from the Controller and translates it into A2UI JSON. It cares about layout, typography, and hierarchy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;view_agent = agents.Agent(
    name="view_agent",
    model=configs.AGENT_MODEL,
    description="Formats data into A2UI schema.",
    instruction=prompts.VIEW_INSTRUCTION,
    # Constrain the LLM output to the A2UI component schema
    output_schema=schemas.A2UI,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. The Sequential Pipeline
&lt;/h3&gt;

&lt;p&gt;I chained them together using ADK’s SequentialAgent. This simple composition gave me immense flexibility. I could swap the View Agent to change the entire look and feel of the app without touching a single line of business logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root_agent = agents.SequentialAgent(
    name="feature_request_agent",
    description="Handles feature requests from users.",
    sub_agents=[controller_agent, view_agent],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
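&lt;p&gt;To illustrate that flexibility, here is a hypothetical sketch that swaps in an alternative View Agent. The prompt name COMPACT_VIEW_INSTRUCTION is made up for the example; the Controller stays untouched:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: swap the View Agent to restyle the whole app.
# prompts.COMPACT_VIEW_INSTRUCTION is an illustrative name, not Featest code.
compact_view_agent = agents.Agent(
    name="compact_view_agent",
    model=configs.AGENT_MODEL,
    description="Formats data into a denser A2UI layout.",
    instruction=prompts.COMPACT_VIEW_INSTRUCTION,
    output_schema=schemas.A2UI,
)

root_agent = agents.SequentialAgent(
    name="feature_request_agent",
    description="Handles feature requests from users.",
    sub_agents=[controller_agent, compact_view_agent],  # same business logic
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;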



&lt;h3&gt;
  
  
  Strengths of the Protocol
&lt;/h3&gt;

&lt;p&gt;Working with A2UI revealed several advantages over traditional chatbot approaches.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2At76cjUYm8DK6ScKKQ8bJHA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2At76cjUYm8DK6ScKKQ8bJHA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Leaving a comment, with a form rendered dynamically by A2UI (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. UI Language
&lt;/h3&gt;

&lt;p&gt;We are used to agents speaking Markdown. Markdown is fantastic for content: paragraphs, lists, and code blocks. But it fails when you need &lt;em&gt;interaction&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If an agent needs to ask for a complex set of preferences, Markdown forces it to ask one question at a time or parse a messy natural language blob. A2UI allows the agent to project a form with validation, sliders for precise values, and date pickers. It elevates the agent from a “writer” to an “interface designer,” matching the right interaction model to the user’s intent.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Security by Design
&lt;/h3&gt;

&lt;p&gt;This is the enterprise killer feature. Because A2UI is &lt;strong&gt;data&lt;/strong&gt;, not code, there is no eval() happening on the client. The agent selects from a catalog of safe, pre-built components. You can’t inject malicious scripts via A2UI, making it safe for production environments where “generated code” is a security nightmare.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Progressive Rendering
&lt;/h3&gt;

&lt;p&gt;A2UI is designed to be streamed. As the LLM generates the JSON tokens, the UI builds itself on the screen.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, the container appears.&lt;/li&gt;
&lt;li&gt;Then, the title.&lt;/li&gt;
&lt;li&gt;Then, the list items one by one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the application feel incredibly responsive, masking some latency with visible progress.&lt;/p&gt;
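&lt;p&gt;To make this tangible, here is an illustrative simulation in Python. The message shapes are simplified stand-ins for A2UI’s JSONL format, not the real protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative only: simulate progressive rendering of streamed UI messages.
# The message shapes are simplified stand-ins for A2UI's JSONL format.
import json

STREAM = [
    '{"component": "Card", "id": "root"}',
    '{"component": "Text", "id": "title", "text": "Feature Requests"}',
    '{"component": "Text", "id": "item-1", "text": "1. Dark mode (42 votes)"}',
]

surface: dict[str, dict] = {}
for line in STREAM:  # in practice, lines arrive as the LLM generates them
    message = json.loads(line)
    surface[message["id"]] = message  # patch the surface incrementally
    print(f"Rendered so far: {list(surface)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;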

&lt;h3&gt;
  
  
  4. Interoperability
&lt;/h3&gt;

&lt;p&gt;Because the UI is just JSON, the exact same agent response can be rendered natively on the web (via &lt;a href="https://lit.dev/" rel="noopener noreferrer"&gt;Lit&lt;/a&gt;), on mobile (via &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;Flutter&lt;/a&gt;), or on iOS (via Swift). You build the agent intelligence once, and it projects natively everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Client Style Control
&lt;/h3&gt;

&lt;p&gt;With A2UI, the agent is responsible for the &lt;em&gt;structure&lt;/em&gt; (intent), but the client is completely in control of the &lt;em&gt;style&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The agent says “I need a primary button.” It doesn’t say “I need a blue button with 4px border radius.” This means your application maintains perfect brand consistency. The same agent response can look like a sleek consumer app on Android or a dense dashboard on the web, simply by changing the client-side theme.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Maturity &amp;amp; Boilerplate
&lt;/h3&gt;

&lt;p&gt;A2UI is a new protocol. Building a custom app from scratch currently requires significant boilerplate code. For now, the best approach is to use packages like Flutter’s &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;GenUI SDK&lt;/a&gt; or wait for higher-level integrations from ADK or Gemini Enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Latency vs. Smartness
&lt;/h3&gt;

&lt;p&gt;Another major challenge is &lt;strong&gt;latency&lt;/strong&gt;. Generating UI tokens takes time and money. While streaming and using “Fast Planners” (as I did in Featest) mitigate this, a pure agentic experience will never beat a hand-optimized native app for core, repetitive tasks. The “smartness” of the dynamic UI must outweigh the latency cost — if it doesn’t, just use a static button.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Complementary Nature
&lt;/h3&gt;

&lt;p&gt;I also found that not every use case benefits from dynamic UI. For the core voting interaction in Featest, a static, predictable UI was simpler and faster. Where A2UI shines is in &lt;em&gt;augmenting&lt;/em&gt; the experience: helping users rationalize features, tag duplicates, or explore trends through conversation. In this project, it’s a powerful &lt;strong&gt;complement&lt;/strong&gt; to a baseline UI, not necessarily a replacement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;A2UI is a fantastic protocol, but it’s not a silver bullet.&lt;/p&gt;

&lt;p&gt;In my case, I initially thought a pure “AI-First” app would be the ideal experience. However, I learned that for basic, repetitive tasks, it’s simply too slow compared to a traditional interface. The latency of generating UI on the fly doesn’t always pay off.&lt;/p&gt;

&lt;p&gt;The ideal approach for this project is a &lt;strong&gt;hybrid model&lt;/strong&gt;: mixing static, highly optimized UIs for core workflows with dynamic, agentic components for complex, intent-driven tasks. It is up to the programmer to find the best trade-off for each specific use case.&lt;/p&gt;

&lt;p&gt;However, for &lt;strong&gt;chatbot-focused applications&lt;/strong&gt;, this solution could be highly valuable. It enables the creation of much richer UIs exactly when needed, allowing the experience to go beyond simple text and platform-specific adapters.&lt;/p&gt;

&lt;p&gt;One thing is sure: there will be more and more agentic features, and A2UI will be a great bridge between agent power and user needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ArfjwvYSIg-kc9N-dyUEA7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ArfjwvYSIg-kc9N-dyUEA7w.png" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>frontend</category>
      <category>artificialintelligen</category>
    </item>
    <item>
      <title>Finding the Holy Grail of AI Agent UIs: From AI-Orchestrated Development to A2UI</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sat, 24 Jan 2026 14:46:32 +0000</pubDate>
      <link>https://forem.com/fmind/finding-the-holy-grail-of-ai-agent-uis-from-ai-orchestrated-development-to-a2ui-52bb</link>
      <guid>https://forem.com/fmind/finding-the-holy-grail-of-ai-agent-uis-from-ai-orchestrated-development-to-a2ui-52bb</guid>
      <description>&lt;p&gt;In my &lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;, I argued that the real bottleneck for AI agents is the User Interface (UI). We are stapling rocket engines to bicycles by forcing advanced agents to communicate through basic markdown chatbots.&lt;/p&gt;

&lt;p&gt;Since then, I’ve been on a journey to find the solution. I didn’t want just a theoretical answer; I wanted to build it. I explored everything from “AI-Orchestrated Development” to Python wrappers, up to new AI protocols, searching for a scalable way to give Agents a native, rich, and dynamic interface.&lt;/p&gt;

&lt;p&gt;I dedicated time to building a concrete implementation to verify my hypotheses. Here is what I found, what failed, and why I believe &lt;a href="https://a2ui.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2UI&lt;/strong&gt;&lt;/a&gt; is the protocol we’ve been waiting for to solve this problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2eajdwt75assbhpup04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2eajdwt75assbhpup04.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Exploration: A Graveyard of “Almost” Solutions
&lt;/h3&gt;

&lt;p&gt;My goal was simple: &lt;strong&gt;Build a custom frontend for an agent application without spending weeks on boilerplate.&lt;/strong&gt; I tried multiple approaches, and most of them hit a wall.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The “Heavy” Approach: Angular &amp;amp; Flutter
&lt;/h3&gt;

&lt;p&gt;My first instinct was to build a real app. I tried both &lt;a href="https://angular.dev/" rel="noopener noreferrer"&gt;Angular&lt;/a&gt; and &lt;a href="https://flutter.dev/" rel="noopener noreferrer"&gt;Flutter&lt;/a&gt;. These are standards for enterprise application development, offering robust ecosystems and pixel-perfect control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt; It works, but at what cost? In 2026, setting up a full frontend project is still painful. You have to configure build tools, set up linters, manage complex state stores (Redux, Bloc), and synchronize data models with your backend. This overhead is acceptable for a static, long-term product like a banking dashboard, but for a dynamic Agent? &lt;strong&gt;It’s overkill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agents need to be able to transmit their UI and adapt on the fly. Hardcoding a heavy client defeats the purpose of an autonomous agent. If every new agent capability requires a sprint of frontend changes, the agent isn’t truly autonomous. It’s just a backend API with a very expensive chat interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. “AI-Orchestrated Development” (AI-Generated UIs)
&lt;/h3&gt;

&lt;p&gt;I tried what I call “&lt;strong&gt;AI-Orchestrated Development&lt;/strong&gt;”: a more structured approach where the AI is front and center in generating application code, popularized in early 2026 by tools like &lt;a href="https://githubnext.com/projects/copilot-workspace" rel="noopener noreferrer"&gt;GitHub Spec Kit&lt;/a&gt;, &lt;a href="https://deepmind.google/technologies/gemini/conductor/" rel="noopener noreferrer"&gt;Gemini Conductor&lt;/a&gt;, or &lt;a href="https://blog.google/technology/google-labs/antigravity/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;. This is distinct from “vibe coding” (using AI intuitively without understanding the output). AI-Orchestrated Development aims for a systematic process where AI handles implementation under developer guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Verdict:&lt;/strong&gt; While promising long-term, it still generates &lt;em&gt;lots&lt;/em&gt; of code. Code that you have to maintain, test, and debug. And I’m not confident in either maintaining AI-generated codebases or letting AI be the sole responsible party for production systems.&lt;/p&gt;

&lt;p&gt;We already spend more time maintaining applications than building them. AI-Orchestrated Development risks accelerating this accumulation. We need to reduce the amount of bespoke code we generate, not increase it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. HTMX: The Backend-Driven UI
&lt;/h3&gt;

&lt;p&gt;I went back to my roots (PHP/AJAX) and tried &lt;a href="https://htmx.org/" rel="noopener noreferrer"&gt;HTMX&lt;/a&gt;. It’s a productive methodology that keeps logic in one place by streaming HTML fragments from the server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; HTMX couples the agent too tightly to a specific visual implementation. If you want to render the same agent response on a mobile app, a web dashboard, and a desktop client, you can’t reuse the HTML stream — you’re locked into one presentation layer.&lt;/p&gt;

&lt;p&gt;More fundamentally, HTML is too low-level for an agent to reason about. An agent shouldn’t be worrying about CSS classes, DOM nesting, or accessibility attributes. It should focus on &lt;em&gt;intent&lt;/em&gt; and &lt;em&gt;logic&lt;/em&gt;, not pixels. Sending declarative data is more efficient, more universal, and can be consumed by different types of clients.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Python Wrappers (Streamlit, Gradio, Chainlit)
&lt;/h3&gt;

&lt;p&gt;These are great for prototypes. Tools like &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt;, &lt;a href="https://gradio.app/" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt;, and &lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;Chainlit&lt;/a&gt; offer a small code surface and instant deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Flaw:&lt;/strong&gt; The “Glue Code” Hell. You inevitably hit a wall where the library doesn’t support the specific interaction or component you need. Maybe you need a custom drag-and-drop interface or a specific data visualization. You lose control over style, and you end up writing hacky workarounds (custom HTML injection, iframe bridges) to connect the agent’s state to the UI components.&lt;/p&gt;

&lt;p&gt;They are also not truly dynamic — they are rigid templates filled with data, not fluid interfaces generated by the agent’s needs. You are still building a form; you are just doing it in Python instead of React.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Chat Extensions (Slack/Teams/Workspace)
&lt;/h3&gt;

&lt;p&gt;Building into existing workflows seems smart. Why build a new UI when you can just deploy a bot to Slack or Google Chat?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Limit:&lt;/strong&gt; It doesn’t scale. You end up building a specific adapter for Slack, another for Teams, another for Google Chat. Each platform has its own proprietary UI kit (Block Kit, Adaptive Cards) with different limitations.&lt;/p&gt;

&lt;p&gt;You want to build your agent &lt;em&gt;once&lt;/em&gt; and have it project its UI anywhere, not rewrite the presentation layer for every host app. This fragmentation increases the maintenance burden and prevents you from creating a consistent user experience across platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Epiphany: Separation of Concerns
&lt;/h3&gt;

&lt;p&gt;I realized something fundamental during this process: &lt;strong&gt;Everything is disposable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We shouldn’t be precious about the UI code. We should focus on the &lt;strong&gt;declarative&lt;/strong&gt; side. Just as humans use HTML not because we love drawing pixels, but because we want to say “Here is a link” or “Here is an image,” agents need a high-level language to describe &lt;em&gt;what&lt;/em&gt; needs to be shown, not &lt;em&gt;how&lt;/em&gt; to draw it.&lt;/p&gt;

&lt;p&gt;The Agent should be responsible for the &lt;strong&gt;Data&lt;/strong&gt; and the &lt;strong&gt;Logic&lt;/strong&gt;. The Client should be responsible for the &lt;strong&gt;Style&lt;/strong&gt; and the &lt;strong&gt;Rendering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This separation allows the agent to be “brain-heavy” and “UI-light,” deferring the complex rendering logic to the client, which is what clients are best at.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: A2UI (Agent-to-User Interface)
&lt;/h3&gt;

&lt;p&gt;Enter &lt;a href="https://github.com/google/A2UI" rel="noopener noreferrer"&gt;&lt;strong&gt;A2UI&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I built a demo app using this protocol, and I was genuinely impressed by its elegance. A2UI is a &lt;strong&gt;JSONL-based declarative protocol&lt;/strong&gt; that creates a standard contract between the AI and the user interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Instead of streaming markdown tokens like a traditional LLM, the agent streams structured JSON objects representing UI components.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AG_S9eAr0Y6DQ0kTq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AG_S9eAr0Y6DQ0kTq.png" width="800" height="433"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: &lt;a href="https://a2ui.org/" rel="noopener noreferrer"&gt;https://a2ui.org/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The client can use the &lt;a href="https://github.com/google/A2UI/tree/main/renderers/lit" rel="noopener noreferrer"&gt;Lit renderer&lt;/a&gt;, &lt;a href="https://github.com/google/A2UI/tree/main/renderers/angular" rel="noopener noreferrer"&gt;Angular renderer&lt;/a&gt;, or &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;Flutter renderer&lt;/a&gt; to render native components progressively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production-Ready at Google:&lt;/strong&gt; A2UI isn’t vaporware — it’s already integrated into Google products like &lt;a href="https://labs.google/" rel="noopener noreferrer"&gt;Opal&lt;/a&gt;, &lt;a href="https://cloud.google.com/gemini/enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt;, and the &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;Flutter GenUI SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport Agnostic:&lt;/strong&gt; It works over HTTP (via the &lt;a href="https://github.com/google/A2A" rel="noopener noreferrer"&gt;A2A protocol&lt;/a&gt;), WebSockets, or carrier pigeons. The protocol doesn’t care how the JSON gets there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive Rendering:&lt;/strong&gt; The UI appears as the agent “thinks” it. Components stream in one by one, making the interface feel alive and responsive, much like text streaming but for rich UI elements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework Agnostic:&lt;/strong&gt; The client implementation (React, Angular, Lit) decides how a “Card” looks. The agent just says “I need a Card”. This means you can have a “Material Design” client and an “iOS Cupertino” client rendering the exact same agent response natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure:&lt;/strong&gt; No arbitrary JavaScript execution. It’s just declarative data, mitigating injection risks. This is critical for enterprise adoption where security reviews block “dynamic code generation.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-Friendly:&lt;/strong&gt; Flat, streaming JSON structure designed for easy generation. LLMs can build UIs incrementally without having to emit perfect JSON in one shot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A2UI is currently at &lt;a href="https://a2ui.org/specification/v0.8-a2ui/" rel="noopener noreferrer"&gt;v0.8&lt;/a&gt; and still in active development. The protocol has some rough edges; for production use, the best approach is to wait for native integration in tools like &lt;a href="https://cloud.google.com/gemini/enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt; or the &lt;a href="https://google.github.io/adk-docs" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A2UI vs AG-UI: Two Philosophies
&lt;/h3&gt;

&lt;p&gt;I also looked at &lt;a href="https://ag-ui.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;AG-UI&lt;/strong&gt;&lt;/a&gt;, another emerging standard in this space.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AG-UI&lt;/strong&gt; aims to blend the frontend and backend deeply, creating “AI-First” apps from the ground up with a focus on real-time event loops. It’s powerful but requires you to rethink your entire application architecture regarding state synchronization and event handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A2UI&lt;/strong&gt; focuses on &lt;em&gt;extending&lt;/em&gt; the capabilities of chat-based interaction to be richer. It’s a bridge that lets agents “speak UI” using standard components. It feels more like an evolution of the chat interface into a command center rather than a complete replacement of the application stack.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I believe &lt;strong&gt;A2UI&lt;/strong&gt; is the scalable path forward for most agent implementations. It respects the separation of concerns and integrates seamlessly with existing systems via protocols like A2A (Agent-to-Agent).&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion: The 2026 Shift
&lt;/h3&gt;

&lt;p&gt;We are moving towards a schism in frontend technology, and it’s happening faster than we think:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Static Apps (the stock):&lt;/strong&gt; Dashboards, retail sites, and specialized tools. These will still be built with efficient frameworks for speed, precise control, and specific user journeys where the path is known. They represent the bulk of existing applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Agent Interfaces (the flow):&lt;/strong&gt; Powered by new protocols like &lt;strong&gt;A2UI&lt;/strong&gt;. These will replace the “Chatbot” with something far more powerful — interactive, component-based, and generated on the fly. This is where the new growth is happening. These interfaces will emerge when the user’s intent is ambiguous or highly variable, like in &lt;a href="https://cloud.google.com/transform/a-new-era-agentic-commerce-retail-ai" rel="noopener noreferrer"&gt;Agentic Commerce&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I am convinced that 2026 is the year we stop building UIs &lt;em&gt;for&lt;/em&gt; agents and start letting agents &lt;em&gt;project&lt;/em&gt; their UIs to us. We shouldn’t spend too much time on UI. Let it be personalized by the agent so we can focus on what truly matters: integration and instruction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In the next article, I will share the source code and a full demo of the application I built using A2UI. Stay tuned!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ASyT4vdEjI7fJIoj-U9B8VA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ASyT4vdEjI7fJIoj-U9B8VA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>artificialintelligen</category>
      <category>softwaredevelopment</category>
      <category>ui</category>
    </item>
    <item>
      <title>Architecting the AI Agent Platform: A Definitive Guide</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Wed, 10 Dec 2025 07:20:36 +0000</pubDate>
      <link>https://forem.com/fmind/architecting-the-ai-agent-platform-a-definitive-guide-59oo</link>
      <guid>https://forem.com/fmind/architecting-the-ai-agent-platform-a-definitive-guide-59oo</guid>
      <description>&lt;p&gt;The velocity of Generative AI has been nothing short of relentless. In the span of just 24 months, the industry has shifted paradigms three times. We started with the raw capability of &lt;strong&gt;LLMs&lt;/strong&gt; (the “prompt engineering” era). We quickly moved to &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation) to ground those models in enterprise data. Now, we are at the era of &lt;strong&gt;AI Agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We are no longer asking models to simply talk or retrieve; we are asking them to &lt;em&gt;do&lt;/em&gt;. We are building systems capable of reasoning, planning, and executing actions to change the state of the world.&lt;/p&gt;

&lt;p&gt;Building a single agent in a notebook is easy. Building a system that serves, secures, and monitors thousands of autonomous agents across an enterprise is an entirely different engineering challenge. To deliver robust solutions with tangible ROI, you cannot rely on scattered Proofs of Concept. You need a factory. You need an &lt;strong&gt;AI Agent Platform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this guide, I will deconstruct the architecture of a production-grade AI Agent Platform, breaking it down into its system context, containers, and component layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5grlrxi093lt2vfdyywb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5grlrxi093lt2vfdyywb.png" width="800" height="427"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini (Nano Banana Pro)&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System Context: The PaaS Approach
&lt;/h3&gt;

&lt;p&gt;At its core, the AI Agent Platform is a &lt;strong&gt;Platform-as-a-Service (PaaS)&lt;/strong&gt; designed to build, serve, and expose AI agents.&lt;/p&gt;

&lt;p&gt;Unlike &lt;strong&gt;AI Agent SaaS&lt;/strong&gt; solutions — which lock you into a closed ecosystem and a predefined set of integrations — an AI Agent Platform is designed for &lt;strong&gt;extensibility&lt;/strong&gt; and &lt;strong&gt;control&lt;/strong&gt;. SaaS solutions are excellent for quick wins, but they often lack the ability to support custom logic or complex enterprise workflows.&lt;/p&gt;

&lt;p&gt;Crucially, an internal AI Agent Platform allows you to enforce &lt;strong&gt;SRE (Site Reliability Engineering)&lt;/strong&gt; practices. If an agent fails, your Ops team can intervene. If an agent attempts an unauthorized action, your Security team has the audit trails to investigate and harden the perimeter.&lt;/p&gt;

&lt;p&gt;The platform serves two distinct types of builders:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Programmer (Code-Based):&lt;/strong&gt; Engineers requiring power and flexibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Integrator (No/Low-Code):&lt;/strong&gt; Business analysts requiring speed and ease of configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It must also be accessible to &lt;strong&gt;External Systems&lt;/strong&gt; (Machine-to-Machine) via standard APIs like REST or gRPC. This allows other systems to offload cognitive tasks — like “analyze this log file” or “classify this ticket” — to your agent fleet programmatically.&lt;/p&gt;
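&lt;p&gt;As a minimal sketch, a machine-to-machine call could look like this. The endpoint, payload shape, and token are hypothetical, assuming the platform exposes a REST gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch of a machine-to-machine call to an agent endpoint.
# The URL, payload shape, and token are placeholders, not a real API.
import requests

response = requests.post(
    "https://agents.example.com/v1/agents/triage:run",  # hypothetical endpoint
    headers={"Authorization": "Bearer &amp;lt;service-account-token&amp;gt;"},
    json={"task": "classify this ticket", "ticket_id": "T-123"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;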

&lt;p&gt;To function, the AI Agent Platform relies on five high-level systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity &amp;amp; Access:&lt;/strong&gt; The gatekeeper for users, agents, and data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundation Models:&lt;/strong&gt; The cognitive “brain” (reasoning, planning, and instruction following).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Apps &amp;amp; APIs:&lt;/strong&gt; The “hands” of the agent (e.g., Jira, Salesforce, SAP, SQL, …).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information Systems:&lt;/strong&gt; The context providers (Operational DBs, Data Lakes, Knowledge Bases).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Infrastructure:&lt;/strong&gt; The bedrock providing compute and reliability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A6XhH0m-926J2kKvIonWuOw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A6XhH0m-926J2kKvIonWuOw.png" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Container Architecture
&lt;/h3&gt;

&lt;p&gt;To manage complexity, we divide the AI Agent Platform into &lt;strong&gt;7 Logical Containers&lt;/strong&gt;. This separation of concerns is vital for security auditing and independent scaling.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Interaction:&lt;/strong&gt; The frontend where users meet agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development:&lt;/strong&gt; The workbench for building and deploying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; The runtime engine that executes logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundation:&lt;/strong&gt; The infrastructure abstraction for models and compute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information:&lt;/strong&gt; The data layer managing context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; The monitoring and evaluation stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust:&lt;/strong&gt; The security and governance control plane.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AIu6ICfpuQv0QsGx1mXGPUA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AIu6ICfpuQv0QsGx1mXGPUA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s hack through these layers one by one.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Interaction
&lt;/h3&gt;

&lt;p&gt;The Interaction layer is the portal. It is where the carbon lifeforms (us) communicate with the silicon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ABSCdwkBwsOSm_AqXsC-Pqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ABSCdwkBwsOSm_AqXsC-Pqg.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are three primary ways to expose your agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard Chatbot:&lt;/strong&gt; The familiar conversational interface. It is fast to ship and often requires zero frontend skills. However, it is a generic instrument; chat is not always the best interface for complex user experiences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom User Interface:&lt;/strong&gt; Bespoke web or mobile apps. This is where the power lies. As I’ve argued before, &lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;the UI is often the real bottleneck for agents&lt;/a&gt;. Custom UIs allow for rich interactions, but they come with a “frontend tax” — they are time-consuming to build.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External Channels:&lt;/strong&gt; Extending the platform to meet users where they are — SMS, Email, Voice, or Slack. This is critical for field workers or remote teams who don’t sit in front of a dashboard all day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the future, I expect &lt;strong&gt;Generative UI&lt;/strong&gt; to take over by 2026. This is where the agent generates dynamic interface elements on the fly based on user intent (see &lt;a href="https://research.google/blog/generative-ui-a-rich-custom-visual-interactive-user-experience-for-any-prompt/" rel="noopener noreferrer"&gt;Google Research&lt;/a&gt;). In the meantime, we must make trade-offs between these options.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Development
&lt;/h3&gt;

&lt;p&gt;This is the factory floor. My experience shows a &lt;strong&gt;50/50 split&lt;/strong&gt; between developers (code-based) and integrators (no/low code), so your platform must support both paths to avoid limiting speed or flexibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AI7PcI6Aiw9P54cM3gBEiQA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AI7PcI6Aiw9P54cM3gBEiQA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Code-Based (The Developer Path)
&lt;/h4&gt;

&lt;p&gt;This path is for engineers using frameworks like &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, or &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google ADK&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Stack:&lt;/strong&gt; Code is versioned in SCM (Git), tested via CI/CD, and deployed as software artifacts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Cost:&lt;/strong&gt; Surprisingly low. With “Model-as-a-Service,” developers can build robust agents on a laptop or Cloud Workstation for pennies per day. You don’t need a local H100 cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  No-Code (The Integrator Path)
&lt;/h4&gt;

&lt;p&gt;This path is for business analysts using Visual Builders and iPaaS (&lt;a href="https://cloud.google.com/application-integration" rel="noopener noreferrer"&gt;Integration Platform as a Service&lt;/a&gt;) tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Stack:&lt;/strong&gt; Visual designers, drag-and-drop workflows, and pre-built connectors for building AI Agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Trade-off:&lt;/strong&gt; Speed vs. Flexibility. It is the fastest way to prototype and connect to enterprise apps, but visual design can be less robust and more limiting than pure code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Core
&lt;/h3&gt;

&lt;p&gt;The Core is the heartbeat. It houses the &lt;strong&gt;Execution Engine&lt;/strong&gt;, the runtime responsible for the agent’s cognitive loop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Awkd6kVpE8_5mtvBdVACmrA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Awkd6kVpE8_5mtvBdVACmrA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The Execution Engine
&lt;/h4&gt;

&lt;p&gt;To be truly autonomous, the runtime needs specific capabilities that ease development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session Management:&lt;/strong&gt; Persisting state across conversational turns (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Bank:&lt;/strong&gt; Handling short-term context and long-term recall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Sandbox:&lt;/strong&gt; A secure environment (like a micro VM) where the agent can write and execute code safely to solve math or data problems.&lt;/li&gt;
&lt;/ul&gt;
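&lt;p&gt;As a minimal sketch (assuming the Google ADK), a tool can persist data across turns through the session state exposed by the runtime. The tool name and state keys below are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal sketch, assuming Google ADK: a tool persisting data across turns.
# The tool name and state keys are hypothetical examples.
from google.adk.tools import ToolContext

def remember_preference(name: str, value: str, tool_context: ToolContext) -&amp;gt; dict:
    """Store a user preference in the session state."""
    tool_context.state[f"pref:{name}"] = value  # persisted by the session service
    return {"status": "saved", "preference": name}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;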

&lt;h4&gt;
  
  
  Gateways &amp;amp; Orchestration
&lt;/h4&gt;

&lt;p&gt;You don’t always need a heavy Airflow setup with DAGs, but you do need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task Schedulers / Event Buses:&lt;/strong&gt; To trigger agents asynchronously (e.g., “New Ticket Created” -&amp;gt; “Wake up Triage Agent”).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Management:&lt;/strong&gt; Exposing agents via standard Gateways like &lt;a href="https://cloud.google.com/apigee?authuser=1" rel="noopener noreferrer"&gt;Apigee&lt;/a&gt; or &lt;a href="https://www.gravitee.io/" rel="noopener noreferrer"&gt;Gravitee&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Standardization is Key:&lt;/strong&gt; Practitioners are heavily encouraged to adopt standards like &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://a2aprotocol.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2A (Agent-to-Agent)&lt;/strong&gt;&lt;/a&gt; interfaces. Your platform cannot be an island; it must act as a network where your agents can call tools or even &lt;em&gt;other&lt;/em&gt; agents to complete complex tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Foundation
&lt;/h3&gt;

&lt;p&gt;The Foundation layer is the bedrock of the AI Agent Platform, providing both Foundation Models and Infrastructure solutions to the agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AOettEDYxiTAHN1A4zJ85MQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AOettEDYxiTAHN1A4zJ85MQ.png" width="800" height="171"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Model Strategy
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serving:&lt;/strong&gt; You will likely mix &lt;strong&gt;Model-as-a-Service&lt;/strong&gt; (Vertex AI, Bedrock, …) for ease of use and scalability, and &lt;strong&gt;Custom Model Hosting&lt;/strong&gt; for specific, fine-tuned, or private models that require more operational effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Routing:&lt;/strong&gt; Don’t default to the most expensive model. Use a router to dispatch simple queries to cheaper/faster models and complex reasoning to “smart” models (e.g., Gemini 1.5 Pro, Claude 3.5 Sonnet, GPT-4), as sketched after this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Caching:&lt;/strong&gt; A massive cost saver. Cache system instructions and heavy documents so you aren’t paying to re-tokenize your company handbook on every request.&lt;/li&gt;
&lt;/ul&gt;
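&lt;p&gt;As a minimal illustration, here is a hypothetical keyword-based router. Real routers typically rely on a classifier model, and the model names below are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch of a model router: cheap model by default,
# escalate to a "smart" model only when the query looks complex.
COMPLEX_HINTS = ("plan", "analyze", "compare", "multi-step", "why")

def route_model(query: str) -&amp;gt; str:
    """Return an illustrative model name based on query complexity."""
    is_complex = len(query) &amp;gt; 500 or any(h in query.lower() for h in COMPLEX_HINTS)
    return "smart-reasoning-model" if is_complex else "fast-cheap-model"

print(route_model("What is our refund policy?"))         # fast-cheap-model
print(route_model("Analyze Q3 churn and plan actions"))  # smart-reasoning-model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;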

&lt;h4&gt;
  
  
  Infrastructure
&lt;/h4&gt;

&lt;p&gt;Standard cloud primitives apply here. Compute, Blob Storage, and &lt;strong&gt;Artifact Management&lt;/strong&gt; (abstracting how agents store input and output files) are essential. Treat your Agent Infrastructure as Code (IaC) to ensure reproducibility across environments (AWS, GCP, Azure, or on-premise).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Information
&lt;/h3&gt;

&lt;p&gt;An agent without data is a hallucination machine. The Information layer feeds the context required for decision-making.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AAkYWeo8wQsZ17sSyrLSowQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AAkYWeo8wQsZ17sSyrLSowQ.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge (Unstructured):&lt;/strong&gt; Documentation and guidelines stored in shared drives or online websites. These are typically indexed by a &lt;strong&gt;RAG Engine&lt;/strong&gt; or &lt;strong&gt;Search Engine&lt;/strong&gt; to explain &lt;em&gt;how&lt;/em&gt; the company works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational (Structured):&lt;/strong&gt; Transactional data (SQL DBs) required to &lt;em&gt;do&lt;/em&gt; work (e.g., update a CRM record). Builders should favor APIs over direct DB access here to ensure business logic integrity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Lake (Analytical):&lt;/strong&gt; Historical data for insights and decision making. Requires a Semantic Layer and Data Catalog so the agent understands what “Revenue” actually means before running a query.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Sync Problem:&lt;/strong&gt; Syncing these systems is painful. Each sync risks data duplication and inconsistency. We are moving toward a convergence of OLAP and OLTP with systems like &lt;a href="https://cloud.google.com/products/alloydb?authuser=1" rel="noopener noreferrer"&gt;Google AlloyDB&lt;/a&gt; or &lt;a href="https://www.databricks.com/product/lakebase" rel="noopener noreferrer"&gt;Databricks Lakebase&lt;/a&gt; to eliminate the copy/desync nightmare.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Observability
&lt;/h3&gt;

&lt;p&gt;If there is one thing humans must remain in control of, it is &lt;strong&gt;supervising the agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Aq5noL2yr9jrvr7WqMbAgvA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Aq5noL2yr9jrvr7WqMbAgvA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supervision:&lt;/strong&gt; The entry point. You need to collect logs, traces, and audit trails. Alerts should notify operators immediately when an agent loops or fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation:&lt;/strong&gt; The hardest part. You need pipelines where Foundation Models (or humans) review agent traces to score them on metrics such as &lt;strong&gt;Factuality, Relevance, and Accuracy&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing:&lt;/strong&gt; FinOps for AI. Track token usage per department. This is especially important for new architectures with less familiar cost sinks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics:&lt;/strong&gt; Tracking adoption. Is the agent actually solving tickets, or are people ignoring it? This is key to reporting ROI to stakeholders.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Trust
&lt;/h3&gt;

&lt;p&gt;Finally, the Trust layer. Agents are high-leverage tools; without governance, they are a liability that can wreak havoc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Ah7XffR1NKmMduJsTNcymBQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Ah7XffR1NKmMduJsTNcymBQ.png" width="800" height="557"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IAM (Identity &amp;amp; Access Management):&lt;/strong&gt; RBAC is mandatory. Furthermore, &lt;strong&gt;Tool Authentication&lt;/strong&gt; (OIDC/OAuth) ensures the agent only takes actions the &lt;em&gt;user&lt;/em&gt; is authorized to take (acting on behalf of the user).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Guardrails. You need to filter content, prevent &lt;strong&gt;Prompt Injection&lt;/strong&gt; (jailbreaks), and detect malicious content &lt;em&gt;before&lt;/em&gt; the LLM sees it. &lt;strong&gt;Secret Management&lt;/strong&gt; is also critical to protect secrets like API keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance:&lt;/strong&gt; The Registry. You need a central catalog of authorized agents, models, and tools. You don’t want to be hunting through the org chart to find out who built the “Payroll Bot” or who is responsible for a rogue agent. This can extend to a &lt;strong&gt;Marketplace&lt;/strong&gt; for buying assets from other vendors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Building an AI Agent Platform is not just about stringing together a few API calls. It is about building a scalable, secure, and observable ecosystem where code and reasoning merge to drive real business impact. I’m really excited to build these powerhouses of automation and intelligence!&lt;/p&gt;

&lt;p&gt;Whether you are a developer writing complex orchestration logic or an integrator dragging and dropping workflows, the platform provides the stability you need to move from “demo” to “production”. The challenge will be immense, but if you have the right vision, roadmap and architecture, solutions will appear layer by layer to start addressing your use cases.&lt;/p&gt;

&lt;p&gt;Start with the core, secure the trust layer, and never underestimate the importance of observability. The agents are coming — make sure you have the platform to manage them and give them both power and control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A1Q_m_WP8O_PFLomdGaYGqQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A1Q_m_WP8O_PFLomdGaYGqQ.png" width="800" height="427"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini (Nano Banana Pro)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>cloudcomputing</category>
      <category>artificialintelligen</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Powering Up your Agent in Production with ADK, OAuth and Gemini Enterprise</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sat, 01 Nov 2025 17:37:03 +0000</pubDate>
      <link>https://forem.com/fmind/powering-up-your-agent-in-production-with-adk-oauth-and-gemini-enterprise-2mi1</link>
      <guid>https://forem.com/fmind/powering-up-your-agent-in-production-with-adk-oauth-and-gemini-enterprise-2mi1</guid>
      <description>&lt;p&gt;The promise of AI agents is immense productivity gains. But putting them into production can be a tale of two extremes: surprisingly fast or painfully slow.&lt;/p&gt;

&lt;p&gt;The difference often hinges on the infrastructure and tooling you choose. If you attempt to build everything from scratch — creating a custom UI, managing complex authentication flows, and setting up observability — development slows down significantly. You spend more time on infrastructure than on the agent logic itself. I recently argued this point in &lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;“The Real AI Agent Bottleneck is the Damn UI”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, with the right tools, deploying an agent can be remarkably quick.&lt;/p&gt;

&lt;p&gt;To demonstrate how to achieve this fast route, we need a practical example. A while ago, I shared a &lt;a href="https://fmind.medium.com/slides-to-translate-when-it-says-no-build-a-0-04-solution-on-your-lunch-break-3afa8bd9f6bb" rel="noopener noreferrer"&gt;notebook built over lunch to translate Google Slides&lt;/a&gt;. It was effective but stuck in a notebook, inaccessible to my teammates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd5bpjj2st7tuht07ns3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd5bpjj2st7tuht07ns3.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A hacker tinkering a robot (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This article details the journey of taking that “Slides Translator” and pushing it into production as a secure, scalable agent, leveraging the right stack to bypass the usual bottlenecks. We will focus on using the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;, &lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;OAuth&lt;/a&gt;, and &lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The full code for this project is available on GitHub: &lt;a href="https://github.com/fmind/slides-translator-agent" rel="noopener noreferrer"&gt;https://github.com/fmind/slides-translator-agent&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Agentic Architecture
&lt;/h3&gt;

&lt;p&gt;To move from a notebook to a production agent, we need an architecture that handles security, execution, and user access robustly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Af5__WLm_DfbCtPXn43bbzA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Af5__WLm_DfbCtPXn43bbzA.png" width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of the Slide Translator Agent from local development to production deployment (Source: fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The workflow is structured as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Local Development:&lt;/strong&gt; The agent logic is developed using the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; and tested locally via the &lt;a href="https://github.com/google/adk-web" rel="noopener noreferrer"&gt;ADK Web UI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; The agent is deployed to the &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt; on Google Cloud Platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Access:&lt;/strong&gt; Users interact with the agent through the &lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Enterprise Web UI&lt;/strong&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution and Security:&lt;/strong&gt; The Agent Engine manages the execution. It uses &lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;&lt;strong&gt;OAuth&lt;/strong&gt;&lt;/a&gt; for secure authorization, interacts with &lt;a href="https://developers.google.com/apis-explorer" rel="noopener noreferrer"&gt;&lt;strong&gt;Google APIs&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Drive and Slides)&lt;/strong&gt; on the user’s behalf, and utilizes &lt;a href="https://ai.google.dev/gemini-api/docs/models" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Models&lt;/strong&gt;&lt;/a&gt; for the translation.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  ADK and the Power of OAuth
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/GoogleCloudPlatform/agent-development-kit" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; provides a great set of features to handle everything you need for building agents. In this specific use case, I focused on its ability to handle &lt;strong&gt;OAuth&lt;/strong&gt;, letting users grant access to their Slides and Drive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AT_KtkXo8SK0SPTEL.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AT_KtkXo8SK0SPTEL.png" width="800" height="353"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Overview of Google ADK and Vertex AI Agent Engine (Source: &lt;a href="https://cloud.google.com/agent-builder/agent-engine/overview" rel="noopener noreferrer"&gt;https://cloud.google.com/agent-builder/agent-engine/overview&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the notebook prototype, authentication relied on local credentials. This is not suitable for a production agent that needs to access the &lt;em&gt;user’s&lt;/em&gt; specific files. The agent must act on behalf of the user, requiring their explicit permission.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why OAuth?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;OAuth 2.0&lt;/a&gt; provides excellent security guarantees and granularity. It allows users to grant specific permissions (scopes) without sharing their passwords with the agent. In this case, we need access to the &lt;a href="https://developers.google.com/drive/api?authuser=1" rel="noopener noreferrer"&gt;Google Drive API&lt;/a&gt; (to copy the presentation) and the &lt;a href="https://developers.google.com/slides/api?authuser=1" rel="noopener noreferrer"&gt;Google Slides API&lt;/a&gt; (to read and write slide content).&lt;/p&gt;

&lt;p&gt;While OAuth is not an easy concept for newcomers to grasp, it is a key component for securing enterprise applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AdQUie6caJY37ewPiXb315g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AdQUie6caJY37ewPiXb315g.png" width="800" height="585"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;OAuth flow for a tool with Google ADK (Source: &lt;a href="https://google.github.io/adk-docs/tools/authentication/" rel="noopener noreferrer"&gt;https://google.github.io/adk-docs/tools/authentication/&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;To make this work, an OAuth Client ID must be configured in the Google Cloud Console: &lt;a href="https://console.cloud.google.com/auth/clients" rel="noopener noreferrer"&gt;https://console.cloud.google.com/auth/clients&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AHtOzRd8iG-ervVtnJQ5dug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AHtOzRd8iG-ervVtnJQ5dug.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Configuration of the OAuth Credentials on Google Cloud: &lt;a href="https://console.cloud.google.com/auth/clients" rel="noopener noreferrer"&gt;https://console.cloud.google.com/auth/clients&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Crucially, we need to define the “Authorized redirect URIs”. The localhost URI is used during local development with the ADK Web UI, and the &lt;a href="https://vertexaisearch.cloud.google.com/oauth-redirect" rel="noopener noreferrer"&gt;https://vertexaisearch.cloud.google.com/oauth-redirect&lt;/a&gt; URI is used by the Vertex AI Agent Engine in production to securely handle the callback after the user grants consent.&lt;/p&gt;
&lt;h3&gt;
  
  
  Implementation in ADK
&lt;/h3&gt;

&lt;p&gt;ADK simplifies the OAuth flow significantly. We define the authentication configuration once and guard the tools that require user credentials with a helper function.&lt;/p&gt;

&lt;p&gt;Here is a snippet demonstrating the core authentication mechanism in the agent code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Authentication for the tools."""

# %% IMPORTS

from fastapi.openapi.models import OAuth2, OAuthFlowAuthorizationCode, OAuthFlows
from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, OAuth2Auth
from google.adk.auth.auth_tool import AuthConfig

from slides_translator_agent import configs

# %% CONFIGS

AUTHORIZATION_URL = "https://accounts.google.com/o/oauth2/auth"
TOKEN_URL = "https://oauth2.googleapis.com/token"
SCOPES = {
    "https://www.googleapis.com/auth/drive": "Google Drive API",
    "https://www.googleapis.com/auth/presentations": "Google Slides API",
}

# %% AUTHENTICATIONS

AUTH_SCHEME = OAuth2(
    flows=OAuthFlows(
        authorizationCode=OAuthFlowAuthorizationCode(
            authorizationUrl=AUTHORIZATION_URL,
            tokenUrl=TOKEN_URL,
            scopes=SCOPES,
        )
    )
)
AUTH_CREDENTIAL = AuthCredential(
    auth_type=AuthCredentialTypes.OAUTH2,
    oauth2=OAuth2Auth(
        client_id=configs.AUTHENTICATION_CLIENT_ID,
        client_secret=configs.AUTHENTICATION_CLIENT_SECRET,
    ),
)
AUTH_CONFIG = AuthConfig(
    auth_scheme=AUTH_SCHEME,
    raw_auth_credential=AUTH_CREDENTIAL,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the translate_presentation tool is invoked, the negotiate_creds function ensures that a valid token exists. If not, ADK automatically pauses the agent execution and initiates the OAuth flow with the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Tools for the agents."""

import json

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

from slides_translator_agent import auths

def negotiate_creds(tool_context: ToolContext) -&amp;gt; Credentials | dict:
    """Handle the OAuth 2.0 flow to get valid credentials."""
    logger.info("Negotiating credentials using oauth 2.0")
    # Check for cached credentials in the tool state
    if cached_token := tool_context.state.get(configs.TOKEN_CACHE_KEY):
        logger.debug("Found cached token in tool context state")
        if isinstance(cached_token, dict):
            logger.debug("Cached token is a dictionary, treating as AuthCredential.")
            try:
                creds = Credentials.from_authorized_user_info(
                    cached_token, list(auths.SCOPES.keys())
                )
                if creds.valid:
                    logger.debug("Cached credentials are valid, returning credentials")
                    return creds
                if creds.expired and creds.refresh_token:
                    logger.debug("Cached credentials expired, attempting refresh")
                    creds.refresh(Request())
                    tool_context.state[configs.TOKEN_CACHE_KEY] = json.loads(creds.to_json())
                    logger.debug("Credentials refreshed and cached successfully")
                    return creds
            except Exception as error:
                logger.error(f"Error loading/refreshing cached credentials: {error}")
                tool_context.state[configs.TOKEN_CACHE_KEY] = None # reset cache
        elif isinstance(cached_token, str):
            logger.debug("Found raw access token in tool context state.")
            # This creates a temporary credential object from the token
            # Note: This credential will not be refreshed if it expires
            return Credentials(token=cached_token)
        else:
            raise ValueError(
                f"Invalid cached token type. Expected dict or str, got {type(cached_token)}"
            )
    # If no valid cached credentials, check for auth response
    logger.debug("No valid cached token. Checking for auth response")
    if exchanged_creds := tool_context.get_auth_response(auths.AUTH_CONFIG):
        logger.debug("Received auth response, creating credentials")
        auth_scheme = auths.AUTH_CONFIG.auth_scheme
        auth_credential = auths.AUTH_CONFIG.raw_auth_credential
        creds = Credentials(
            token=exchanged_creds.oauth2.access_token,
            refresh_token=exchanged_creds.oauth2.refresh_token,
            token_uri=auth_scheme.flows.authorizationCode.tokenUrl,
            client_id=auth_credential.oauth2.client_id,
            client_secret=auth_credential.oauth2.client_secret,
            scopes=list(auth_scheme.flows.authorizationCode.scopes.keys()),
        )
        tool_context.state[configs.TOKEN_CACHE_KEY] = json.loads(creds.to_json())
        logger.debug("New credentials created and cached successfully")
        return creds
    # If no auth response, initiate auth request
    logger.debug("No credentials available. Requesting user authentication")
    tool_context.request_credential(auths.AUTH_CONFIG)
    logger.info("Awaiting user authentication")
    return {"pending": True, "message": "Awaiting user authentication"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
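
&lt;p&gt;For completeness, here is a minimal sketch of how a tool can build on negotiate_creds. The translate_presentation signature and the returned payload are assumptions for illustration; the key point is that the tool simply returns the pending status while the OAuth flow is in progress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch of a tool gated by negotiate_creds (assumed signature).

from googleapiclient.discovery import build

def translate_presentation(
    presentation_id: str, language: str, tool_context: ToolContext
) -&amp;gt; dict:
    """Translate a presentation, pausing for the OAuth flow if needed."""
    creds = negotiate_creds(tool_context)
    if isinstance(creds, dict):
        # No valid credentials yet: surface the pending status to the agent
        return creds
    # Delegated access: every API call below is made on behalf of the user
    slides = build("slides", "v1", credentials=creds)
    deck = slides.presentations().get(presentationId=presentation_id).execute()
    return {"status": "ok", "title": deck.get("title"), "language": language}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;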



&lt;p&gt;This ensures the user explicitly consents to the agent accessing their files before any action is taken.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AQUMzCNHNvZ7n99wBXF4ocQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AQUMzCNHNvZ7n99wBXF4ocQ.png" width="800" height="335"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;OAuth is supported natively on Google ADK: when needed, ADK will prompt the user to grant more access to the agent tools&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Deploying with Gemini Enterprise
&lt;/h3&gt;

&lt;p&gt;Once the agent is developed and tested, the next step is deploying it to production.&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuring Production Authentication
&lt;/h3&gt;

&lt;p&gt;Before deploying the agent code, we need to register the OAuth configuration with the production environment. I used the following script to set this up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./as.py create-auth \
  --auth-id slides-translator-auth \
  --client-id ... \
  --client-secret ... \
  --auth-uri "https://accounts.google.com/o/oauth2/auth?include_granted_scopes=true&amp;amp;response_type=code&amp;amp;access_type=offline&amp;amp;prompt=consent" \
  --token-uri "https://oauth2.googleapis.com/token" \
  --scope "https://www.googleapis.com/auth/drive" \
  --scope "https://www.googleapis.com/auth/presentations"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command links the slides-translator-auth ID (referenced in the Python code above as configs.TOKEN_CACHE_KEY) with the actual Client ID, Client Secret, and the required scopes.&lt;/p&gt;
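
&lt;p&gt;For reference, here is a hypothetical sketch of the configs module used by the snippets above. The names match the code, but the values and environment variables are assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Hypothetical sketch of the configs module (values are assumptions)."""

import os

# The cache key doubles as the auth-id registered with create-auth
TOKEN_CACHE_KEY = "slides-translator-auth"

# OAuth client credentials, injected through environment variables
AUTHENTICATION_CLIENT_ID = os.environ["AUTHENTICATION_CLIENT_ID"]
AUTHENTICATION_CLIENT_SECRET = os.environ["AUTHENTICATION_CLIENT_SECRET"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;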

&lt;p&gt;Note: As the Gemini Enterprise exposition API is still in private preview, I can’t share more details or the deployment script yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Seamless Exposition
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/gemini/enterprise?authuser=1" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt; gives you a quick way to expose your agent securely and conveniently. This directly addresses the “&lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;UI bottleneck&lt;/a&gt;” mentioned earlier.&lt;/p&gt;

&lt;p&gt;This approach has significant advantages over deploying a separate UI (like Streamlit):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Effort UI:&lt;/strong&gt; No need to design, host, or secure a separate frontend application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Thanks to the underlying Agent Engine, agent activity is traced and logged automatically, providing essential observability for production monitoring and debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Services:&lt;/strong&gt; It provides more core services and integrates seamlessly within the Google Cloud security perimeter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The end result is a clean, integrated experience. Users can interact with the “Slides Translator Agent” directly within the Gemini interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AayV7S1xxgRSUraWii75ggA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AayV7S1xxgRSUraWii75ggA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Slides Translator Agent deployed on Gemini Enterprise&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This journey from a simple notebook to a production-ready agent was a great way to experience what this stack provides out of the box. The combination of ADK for development, OAuth for security, and Gemini Enterprise for deployment streamlines the entire lifecycle of an enterprise agent, allowing us to deploy quickly without compromising on security or usability.&lt;/p&gt;

&lt;p&gt;I’m eager to explore more ways to build agents. While this is a new paradigm that requires upskilling our teammates and adapting our development practices, we already see the potential in the use cases we encounter. The ability to rapidly deploy secure, specialized tools that act on behalf of users is a significant step forward.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Acd0pXOH1MIMOs6fkqEyU3A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Acd0pXOH1MIMOs6fkqEyU3A.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Human and Agent merging to accomplish their tasks (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>generativeaitools</category>
      <category>agents</category>
      <category>googlecloudplatform</category>
    </item>
    <item>
      <title>The Real AI Agent Bottleneck is the Damn UI</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sun, 12 Oct 2025 16:12:41 +0000</pubDate>
      <link>https://forem.com/fmind/the-real-ai-agent-bottleneck-is-the-damn-ui-5277</link>
      <guid>https://forem.com/fmind/the-real-ai-agent-bottleneck-is-the-damn-ui-5277</guid>
      <description>&lt;p&gt;We’re living in the golden age of AI agent development. The backend infrastructure is finally catching up to the hype. If you’ve followed my previous work on &lt;a href="https://fmind.medium.com/deploying-ai-agents-in-the-enterprise-using-adk-and-google-cloud-b49e7eda3b41" rel="noopener noreferrer"&gt;deploying agents using ADK and Google Cloud&lt;/a&gt;, you know that the heavy lifting — the orchestration, the tool integration, the deployment pipelines — is becoming standardized.&lt;/p&gt;

&lt;p&gt;The major players are all in. Whether you’re using Google Cloud’s &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview" rel="noopener noreferrer"&gt;Vertex AI Agent Engine&lt;/a&gt; powered by the &lt;a href="https://github.com/google/adk-python" rel="noopener noreferrer"&gt;ADK&lt;/a&gt;, &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;AWS AgentCore&lt;/a&gt; with &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-frameworks/strands-agents.html" rel="noopener noreferrer"&gt;Strands&lt;/a&gt;, or &lt;a href="https://docs.databricks.com/aws/en/generative-ai/agent-bricks/" rel="noopener noreferrer"&gt;Databricks’ AgentBricks&lt;/a&gt;, building the &lt;em&gt;brain&lt;/em&gt; of the agent is easier than ever. But here’s the dirty secret the hype cycle isn’t talking about: &lt;strong&gt;The User Interface (UI) is the real bottleneck for industrializing AI agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can have the most sophisticated, multi-step reasoning agent on the planet, but if your users can’t interact with it intuitively, securely, and effectively, it will never be deployed at scale. The last mile — exposing the agent to maximize impact — is where projects go to die. In this article, we are going to explore this problem and find the best trade-off to remove the bottlenecks and adopt AI agents full-throttle in your company!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvzmt03ex6exmwtvwdad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvzmt03ex6exmwtvwdad.png" width="800" height="457"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Google AI Studio&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottleneck: Why UI is the Hardest Part
&lt;/h3&gt;

&lt;p&gt;Building an agent requires a specific skill set: LLM understanding, backend engineering, and prompt whispering. Building a &lt;em&gt;good&lt;/em&gt; UI requires a completely different one: frontend development, UX design, and product sense. The engineers hacking together these agents are rarely UI experts. And frankly, they shouldn’t have to be.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Tiresome Process of Building UIs
&lt;/h4&gt;

&lt;p&gt;Having to spin up a new React app every time you deploy an agent is soul-crushing. It’s tedious, time-consuming, and completely unscalable. We need generalized interfaces that adapt to specific workflows, not custom code for every use case. This includes generalizing how we evaluate agent performance and collect user feedback — critical components that are often treated as an afterthought.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Identity Crisis
&lt;/h4&gt;

&lt;p&gt;User and Agent Identity is paramount. If an agent needs to access a database or pull a file from Google Drive, it must do so &lt;em&gt;on the user’s behalf&lt;/em&gt;, from the UI. We can’t have agents authenticating with god-mode service accounts, nor can we force users to re-authenticate with every single tool during an interaction. The UI must seamlessly handle delegated authority.&lt;/p&gt;

&lt;h4&gt;
  
  
  Security and Governance: The Enterprise Non-Negotiables
&lt;/h4&gt;

&lt;p&gt;This isn’t a weekend hackathon project. In an enterprise setting, security is everything. You cannot allow loose access controls. The nightmare scenario? An agent with access to your entire data lake &lt;em&gt;and&lt;/em&gt; the ability to send emails externally. The risk of data leakage is massive.&lt;/p&gt;

&lt;p&gt;Governance requires auditing every operation, ensuring data usage is controlled, and verifying that tool access is restricted. The UI is the gateway for all of this. This balancing act requires both admins and users to be familiar with the environment, but their interfaces must be optimized for their respective roles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Symptoms of a Broken System
&lt;/h3&gt;

&lt;p&gt;When the UI layer fails, the organization feels the pain.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Rise of Shadow IT:&lt;/strong&gt; When official tools are too hard to use or deploy, users find workarounds. We see a proliferation of quick-and-dirty solutions, like rogue &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; instances deployed under someone’s desk, creating massive security vulnerabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Silos:&lt;/strong&gt; Agents should be collaborative. They need to interact with each other, leveraging protocols like the emerging &lt;a href="https://a2a-protocol.org/" rel="noopener noreferrer"&gt;A2A (Agent-to-Agent) standard&lt;/a&gt;. But when agents live in isolation, collaboration is impossible. They become siloed tools rather than a cohesive intelligence layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The “90% Done” Fallacy:&lt;/strong&gt; This is the classic trap. You hack together a &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; web app, deploy it, and declare the project 90% complete. Wrong. The real project — adoption, integration, security hardening, and UI refinement — is just beginning.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Exploring the Approaches: The Good, The Bad, and The Ugly
&lt;/h3&gt;

&lt;p&gt;How are we currently trying to solve this UI challenge? Let’s break down the dominant paradigms.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. The Pure Chatbot (The Terminal Approach)
&lt;/h4&gt;

&lt;p&gt;The idea here is that the chatbot is the &lt;em&gt;only&lt;/em&gt; interface. We see this clearly with the recent &lt;a href="https://openai.com/index/introducing-apps-in-chatgpt/" rel="noopener noreferrer"&gt;OpenAI Apps&lt;/a&gt; or &lt;a href="https://support.google.com/gemini/answer/14959807?hl=en&amp;amp;co=GENIE.Platform%3DAndroid" rel="noopener noreferrer"&gt;Gemini Extensions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AoeUeNTKBlnxDyYr2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AoeUeNTKBlnxDyYr2" width="1024" height="576"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;OpenAI Apps. Source: &lt;a href="https://www.axios.com/2025/10/06/openai-chatgpt-app-devday" rel="noopener noreferrer"&gt;https://www.axios.com/2025/10/06/openai-chatgpt-app-devday&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Simple, universal interface for everything. Low development overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Incredibly restrictive. Markdown syntax is &lt;em&gt;not&lt;/em&gt; a UI framework. You can’t easily implement sliders, interactive maps, complex data visualizations, or rich editing tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; This is like a developer terminal, but using natural language instead of Linux commands. It’s powerful for certain tasks but hits a wall quickly when complexity increases.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. The Co-Pilot (The Sidecar Approach)
&lt;/h4&gt;

&lt;p&gt;The chatbot controls another, existing interface. The prime example is &lt;a href="https://workspace.google.com/solutions/ai/" rel="noopener noreferrer"&gt;Gemini for Workspace&lt;/a&gt;, where the chatbot sits as a widget on the right side of Docs, Sheets, or Gmail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fln7fghlj01gt2j0d4pol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fln7fghlj01gt2j0d4pol.png" width="800" height="397"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Gemini for Workspace with Chat Sidecar on Google Sheets&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Meets the user where they already are. Keeps the familiar interface of the host application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Limited to the capabilities of the host app. Cross-application workflows (e.g., “Analyze this spreadsheet and draft a presentation based on the findings”) are difficult or impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; A great enhancement for existing tools, but not a solution for complex, multi-tool agentic workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Generic Static UI (The Visual Workflow)
&lt;/h4&gt;

&lt;p&gt;This involves using a predefined visual interface, like &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; or specialized agent builders.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipqt4kej3ik973v5t6wy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipqt4kej3ik973v5t6wy.png" width="800" height="546"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Example of Visual Workflow on n8n. Source: &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;https://n8n.io/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Generic yet adaptable to specific workflows. Fast to develop and easy to interpret.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Visual workflows are often legacy techniques poorly suited for Generative AI. They are too rigid. How do you easily put a human in the loop? How do you give the agent more autonomy when the path is predefined?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; Good for traditional automation, but stifles the potential of true AI agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. Specific Static UI (The Artisanal Approach)
&lt;/h4&gt;

&lt;p&gt;Building a custom, bespoke UI for every agent, often using frameworks like &lt;a href="https://genkit.dev/" rel="noopener noreferrer"&gt;Genkit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzfakqwth9qyqds429qs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzfakqwth9qyqds429qs.png" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: &lt;a href="https://developers.googleblog.com/en/how-firebase-genkit-helped-add-ai-to-our-compass-app/" rel="noopener noreferrer"&gt;https://developers.googleblog.com/en/how-firebase-genkit-helped-add-ai-to-our-compass-app/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; The absolute best adaptation to the specific use case. Maximum control over the user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Slow to develop, expensive, and completely unscalable, especially for quick agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; Necessary for flagship products, but impossible for the rapid deployment of specialized agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. Dynamic UI (The Shape-Shifter)
&lt;/h4&gt;

&lt;p&gt;The UI is fluid and generated on the spot by the AI itself. We see this with Claude generating artifacts, or experimental concepts like Google’s &lt;a href="https://opal.withgoogle.com/landing/" rel="noopener noreferrer"&gt;Opal&lt;/a&gt; and the &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;AG-UI&lt;/a&gt; protocol.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55e47bihhiipvzxcqbbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55e47bihhiipvzxcqbbe.png" width="800" height="504"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Dynamic UI with Opal (green block). Source: &lt;a href="https://blog.google/technology/google-labs/opal-expansion/" rel="noopener noreferrer"&gt;https://blog.google/technology/google-labs/opal-expansion/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; No need to code UI anymore. Maximum adaptation to the desired workflow. Incredibly fast development cycle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Unpredictable and inconsistent. Not efficient — it feels like “vibe coding.” It’s feasible for small apps, but is it robust enough for large-scale enterprise applications?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; The holy grail, but the technology isn’t mature enough for mission-critical applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmclwccfagenmf6zzdmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmclwccfagenmf6zzdmm.png" width="800" height="367"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Summary of the UI Approaches for AI Agents&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Agent Hub Imperative
&lt;/h3&gt;

&lt;p&gt;Regardless of the UI paradigm we choose, one thing is clear: we need an &lt;strong&gt;Agent Hub&lt;/strong&gt;. Organizations need a centralized location to discover available agents, manage their access, orchestrate their interactions (both human-to-agent and agent-to-agent), and provide governance oversight.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Current Landscape: Evaluating the Options
&lt;/h3&gt;

&lt;p&gt;Where do today’s solutions fit in?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;/&lt;/strong&gt; &lt;a href="https://openai.com/index/introducing-agentkit/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAI Agent Builder&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Visual Workflow):&lt;/strong&gt; Familiar with organizations, which aids adoption. However, they are fundamentally restrictive and don’t allow for the autonomy and human-in-the-loop interaction that GenAI agents can leverage.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openai.com/index/introducing-apps-in-chatgpt/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAI Apps&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;/&lt;/strong&gt; &lt;a href="https://support.google.com/gemini/answer/13695044?hl=en&amp;amp;co=GENIE.Platform%3DAndroid" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Extensions&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Chat-First):&lt;/strong&gt; The easy fix, but they lack expressiveness. If we limit agents to simple chat interfaces, we risk repeating the failures of Alexa — useful for timers, but not for complex work.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.google/technology/google-labs/opal-expansion/" rel="noopener noreferrer"&gt;&lt;strong&gt;Opal&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;/&lt;/strong&gt; &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;&lt;strong&gt;AG-UI&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Dynamic UI):&lt;/strong&gt; Great for small, isolated apps and user autonomy, but not scalable for large, complex systems. They are hard to edit, maintain, and ensure consistency.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/fr/blogs/aws/reimagine-the-way-you-work-with-ai-agents-in-amazon-quick-suite/" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS QuickSuite&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Hybrid):&lt;/strong&gt; A pragmatic, conservative middle ground. &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/what-is.html" rel="noopener noreferrer"&gt;QuickSuite&lt;/a&gt; offers a toolset of GenAI variants with UIs tailored for specific tasks like data analysis, deep research, or conversation. A solid, choice, especially if you are using AWS services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz56u4ofkh729xml7bm07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz56u4ofkh729xml7bm07.png" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AWS Quick Suite with several experiences: Chat Agents, Flows, and Research&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Enterprise&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Agent Hub Focus):&lt;/strong&gt; &lt;a href="https://cloud.google.com/gemini-enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt; shows potential as a central hub, but it needs to deliver richer expressiveness beyond the standard chat interface to truly unlock agent potential. One solution is to control other UI (e.g., Google Sheets) from the chat app.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2APOyQWrG5DLVE3VevhZCeog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2APOyQWrG5DLVE3VevhZCeog.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Agent Gallery and Usage from Gemini Enterprise. Source: &lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;https://cloud.google.com/gemini-enterprise?hl=en&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  My Bet on the Future
&lt;/h3&gt;

&lt;p&gt;The UI bottleneck won’t be solved overnight. Here’s where I see things heading.&lt;/p&gt;

&lt;h4&gt;
  
  
  Short/Medium Term: The “Hacker Terminal” Wins
&lt;/h4&gt;

&lt;p&gt;For the immediate future, the &lt;strong&gt;Chatbot UI&lt;/strong&gt; will dominate. It’s the easiest to develop and gets you 80% of the way there. It’s the “hacker terminal” approach — using natural language to orchestrate complex systems — but easier to use. In addition, visual workflows will be used for deterministic applications (i.e., &lt;a href="https://www.youtube.com/watch?v=Qd6anWv0mv0" rel="noopener noreferrer"&gt;agentic workflows&lt;/a&gt;) as a complementary solution.&lt;/p&gt;

&lt;p&gt;The key to making this work won’t be richer UIs, but better &lt;em&gt;backend&lt;/em&gt; collaboration. Agents need to be able to seamlessly call other agents (&lt;a href="https://a2a-protocol.org/" rel="noopener noreferrer"&gt;A2A&lt;/a&gt;) behind the scenes, using the chat interface purely as the command and control layer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Long Term: Ambient Computing and Voice
&lt;/h4&gt;

&lt;p&gt;In the long term, the best UI is no UI. We will move towards &lt;strong&gt;voice and ambient computing&lt;/strong&gt;. We will keep our existing human applications (our spreadsheets, our design tools, our CRMs), and agents will pilot them intelligently on our behalf.&lt;/p&gt;

&lt;p&gt;This is both easier to develop (no new UIs needed) and easier to adopt (users keep their existing workflows). However, this requires incredibly robust models and rigorous testing. We only adopt transformative interfaces when they are near-perfect. Think about voice translation — it only became truly useful when it crossed the 95% accuracy threshold. Ambient computing will require the same level of reliability.&lt;/p&gt;

&lt;p&gt;Until then, we need to stop treating the UI as an afterthought. It’s a critical component for unlocking the value of AI agents in the enterprise. It’s time we started engineering it with the same rigor we apply to the agents themselves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2ApF9052Gy-cRPBuUS" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2ApF9052Gy-cRPBuUS" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by Anton Filatov on Unsplash&lt;/em&gt;&lt;/p&gt;

</description>
      <category>generativeaitools</category>
      <category>userexperience</category>
      <category>datascience</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>Da2a: The Future of Data Platforms is Agentic, Distributed, and Collaborative</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sat, 27 Sep 2025 13:36:40 +0000</pubDate>
      <link>https://forem.com/fmind/da2a-the-future-of-data-platforms-is-agentic-distributed-and-collaborative-4ikd</link>
      <guid>https://forem.com/fmind/da2a-the-future-of-data-platforms-is-agentic-distributed-and-collaborative-4ikd</guid>
      <description>&lt;p&gt;For decades, the story of data platforms has been one of centralization and heavy engineering. We built massive data warehouses and data lakes, but accessing their insights required deep technical expertise. Business users couldn’t simply ask questions; they had to navigate a complex process involving specialized data engineers to build painstaking ETL pipelines, optimized queries, and specific dashboards. This highly technical approach created a rigid, monolithic source of truth that, while powerful, was slow to adapt and created significant bottlenecks. It left decision-makers waiting days or even weeks for answers, completely dependent on an over-burdened engineering team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fileeo7z7i8h7x929dez3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fileeo7z7i8h7x929dez3.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Illustration of the complexity of data platforms (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if we flipped the model on its head?&lt;/p&gt;

&lt;p&gt;Instead of a single, all-knowing monolith, imagine a collaborative ecosystem where domain experts describe their data in natural language, providing context that empowers a network of intelligent, autonomous agents. Each agent becomes an expert in its domain — sales, marketing, logistics, finance — managing its own data by combining human-provided descriptions with its own skills to answer questions. This is the future of data platforms: a system that is &lt;strong&gt;agentic, distributed, and truly collaborative&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I created a new open-source project, &lt;a href="https://github.com/fmind/da2a" rel="noopener noreferrer"&gt;&lt;strong&gt;Da2a&lt;/strong&gt;&lt;/a&gt;, to explore this paradigm. It’s a prototype that demonstrates how a multi-agent system can tackle complex data analysis by working together.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Old Way vs. The New Paradigm
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The traditional data platform is&lt;/strong&gt;  &lt;strong&gt;engineering-focused&lt;/strong&gt;. The primary challenge is moving, storing, and modeling data. Answering a simple business question like, &lt;em&gt;“What’s the ROI on our latest social media campaign?”&lt;/em&gt; could involve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Filing a ticket with the data engineering team.&lt;/li&gt;
&lt;li&gt;Waiting for them to build a new pipeline to join marketing spend data with sales data.&lt;/li&gt;
&lt;li&gt;Having an analyst write a complex SQL query across multiple massive tables.&lt;/li&gt;
&lt;li&gt;Finally, getting a report back, hoping the initial question hasn’t become irrelevant.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The agentic approach is&lt;/strong&gt;  &lt;strong&gt;insight-focused&lt;/strong&gt;. Instead of a centralized database, you have specialized agents. For instance, the &lt;strong&gt;Marketing Agent&lt;/strong&gt; knows everything about campaign spending and lead acquisition. On the other hand, the &lt;strong&gt;E-commerce Agent&lt;/strong&gt; is an expert on orders, products, and revenue.&lt;/p&gt;

&lt;p&gt;To answer that same question, you simply ask a root “Orchestrator Agent.” The orchestrator understands the goal, formulates a plan, and collaborates with the specialist agents to get the answer. The focus shifts from the &lt;em&gt;how&lt;/em&gt; (engineering) to the &lt;em&gt;what&lt;/em&gt; (the business question).&lt;/p&gt;
&lt;h3&gt;
  
  
  Meet Da2a: An Agentic Platform in Action
&lt;/h3&gt;

&lt;p&gt;Da2a implements this vision with a root orchestrator and two specialized agents: one for an &lt;strong&gt;e-commerce&lt;/strong&gt; dataset and another for a &lt;strong&gt;marketing&lt;/strong&gt; dataset, both based on real-world data from the &lt;a href="https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce" rel="noopener noreferrer"&gt;Olist store in Brazil&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/fmind/da2a" rel="noopener noreferrer"&gt;https://github.com/fmind/da2a&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Demos:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Root Orchestrator: &lt;a href="https://da2a.fmind.dev/" rel="noopener noreferrer"&gt;https://da2a.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Marketing Agent: &lt;a href="https://da2a-marketing.fmind.dev/" rel="noopener noreferrer"&gt;https://da2a-marketing.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;E-commerce Agent: &lt;a href="https://da2a-ecommerce.fmind.dev/" rel="noopener noreferrer"&gt;https://da2a-ecommerce.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can ask the e-commerce agent, “How many orders were placed in São Paulo?” or the marketing agent, “What were our top lead sources last year?”. Better yet, you can ask the root orchestrator a question that requires both, like, &lt;em&gt;“What is the total sales revenue from sellers who were acquired via ‘Display’ advertising?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllapdgh3q85bw73i691u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllapdgh3q85bw73i691u.png" width="800" height="417"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Screenshot of the Da2a User Interface: &lt;a href="http://da2a.fmind.dev/" rel="noopener noreferrer"&gt;http://da2a.fmind.dev/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The root agent intelligently delegates the work: first asking the marketing agent to identify the sellers from the ‘Display’ channel, then passing that list to the e-commerce agent to calculate their total sales.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Architecture: Collaboration via the A2A Protocol
&lt;/h3&gt;

&lt;p&gt;The magic that makes this collaboration possible is the &lt;a href="https://a2a-protocol.org/latest/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent-to-Agent (A2A) protocol&lt;/strong&gt;&lt;/a&gt;. A2A provides a standardized way for agents to communicate their capabilities and call upon each other’s skills over a network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnk8wn7vjw8tmdmg542l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnk8wn7vjw8tmdmg542l.png" width="800" height="282"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of the Da2a Application, with Marketing and E-Commerce agents collaborating with the Root Agent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The architecture consists of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://github.com/fmind/da2a/tree/main/da2a" rel="noopener noreferrer"&gt;&lt;strong&gt;A Root Agent&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;:&lt;/strong&gt; The orchestrator that receives user requests, plans the execution, and delegates tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Agents:&lt;/strong&gt; The &lt;a href="https://github.com/fmind/da2a/tree/main/ecommerce" rel="noopener noreferrer"&gt;ecommerce_agent&lt;/a&gt; and &lt;a href="https://github.com/fmind/da2a/tree/main/marketing" rel="noopener noreferrer"&gt;marketing_agent&lt;/a&gt;, each running as an independent service with its own database.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://a2a-protocol.org/latest/tutorials/python/3-agent-skills-and-card/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Cards&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;:&lt;/strong&gt; Each domain agent exposes a JSON “agent card” that acts like a digital business card, describing its name, capabilities, and how to communicate with it.&lt;/li&gt;
&lt;/ol&gt;
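
&lt;p&gt;As a quick illustration of the third point, here is a minimal sketch that fetches and inspects one of the live agent cards. The name, description, and skills fields follow the A2A agent card format; treat the exact structure as an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Sketch: inspect a domain agent's card from its well-known URL."""

import json
import urllib.request

CARD_URL = "https://da2a-ecommerce.fmind.dev/a2a/ecommerce/.well-known/agent-card.json"

# Fetch the JSON card that the A2A server publishes at a well-known path
with urllib.request.urlopen(CARD_URL) as response:
    card = json.load(response)

# The card advertises the agent's identity and skills to other agents
print(card["name"], "-", card.get("description", ""))
for skill in card.get("skills", []):
    print("skill:", skill.get("name"), "-", skill.get("description", ""))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;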

&lt;p&gt;The root agent is configured to know about these remote agents. Here is a simplified look at the code from the da2a &lt;a href="https://github.com/fmind/da2a/blob/main/da2a/agent.py" rel="noopener noreferrer"&gt;agent.py&lt;/a&gt; file, which sets up the connection to the remote agents using their "agent cards."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import google.adk.agents.remote_a2a_agent as a2a
import google.adk.tools.agent_tool as at

# The URL points to the 'agent card' of the remote agent
AGENT_CARD_ECOMMERCE = "https://da2a-ecommerce.fmind.dev/a2a/ecommerce/.well-known/agent-card.json"
AGENT_CARD_MARKETING = "https://da2a-marketing.fmind.dev/a2a/marketing/.well-known/agent-card.json"

# Create local proxy objects for the remote agents
ecommerce_agent = a2a.RemoteA2aAgent(
    name="ecommerce_agent",
    agent_card=AGENT_CARD_ECOMMERCE,
    description="Answers questions about e-commerce data..."
)
marketing_agent = a2a.RemoteA2aAgent(
    name="marketing_agent",
    agent_card=AGENT_CARD_MARKETING,
    description="Answers questions about marketing data..."
)

# The root agent uses these agents as 'tools' to solve problems
root_agent = LlmAgent(
    ...
    tools=[at.AgentTool(ecommerce_agent), at.AgentTool(marketing_agent)],
    ...
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each domain agent is served via the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit’s&lt;/a&gt; (ADK) web server, which automatically exposes the A2A endpoints and the agent card.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Command to serve an agent and enable A2A communication
adk web --a2a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simple, powerful mechanism allows us to build a distributed system where components can be developed, deployed, and scaled independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Benefits of Thinking Agentically
&lt;/h3&gt;

&lt;p&gt;This approach unlocks several powerful advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Human-Like Task Handling:&lt;/strong&gt; Agents can tackle complex, multi-step tasks that require synthesizing information from different domains, much like a human analyst would.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability and Extensibility:&lt;/strong&gt; Adding a new data domain is as simple as building and deploying a new agent. No need to re-architect the entire platform. The system grows organically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Focus on High-Level Value:&lt;/strong&gt; It abstracts the underlying engineering complexity. Data consumers and developers can focus on defining business logic and asking high-level questions, not on writing SQL or managing data pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous and Collaborative:&lt;/strong&gt; Each agent is a valuable tool on its own, but their true power is unlocked when they collaborate through an orchestrator to solve problems that no single agent could handle alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Road Ahead: Limitations and Future Work
&lt;/h3&gt;

&lt;p&gt;Da2a is a prototype, and building an industrial-grade agentic data platform requires solving some interesting challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Data Transfer:&lt;/strong&gt; A2A is excellent for orchestrating tasks and passing small payloads of text or JSON. It is not designed for transferring gigabytes of data between agents. For that, we’d need to integrate mechanisms that point agents to shared data storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Agent Discovery:&lt;/strong&gt; Currently, the root agent’s knowledge of other agents is hardcoded. A production system would need a discovery service or a registry where agents can dynamically register themselves and their skills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory and Learning:&lt;/strong&gt; The agents in this prototype are stateless. The next frontier is to give them memory, allowing them to learn from past interactions, recall previous results, and improve their planning and execution over time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion: A New Frontier for Data
&lt;/h3&gt;

&lt;p&gt;The agentic paradigm represents a fundamental shift in how we think about data architecture. We are moving from rigid, centralized systems to dynamic, decentralized ecosystems of intelligent specialists. This approach promises to create data platforms that are more flexible, more powerful, and more aligned with the way businesses actually work.&lt;/p&gt;

&lt;p&gt;There is still much to build, but the potential is immense. The future of data isn’t just about bigger databases or faster queries; it’s about collaboration, intelligence, and a network of agents working together to turn data into insight.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy35cyrmqbjlsu1j9oay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy35cyrmqbjlsu1j9oay.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The future is with Agentic Data Platforms (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>artificialintelligen</category>
      <category>datascience</category>
      <category>generativeaitools</category>
    </item>
    <item>
      <title>Ackgent: Rapid Agent Development on GCP with ADK and Agent Config</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sun, 14 Sep 2025 13:13:04 +0000</pubDate>
      <link>https://forem.com/fmind/ackgent-rapid-agent-development-on-gcp-with-adk-and-agent-config-lb9</link>
      <guid>https://forem.com/fmind/ackgent-rapid-agent-development-on-gcp-with-adk-and-agent-config-lb9</guid>
      <description>&lt;p&gt;The AI agent landscape is exploding, but development speed is hitting a wall. We need a faster, more accessible way to build and iterate. In my current role, I spend my days optimizing the experience of building and deploying AI agents. I’ve witnessed firsthand the incredible use cases my customers’ developers — agents that streamline complex workflows, automate intricate decision-making, and unlock new data insights. The potential is massive, but the reality of development is often friction-filled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Despite the advancements in foundational models, the process of taking an agent from concept to production remains too slow and overly complex.&lt;/strong&gt; Developers get bogged down in boilerplate code, infrastructure wrangling, and the mechanics of tool integration, rather than focusing on the actual logic and value the agent provides. We urgently need to improve the speed at which we can iterate, while simultaneously making the whole process more accessible to a broader range of builders. Enter &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;&lt;strong&gt;Ackgent&lt;/strong&gt;&lt;/a&gt;, a demonstration of how &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google ADK&lt;/a&gt; and &lt;a href="https://google.github.io/adk-docs/agents/config/" rel="noopener noreferrer"&gt;Agent Config&lt;/a&gt; can be used to quickly build and deploy AI agents with a declarative approach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfmm1hkft1fxmxkek3vu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfmm1hkft1fxmxkek3vu.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Shift from Imperative to Declarative
&lt;/h3&gt;

&lt;p&gt;The traditional approach to building agents is largely &lt;em&gt;imperative&lt;/em&gt;. You write Python (or similar) code detailing exactly &lt;em&gt;how&lt;/em&gt; the agent should execute tasks, manage state, call tools, and handle errors. This offers maximum control but comes at the cost of speed and simplicity.&lt;/p&gt;

&lt;p&gt;What if we could shift to a &lt;em&gt;declarative&lt;/em&gt; approach? What if we could define &lt;em&gt;what&lt;/em&gt; the agent should do, and let a robust framework handle the execution? This is the promise of &lt;a href="https://google.github.io/adk-docs/agents/config/" rel="noopener noreferrer"&gt;Agent Config&lt;/a&gt;, a new feature of the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; introduced in &lt;a href="https://github.com/google/adk-python/releases/tag/v1.12.0" rel="noopener noreferrer"&gt;release v1.12.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://google.github.io/adk-docs/agents/config/" rel="noopener noreferrer"&gt;Agent Config&lt;/a&gt; allows developers to define the entire behavior of an agent — its goals, instructions, tools, and integrations — using a structured configuration file written in YAML. This addresses both the need for speed and the need for accessibility.&lt;/p&gt;

&lt;p&gt;The central insight here is that &lt;strong&gt;config helps you focus on the use case, not the code.&lt;/strong&gt; By abstracting the underlying mechanics, developers, prompt engineers, and product managers can rapidly prototype and test different agent behaviors simply by editing a YAML file.&lt;/p&gt;
&lt;h3&gt;
  
  
  Flexibility Without the Boilerplate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Crucially, adopting a declarative approach doesn’t mean sacrificing power or flexibility&lt;/strong&gt;. Agent Config is designed to be extensible. While the core orchestration is handled by the framework, it provides clear pathways for integrating essential components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;External Tools:&lt;/strong&gt; You can easily connect your agents to real-world APIs, databases, and services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Callbacks:&lt;/strong&gt; Hooks are available to inject custom Python logic at specific points in the agent lifecycle (e.g., for pre-processing input, validating output, logging, or monitoring).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP (Model Context Protocol) Servers:&lt;/strong&gt; Agent Config supports integration with MCP servers, enabling sophisticated communication, governance, and orchestration in complex multi-agent systems.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
agent_class: LlmAgent
model: gemini-2.5-flash
name: prime_agent
description: Handles checking if numbers are prime.
instruction: |
  You are responsible for checking whether numbers are prime.
  When asked to check primes, you must call the check_prime tool with a list of integers.
  Never attempt to determine prime numbers manually.
  Return the prime number results to the root agent.
tools:
  - name: ma_llm.check_prime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
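
&lt;p&gt;The ma_llm.check_prime tool referenced in the YAML lives in regular Python code. Here is a minimal sketch of what such a tool could look like; the trial-division implementation is an assumption for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Hypothetical sketch of the check_prime tool referenced by the YAML above."""

def check_prime(numbers: list[int]) -&amp;gt; dict[int, bool]:
    """Check the primality of each integer using trial division."""

    def is_prime(n: int) -&amp;gt; bool:
        if n &amp;lt; 2:
            return False
        if n % 2 == 0:
            return n == 2  # 2 is the only even prime
        divisor = 3
        while divisor * divisor &amp;lt;= n:
            if n % divisor == 0:
                return False
            divisor += 2
        return True

    return {n: is_prime(n) for n in numbers}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;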

&lt;h3&gt;
  
  
  Introducing Ackgent: The Agent Config Starter Kit
&lt;/h3&gt;

&lt;p&gt;To help teams adopt this powerful paradigm, I’ve created a new GitHub repository: &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;&lt;strong&gt;Ackgent&lt;/strong&gt;&lt;/a&gt;. This repository is a demonstration of how to leverage ADK Agent Config within a modern, production-ready Python environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wuqhs3j864wgqxhm613.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wuqhs3j864wgqxhm613.png" width="800" height="422"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Web Interface of Ackgent with the Internet and Datetime agents: &lt;a href="https://ackgent.fmind.dev/" rel="noopener noreferrer"&gt;https://ackgent.fmind.dev/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This template encapsulates best practices for structuring a project where configuration is the core, supported by a suite of modern development tools.&lt;/p&gt;
&lt;h4&gt;
  
  
  Repository Features: A Modern Stack
&lt;/h4&gt;

&lt;p&gt;The Ackgent repository is built with efficiency, robustness, and Developer Experience (DX) in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modern Python Management with &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt;:&lt;/strong&gt; We leverage uv (the blazing-fast Python package manager written in Rust) to streamline dependency resolution and virtual environment management, significantly speeding up setup and CI/CD pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Execution with &lt;a href="https://github.com/casey/just" rel="noopener noreferrer"&gt;just&lt;/a&gt;:&lt;/strong&gt; just serves as a convenient command runner, simplifying common tasks such as installing dependencies with just project or deploying to the cloud with just deploy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Quality Tooling:&lt;/strong&gt; Integrated &lt;a href="https://pre-commit.com/" rel="noopener noreferrer"&gt;pre-commit&lt;/a&gt; hooks ensure code quality and consistency from the start, with checks such as check-toml, check-yaml, and check-json.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;evalset Configuration:&lt;/strong&gt; The repository includes ADK's capability for defining and running evaluation datasets (&lt;a href="https://google.github.io/adk-docs/evaluate/" rel="noopener noreferrer"&gt;evalset&lt;/a&gt;), which is crucial for rigorously and iteratively testing and benchmarking agent performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customizable Cloud Run Deployment:&lt;/strong&gt; Designed for scalability, the example includes configurations and just recipes for deploying the agents as serverless containers on &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;Google Cloud Run&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Agent Capabilities:&lt;/strong&gt; Demonstrations of &lt;strong&gt;Tools&lt;/strong&gt;, &lt;strong&gt;Callbacks&lt;/strong&gt;, and &lt;strong&gt;MCP&lt;/strong&gt; integration within the Agent Config framework.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnux6ms1yu3q2bpp5n80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnux6ms1yu3q2bpp5n80.png" width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Ackgent on Cloud Run gives you access to key metrics, logs, SLOs, and error reporting to better observe your agent in production&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;The Ackgent example utilizes a modular architecture centered around the ADK framework. The core concept is the separation of concerns: the agent behavior (the “what”) is defined in YAML files, while the implementations (the “how” — tools, callbacks, and MCP connections) are written in Python.&lt;/p&gt;

&lt;p&gt;The ADK framework acts as the runtime engine. It parses the Agent Config YAML, initializes the specified LLM, and orchestrates the flow of conversation. When a request is received (e.g., via the Cloud Run endpoint), the runtime identifies the target agent. When the LLM decides to use a tool or delegate to another agent, ADK handles the execution via the implementation references provided in the configuration.&lt;/p&gt;

&lt;p&gt;This separation of concerns — behavior in YAML, execution handled by ADK, and specialized logic in Python — is what enables rapid iteration.&lt;/p&gt;
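
&lt;p&gt;In practice, iterating on an agent is as simple as editing the YAML and reloading. As a rough sketch (assuming the standard ADK command-line interface; the exact just recipes in the Ackgent repository may differ), a local development loop looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install the project dependencies (Ackgent manages them with uv).
uv sync

# Start the ADK development UI and chat with the agents locally.
adk web

# Alternatively, interact with a single agent from the terminal.
adk run agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;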

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkco8itp0yij9iwywdyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkco8itp0yij9iwywdyp.png" width="800" height="1036"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of the Ackgent example with 3 Agents: Root (Dispatcher), Datetime (Tools), and Internet (MCP)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Under the Hood: Defining Agents with YAML
&lt;/h3&gt;

&lt;p&gt;Let’s look at how this works in practice. The Ackgent repository showcases three distinct agents, demonstrating the core capabilities of the Agent Config approach. Notice how minimal the Python code is, focusing mainly on tool implementation, while the behavior is entirely in YAML.&lt;/p&gt;
&lt;h4&gt;
  
  
  1. The Datetime Agent (Custom Tools)
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/datetime_agent.yaml" rel="noopener noreferrer"&gt;datetime agent&lt;/a&gt; demonstrates how to extend an agent with external tools. The agent can access the current date and time through the simple functions defined in the repository's &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/tools.py" rel="noopener noreferrer"&gt;tools.py&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: datetime_agent
model: gemini-2.5-flash
description: A helpful assistant for datetime questions.
instruction: Return the current date or time based on the user's request.
generate_content_config:
  temperature: 0.0
tools:
  - name: agent.tools.now
  - name: agent.tools.today

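# Below: agent/tools.py, the tool implementations referenced above.
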
"""Tools for agents."""

# %% IMPORTS

import datetime

# %% TOOLS

def now() -&amp;gt; str:
    """Returns the current time.

    Returns:
        str: The current time in 'HH:MM' format.
    """
    return datetime.datetime.now().strftime("%H:%M")

def today() -&amp;gt; str:
    """Returns the current date.

    Returns:
        str: The current date in 'YYYY-MM-DD' format.
    """
    return str(datetime.date.today())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. The Internet Agent (Search Tools)
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/internet_agent.yaml" rel="noopener noreferrer"&gt;Internet agent&lt;/a&gt; is configured to access an external MCP server. In this case, we are using &lt;a href="https://github.com/microsoft/markitdown?tab=readme-ov-file" rel="noopener noreferrer"&gt;markitdown-mcp&lt;/a&gt;, a server developed by Microsoft that converts almost any source, including external links, into Markdown. The MCP server is started over STDIO with a timeout of 10 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: internet_agent
model: gemini-2.5-flash
description: A helpful assistant for answering questions from the Internet.
instruction: Return the answer to questions using the user provided link.
generate_content_config:
  temperature: 0.0
tools:
- name: MCPToolset
  args:
    stdio_connection_params:
      server_params:
        command: "markitdown-mcp"
      timeout: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
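
&lt;p&gt;Note that the markitdown-mcp command must be available in the runtime environment for this configuration to work; it can typically be installed with pip or declared as a project dependency.&lt;/p&gt;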



&lt;h4&gt;
  
  
  3. The Root Agent (Coordination and Routing)
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/root_agent.yaml" rel="noopener noreferrer"&gt;root agent&lt;/a&gt; acts as the main entry point. It doesn't perform tasks itself; instead, its primary function is orchestration. It analyzes the user's intent and intelligently delegates the task to the most appropriate specialized agent using the sub_agents configuration. This pattern enables a scalable and modular multi-agent system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: root_agent
model: gemini-2.5-flash
description: A helpful assistant for user questions.
instruction: |
  You are a helpful assistant that can answer questions about anything.
  Use the following sub-agents to answer questions: `datetime_agent` and `internet_agent`.
generate_content_config:
  temperature: 0.0
after_model_callbacks:
  - name: agent.callbacks.after_model_callback
sub_agents:
  - config_path: datetime_agent.yaml
  - config_path: internet_agent.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
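
&lt;p&gt;The after_model_callbacks entry points to a plain Python function in the repository's agent/callbacks.py module. As a minimal sketch (assuming the standard ADK callback signature, not the repository's exact implementation), such a hook can log or post-process every model response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Callbacks for agents (illustrative sketch, not the repository code)."""

import logging
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmResponse

def after_model_callback(
    callback_context: CallbackContext,
    llm_response: LlmResponse,
) -&amp;gt; Optional[LlmResponse]:
    """Log each model response; returning None keeps the response unchanged."""
    logging.info("Agent %s produced a model response.", callback_context.agent_name)
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;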



&lt;h3&gt;
  
  
  Current Limitations
&lt;/h3&gt;

&lt;p&gt;While Agent Config is a great helper for quickly building agents, it’s important to be aware of the current constraints within the ADK framework as it evolves.&lt;/p&gt;

&lt;p&gt;One notable limitation today involves mixing different types of capabilities within a single agent definition. Currently, you cannot configure an agent that simultaneously uses built-in search tools (google_search or VertexAiSearchTool) alongside non-search tools (like the custom Python functions) or sub-agents (like the root agent uses).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools: # multiple tools are supported only when they are all search tools!
  - name: google_search
  - name: VertexAiSearchTool
   args:
     data_store_id: "projects/ackgent/locations/us/collections/default_collection/dataStores/reports_123..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ADK team is actively working on enhancing this flexibility. For now, the recommended architecture, as demonstrated in the Ackgent repository, is either to separate concerns into specialized agents or to wrap search capabilities in custom tools (such as the markitdown-mcp server).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future: Democratizing Agent Creation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ADK Agent Config is more than just a feature; it’s a foundational shift in agent development from imperative to declarative.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Based on Agent Config, &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;Ackgent&lt;/a&gt; offers an immediate boost in productivity. It streamlines the development lifecycle, reduces boilerplate, and makes testing and deployment significantly faster. This template repository provides a concrete starting point to leverage these benefits today.&lt;/p&gt;

&lt;p&gt;But the long-term vision is even more exciting. Because the agent’s behavior is defined declaratively in a structured, human-readable format (YAML), it opens the door for non-technical users — what we might call “digital users” — to build their own agents. Imagine a future where a UI allows business analysts or domain experts to visually construct complex agents by defining instructions and plugging in tools — all powered by Agent Config under the hood.&lt;/p&gt;

&lt;p&gt;We are moving towards a future where the ability to create AI agents is truly democratized. I encourage you to explore the repository, try out the examples, and experience the speed and simplicity of declarative agent development.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Link to GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;https://github.com/fmind/ackgent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Link to Web Demo:&lt;/strong&gt; &lt;a href="https://ackgent.fmind.dev/" rel="noopener noreferrer"&gt;https://ackgent.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oqcikt0hqf2w7xcqyrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oqcikt0hqf2w7xcqyrh.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>generativeaitools</category>
      <category>googleadk</category>
      <category>agents</category>
    </item>
    <item>
      <title>Combo-Banana: Building Custom Image Workflows in Record Time</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Mon, 08 Sep 2025 19:03:39 +0000</pubDate>
      <link>https://forem.com/fmind/combo-banana-building-custom-image-workflows-in-record-time-4cl0</link>
      <guid>https://forem.com/fmind/combo-banana-building-custom-image-workflows-in-record-time-4cl0</guid>
      <description>&lt;p&gt;In the fast-paced world of product retail, agility is crucial for the teams bringing products to market. Product designers at my customer handle a massive volume of images daily. Ensuring every product looks perfect across the website, mobile apps, and marketing campaigns often involves tedious, multi-step editing processes — background removal, resizing, color correction, and optimization.&lt;/p&gt;

&lt;p&gt;While essential, these repetitive tasks can consume hours, diverting designers from the creative work they do best. What if designers could automate these specific workflows themselves, without wrestling with complex software or waiting for engineering resources?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbyi3nmo8upy8lt1zdvs0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbyi3nmo8upy8lt1zdvs0.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Nano Banana&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This challenge inspired a recent project: &lt;a href="https://github.com/fmind/combo-banana" rel="noopener noreferrer"&gt;&lt;strong&gt;Combo-Banana&lt;/strong&gt;&lt;/a&gt;, a simple open-source prototype based on &lt;a href="https://blog.google/products/gemini/updated-image-editing-model/" rel="noopener noreferrer"&gt;Google's Nano Banana&lt;/a&gt; designed to demonstrate just how quickly we can build applications that deliver immediate value to our teammates on the field. This project is about empowering designers to create their own multi-step image editing pipelines.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Use Case: Beyond Manual Editing
&lt;/h3&gt;

&lt;p&gt;Imagine a designer preparing images for a new product line. The workflow is predictable but labor-intensive:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive raw photos from the studio.&lt;/li&gt;
&lt;li&gt;Manually isolate the product from the background.&lt;/li&gt;
&lt;li&gt;Adjust the lighting and contrast to meet brand guidelines.&lt;/li&gt;
&lt;li&gt;Resize and crop for the product detail page (high resolution).&lt;/li&gt;
&lt;li&gt;Integrate the products in several situations (e.g., on a user, in a store).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When done manually across hundreds of SKUs, this process is slow and prone to inconsistencies.&lt;/p&gt;

&lt;p&gt;This prototype reimagines that process. Instead of a series of manual actions across different tools, the designer defines a “combo” — a sequence of operations executed automatically by the application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "name": "Social Media Ad Creation",
    "steps": [
        {
            "title": "Place Item in Landscape",
            "prompt": "Integrate the product or item seamlessly into a visually stunning and appropriate landscape background, ensuring realistic lighting and perspective."
        },
        {
            "title": "Add Catchy Slogan",
            "prompt": "Overlay a concise and catchy slogan onto the image, using a font and placement that enhances readability and visual appeal for a social media ad."
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Experience: Flexibility Meets Simplicity
&lt;/h3&gt;

&lt;p&gt;The prototype focuses on a streamlined experience. A user can upload an image and stack the desired operations. They define the recipe once — e.g., Step 1: Isolate Product; Step 2: Improve the Shadows; Step 3: Add a Slogan — and the application handles the rest.&lt;/p&gt;

&lt;p&gt;This transforms a 15-minute manual task into a 30-second automated process, ensuring pixel-perfect consistency across the entire product catalog and freeing up time for more creative work.&lt;/p&gt;
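
&lt;p&gt;Under the hood, this kind of pipeline can be expressed in a few lines of Python. The sketch below is illustrative only (it assumes the google-genai SDK and the Nano Banana image model; the helper names are not the repository's actual code) and matches the combo JSON structure shown earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Illustrative sketch of a sequential image combo (not the repository code)."""

from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

def apply_step(image: Image.Image, prompt: str) -&amp;gt; Image.Image:
    """Apply a single editing step and return the resulting image."""
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # "Nano Banana"; the name may evolve
        contents=[prompt, image],
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return Image.open(BytesIO(part.inline_data.data))
    raise RuntimeError("The model did not return an image.")

def run_combo(image: Image.Image, combo: dict) -&amp;gt; Image.Image:
    """Run every step of a combo; each output feeds the next step."""
    for step in combo["steps"]:
        image = apply_step(image, step["prompt"])
    return image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;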

&lt;h4&gt;
  
  
  See it in Action
&lt;/h4&gt;

&lt;p&gt;The prototype illustrates how an intuitive interface can abstract away the complexity running in the background.&lt;/p&gt;

&lt;p&gt;You can explore the live demo here: &lt;a href="https://combo-banana.fmind.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;https://combo-banana.fmind.dev/&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8gy3ktps683nhuk49py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8gy3ktps683nhuk49py.png" width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Combo-Banana: Workflow Definition Tab&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On the left, the user defines the workflow with a chatbot interface based on &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;. The chatbot extracts prompts into a series of steps that are stacked sequentially. In this example, we start with a “Place the item in a landscape” step, followed by an “Add Catchy Slogan” step, powered by &lt;a href="https://ai.google.dev/gemini-api/docs/image-generation" rel="noopener noreferrer"&gt;Nano Banana&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjf2sjpuv67gbtei9y3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjf2sjpuv67gbtei9y3u.png" width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Combo-Banana: Workflow Execution Tab&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once the desired “combo” is configured, the user simply uploads the source image on the top left side of the second tab. The application processes the image through the defined pipeline — the output of the first step becomes the input for the next. The final result is displayed on the right, ready for download. This visual feedback loop allows designers to quickly iterate on their workflows before applying them to large batches of images.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj45lhnnc8aamsuvzruud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj45lhnnc8aamsuvzruud.png" width="800" height="801"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Final Result of the User Combo&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Under the Hood: The Tech Stack
&lt;/h3&gt;

&lt;p&gt;The speed of development was made possible by a modern, efficient tech stack. We focused on rapid prototyping, leveraging powerful AI, and ensuring scalability:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mceeg1cp6px30xu8jbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mceeg1cp6px30xu8jbb.png" width="800" height="808"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of Combo-Banana&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Interface:&lt;/strong&gt; &lt;a href="https://gradio.app/" rel="noopener noreferrer"&gt;&lt;strong&gt;Gradio&lt;/strong&gt;&lt;/a&gt;. Used to build the interactive web UI entirely in Python, avoiding the need for complex front-end development and significantly speeding up iteration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Backend:&lt;/strong&gt; &lt;a href="https://www.python.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/a&gt;. The backbone of the application, handling core logic and orchestrating the sequence of image processing steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Engine:&lt;/strong&gt; &lt;a href="https://ai.google.dev/gemini-api/docs/image-generation" rel="noopener noreferrer"&gt;&lt;strong&gt;Nano Banana&lt;/strong&gt;&lt;/a&gt;. The AI powerhouse driving complex tasks like high-fidelity background removal and segmentation. This project was a fantastic opportunity to leverage its impressive capabilities. In future releases, other models could be combined with Nano Banana.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Run&lt;/strong&gt;&lt;/a&gt;. A serverless platform ensuring the tool is accessible, cost-effective (scales to zero), and scalable on demand within an organization’s infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Road Ahead: From Prototype to Platform
&lt;/h3&gt;

&lt;p&gt;This prototype is just the beginning. The goal is to evolve it into a robust platform that can handle the complexity of real-world production environments. Key opportunities for evolution include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Workflows (DAGs):&lt;/strong&gt; Moving beyond simple sequential pipelines (Step A -&amp;gt; Step B -&amp;gt; Step C) to support Directed Acyclic Graphs (DAGs). This would allow for parallel processing — for example, generating five different resolutions simultaneously after the background has been removed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Granular Configuration:&lt;/strong&gt; Providing deeper configuration options within each processing block (e.g., setting specific compression levels, defining padding for auto-crops, or choosing different AI models for specific tasks and selecting which previous image to use as input).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem Integration:&lt;/strong&gt; Integrating directly with existing asset management tools. This includes pulling source files from &lt;strong&gt;Google Drive&lt;/strong&gt; and automatically exporting the results to designated folders or downstream systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Sessions and Workflow Management:&lt;/strong&gt; Implementing user authentication to allow teammates to save, name, share, and reuse their custom workflows, eliminating the need to rebuild them for every session.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Bigger Picture: Bridging the Gap
&lt;/h3&gt;

&lt;p&gt;Building this prototype underscored a critical insight. We are living in a time with access to incredibly powerful technology like Nano Banana. The technology is here, and it works.&lt;/p&gt;

&lt;p&gt;However, the existence of a powerful model is not enough. The key challenge now is to &lt;strong&gt;bridge the gap&lt;/strong&gt; between these technological capabilities and the real-world, day-to-day needs of our colleagues on the field.&lt;/p&gt;

&lt;p&gt;As this project demonstrates, we don’t need massive engineering teams or long development cycles to deliver significant value. By identifying specific pain points and leveraging modern tools like Gradio and Cloud Run, we can rapidly prototype solutions that make a difference.&lt;/p&gt;

&lt;p&gt;This is a phenomenal opportunity for builders and entrepreneurs within any organization. &lt;strong&gt;The tools are ready. It’s time to build!&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/fmind/combo-banana" rel="noopener noreferrer"&gt;https://github.com/fmind/combo-banana&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcn0iogwkbjhgqa1132l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcn0iogwkbjhgqa1132l.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Combo-Banana&lt;/em&gt;&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>opensource</category>
      <category>python</category>
      <category>generativeaitools</category>
    </item>
  </channel>
</rss>
