<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Médéric Hurier (Fmind)</title>
    <description>The latest articles on Forem by Médéric Hurier (Fmind) (@fmind).</description>
    <link>https://forem.com/fmind</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3755461%2F1606daec-23ba-429e-98db-baa19ef3634d.png</url>
      <title>Forem: Médéric Hurier (Fmind)</title>
      <link>https://forem.com/fmind</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/fmind"/>
    <language>en</language>
    <item>
      <title>How I Revamped My Portfolio Website in 5 Nights Using AI Agents</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Thu, 19 Feb 2026 20:43:26 +0000</pubDate>
      <link>https://forem.com/fmind/how-i-revamped-my-portfolio-website-in-5-nights-using-ai-agents-57d4</link>
      <guid>https://forem.com/fmind/how-i-revamped-my-portfolio-website-in-5-nights-using-ai-agents-57d4</guid>
      <description>&lt;p&gt;I see many friends and acquaintances generating amazing applications in mere weeks. We are in the midst of a craze of innovation, an era where inspired people can bring their ideas to life without being strictly bounded by the usual constraints of tool mastery.&lt;/p&gt;

&lt;p&gt;This means there is no time to slack off. The go-getters will not wait for you to catch up. One representative aspect of this fast-moving landscape is your digital presence — your portfolio.&lt;/p&gt;

&lt;p&gt;As a freelance AI/ML Architect, I realized I needed a space that truly reflected my expertise. In this article, I present the revamp of my own portfolio: &lt;a href="https://fmind.dev" rel="noopener noreferrer"&gt;fmind.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0r1suucdet9pejwokqst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0r1suucdet9pejwokqst.png" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;New Homepage of Fmind.dev&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Old and Dusty Website
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://sites.google.com/fmind.dev/fmind-dev/home" rel="noopener noreferrer"&gt;My previous website&lt;/a&gt; was built with Google Sites. It was cheap, highly functional, and effortless to maintain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m4rmi6mkeptahmcm0iv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m4rmi6mkeptahmcm0iv.png" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Previous version of Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But functional is no longer enough in 2026, especially for a freelancer who constantly needs to be on the bleeding edge of technology. Google Sites gave me very little control over styling, imposed strict limitations on SEO, and — crucially — offered no way to evolve the site with modern AI coding assistants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Something had to be done&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Preparing the Field: Setting Up the Agent
&lt;/h3&gt;

&lt;p&gt;The secret to a successful AI-assisted project isn’t just jumping in and prompting; it’s about preparing the groundwork.&lt;/p&gt;

&lt;p&gt;I started by defining several &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;agent skills&lt;/a&gt; tailored to my specific tech stack (an illustrative sketch follows the list). For this project, I created dedicated skills covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend Development&lt;/li&gt;
&lt;li&gt;Frontend Development&lt;/li&gt;
&lt;li&gt;GCP Deployment&lt;/li&gt;
&lt;li&gt;GCP Observability&lt;/li&gt;
&lt;li&gt;Mobile Optimization&lt;/li&gt;
&lt;li&gt;Project Tooling&lt;/li&gt;
&lt;li&gt;SEO Optimization&lt;/li&gt;
&lt;/ul&gt;
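
&lt;p&gt;To give a flavor, here is an illustrative sketch of what one of these skills can look like (the content below is a trimmed example of mine, not the actual file):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Skill - SEO Optimization

## Goal

Keep every page fast and indexable: semantic HTML, unique titles and
meta descriptions, Open Graph tags, sitemap.xml, and robots.txt.

## Instructions

1. Every page defines a unique title and meta description.
2. Register each new route in sitemap.xml.

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;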

&lt;p&gt;Then, instead of continuously reminding the agent who I am and what I want, I explicitly created two context files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PROFILE.md&lt;/strong&gt; : Outlining my professional identity, experience, and links.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Profile - Médéric Hurier (Fmind)

## Headline

Freelancer • AI/ML Architect &amp;amp; Engineer • AI Agents &amp;amp; MLOps • GCP Professional Cloud Architect • PhD in AI &amp;amp; Computer Security

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;DESIGN.md&lt;/strong&gt; : Defining my brand identity (e.g., “Space &amp;amp; Tech” aesthetic).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Website Design

## Brand Identity

A professional, advanced, and modern digital presence for an AI/ML Architect.

Blending clean and modern style with a "Space &amp;amp; Tech" aesthetic to reflect deep expertise in Artificial Intelligence and MLOps.

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By establishing these foundational documents, I set the ground rules. The AI agent had immediate access to my personality and brand identity, saving me from having to explain the context over and over again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the Website: Steering the AI
&lt;/h3&gt;

&lt;p&gt;Rather than “vibe coding” everything and blindly compiling the results, I made a strict rule: I would review every single file generated by the agent. As an engineer, I refuse to be responsible for a codebase I haven’t read or understood.&lt;/p&gt;

&lt;p&gt;Despite the agent’s impressive capabilities, I noted several weaknesses that you must manage when using AI agents to generate code:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Agents don’t care about your brand. They just want to get the task done.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;— A harsh truth of AI coding&lt;/p&gt;

&lt;p&gt;It’s entirely up to you to rigorously maintain your identity and ensure the output matches your needs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Code quickly degenerates. Without supervision, AI can generate piles and piles of code lacking underlying logic.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Agents are fantastic at refactoring, but &lt;em&gt;you&lt;/em&gt; are the architect who must point them in the right direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing the right tech stack is key.&lt;/strong&gt; Call me old school, but I demand mastery over what I produce. I decided to use &lt;strong&gt;Python&lt;/strong&gt; as the foundational layer for my programming logic. I am confident in my ability to maintain, debug, and evolve a Python-based architecture over the long term.&lt;/p&gt;

&lt;p&gt;Once you actively tackle these limitations, the results are breathtaking. While I generally don’t enjoy writing repetitive boilerplate or templates, the AI agent excels at it and never gets tired. You can direct its focus entirely to the frontend, backend, deployment, SEO, or mobile optimization. Watching the pieces fall into place gives you an incredible dopamine rush.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Result
&lt;/h3&gt;

&lt;p&gt;I was able to finalize the website from scratch in just &lt;strong&gt;5 nights&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This rapid timeline included everything: generating the base components, extensive optimizations, refactoring, and setting up proper deployment pipelines (including analytics and observability). While I could have let the agent run entirely autonomously, it is &lt;em&gt;my&lt;/em&gt; website; I wanted to be an active part of the process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0z6kogyrv13cpsjfsizk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0z6kogyrv13cpsjfsizk.png" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Result of PageSpeed Insights (LightHouse)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’m thrilled to report that the new site achieved a &lt;strong&gt;100% Lighthouse score&lt;/strong&gt; across every single category. The site is fast, modern, and beautifully aligned with my brand. I would have never been able to produce this level of polish by myself in such a short timeframe, and I’m incredibly proud of the result.&lt;/p&gt;

&lt;p&gt;For context on the effort, here is a quick look at the codebase generated and reviewed during those 5 nights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;github.com/AlDanial/cloc v 1.98 T=0.10 s (915.2 files/s, 30379.3 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
HTML 57 33 40 1257
Markdown 11 277 0 580
Python 12 138 79 368
TOML 1 8 1 84
YAML 3 0 3 81
CSS 2 9 0 41
Text 4 7 0 33
JSON 1 0 0 31
Dockerfile 1 1 2 27
-------------------------------------------------------------------------------
SUM: 92 473 125 2502
-------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusions
&lt;/h3&gt;

&lt;p&gt;AI Coding is an absolute game-changer. I highly encourage any developer or freelancer to revamp their own website as soon as possible. It is the perfect, tightly-scoped exercise to discover the true, practical capacity of these tools for yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with clear context files&lt;/strong&gt; to ground your agent’s understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not blindly trust generated code;&lt;/strong&gt; review everything to maintain architectural control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid “shiny object syndrome.”&lt;/strong&gt; Relying entirely on AI without architectural vision can lead to generic, unmaintainable results. Find the right balance between automation and engineering rigor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the final result at &lt;a href="https://fmind.dev" rel="noopener noreferrer"&gt;fmind.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5p7sgek0f7qlcck0zfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5p7sgek0f7qlcck0zfp.png" width="800" height="402"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;About me on Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you have tried ‘vibe coding’ on your own projects, let me know about your experience in the comments!&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>softwareengineering</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Chaigent: An affordable alternative to Gemini Enterprise on Google Cloud</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Fri, 06 Feb 2026 20:14:02 +0000</pubDate>
      <link>https://forem.com/fmind/chaigent-an-affordable-alternative-to-gemini-enterprise-on-google-cloud-32ah</link>
      <guid>https://forem.com/fmind/chaigent-an-affordable-alternative-to-gemini-enterprise-on-google-cloud-32ah</guid>
      <description>&lt;p&gt;The era of simple chatbots is over. Companies are now racing to build &lt;a href="https://fmind.medium.com/architecting-the-ai-agent-platform-a-definitive-guide-405750a3de44" rel="noopener noreferrer"&gt;&lt;strong&gt;AI Agent platforms&lt;/strong&gt;&lt;/a&gt; — systems that don’t just talk, but &lt;em&gt;act&lt;/em&gt;. Whether it’s a support bot resolving Jira tickets or a data analyst agent querying BigQuery, these new digital teammates need a platform that offers more than just text generation: they require reasoning, security, and enterprise-grade observability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/gemini/enterprise" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Enterprise&lt;/strong&gt;&lt;/a&gt; provides a great path to achieving this on &lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud&lt;/strong&gt;&lt;/a&gt;. It offers a comprehensive set of features including agent exposition, governance, integrated knowledge search, and a visual agent builder, connecting with backends like &lt;a href="https://cloud.google.com/products/agent-engine" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://docs.cloud.google.com/dialogflow/cx/docs" rel="noopener noreferrer"&gt;&lt;strong&gt;Conversational Agent&lt;/strong&gt;&lt;/a&gt;, or &lt;a href="https://a2aprotocol.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2A&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, for some organizations or specific use cases, the cost can be a friction point. The catalog price sits at &lt;strong&gt;~$7/user/month&lt;/strong&gt; for agent users and &lt;strong&gt;~$35/user/month&lt;/strong&gt; for visual agent builders. While this pricing is competitive for knowledge workers who gain significant productivity, it can be prohibitive for large audiences with lower usage frequency, such as field workers or occasional users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enter Chaigent:&lt;/strong&gt; &lt;a href="https://github.com/fmind/chaigent" rel="noopener noreferrer"&gt;&lt;strong&gt;https://github.com/fmind/chaigent&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5ybjuxkqco4k8yezmcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5ybjuxkqco4k8yezmcp.png" width="800" height="446"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Chaigent is an affordable alternative to Gemini Enterprise (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I present “Chaigent” (&lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chainlit&lt;/strong&gt;&lt;/a&gt; + Agent), a cost-effective, DIY alternative to Gemini Enterprise on Google Cloud. It leverages the same powerful underlying reasoning engine but replaces the managed frontend with an open-source framework, giving you control over features and costs.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;Chaigent enables you to build a private, secure AI agent platform by combining serverless infrastructure with open-source tooling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugnuds2ohznmtqzet2u4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugnuds2ohznmtqzet2u4.png" width="800" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of Chaigent (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The architecture consists of three main layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Frontend (&lt;/strong&gt;&lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chainlit&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;on&lt;/strong&gt; &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud Run&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;: A Python-based UI that handles user sessions, chat history, and authentication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend (&lt;/strong&gt;&lt;a href="https://cloud.google.com/products/agent-engine" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;: The “brain” of the operation, capable of reasoning and tool use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence &amp;amp; Auth&lt;/strong&gt; : &lt;a href="https://cloud.google.com/sql" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud SQL&lt;/strong&gt;&lt;/a&gt; for storing chat history and feedback, and &lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;&lt;strong&gt;OAuth&lt;/strong&gt;&lt;/a&gt; (Google, GitHub, etc.) for secure identity management.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach allows you to pay for &lt;strong&gt;consumption only&lt;/strong&gt; (Cloud Run CPU + Vertex AI tokens), significantly reducing costs for intermittent usage patterns compared to a flat per-seat license.&lt;/p&gt;
&lt;h3&gt;
  
  
  The “Do It Yourself” Trade-off
&lt;/h3&gt;

&lt;p&gt;Gemini Enterprise provides a managed, “batteries-included” platform with built-in governance and visual tools. Chaigent, in contrast, offers a code-first, developer-centric approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you gain:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency&lt;/strong&gt; : No monthly per-seat licensing fees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Customization&lt;/strong&gt; : You own the code. Want to add a custom feedback mechanism or a specific UI widget? You can.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform Independence&lt;/strong&gt; : Using &lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chainlit&lt;/strong&gt;&lt;/a&gt; (frontend) and &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;&lt;strong&gt;Google ADK&lt;/strong&gt;&lt;/a&gt; (backend) logic keeps you flexible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you lose (The “Subtext”):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Visual Builder&lt;/strong&gt; : You define agents in code, not a drag-and-drop UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual Governance&lt;/strong&gt; : You must implement your own permission logic per agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ops Overhead&lt;/strong&gt; : You are responsible for deploying, securing, and updating the application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Features&lt;/strong&gt; : Advanced features like Model Armor (&lt;a href="https://cloud.google.com/security/products/model-armor" rel="noopener noreferrer"&gt;&lt;strong&gt;Prompt Security&lt;/strong&gt;&lt;/a&gt;) and integrated Knowledge Search (&lt;a href="https://cloud.google.com/vertex-ai/docs/retrieval-augmented-generation" rel="noopener noreferrer"&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/a&gt;) require manual implementation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Implementation Highlights
&lt;/h3&gt;

&lt;p&gt;Chaigent is surprisingly simple to set up. Here is a glimpse of the code.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Defining the Agent
&lt;/h3&gt;

&lt;p&gt;The agent is defined declaratively using the &lt;a href="https://github.com/google/adk" rel="noopener noreferrer"&gt;&lt;strong&gt;Google ADK&lt;/strong&gt;&lt;/a&gt;. It’s just a Python object specifying the model and tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chaigent/agent.py
from google.adk.agents import Agent
from google.adk.tools import google_search

root_agent = Agent(
    name="chaigent",
    model="gemini-2.5-flash",
    description="Answer questions with Google Search.",
    instruction="You are an expert researcher. You always stick to the facts.",
    tools=[google_search],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
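
&lt;p&gt;For completeness, deploying such an agent to Vertex AI Agent Engine boils down to a short script. Here is a minimal sketch using the Vertex AI SDK’s agent_engines module (project, location, and bucket values are placeholders, and the exact arguments may vary by SDK version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# deploy.py (sketch, not the repository's deployment script)
import vertexai
from vertexai import agent_engines

from chaigent.agent import root_agent

vertexai.init(
    project="my-project",                     # placeholder
    location="us-central1",                   # placeholder
    staging_bucket="gs://my-staging-bucket",  # placeholder
)

# Package and deploy the ADK agent as a managed Agent Engine instance
remote_app = agent_engines.create(
    agent_engine=root_agent,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
)
print(remote_app.resource_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;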



&lt;h3&gt;
  
  
  2. The Bridge (Chainlit Adapter)
&lt;/h3&gt;

&lt;p&gt;The app.py acts as the bridge. It connects the user’s chat session to the Vertex AI Agent Engine, handling the streaming response seamlessly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# app.py

import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # Initialize an empty response message
    answer = cl.Message(content="")
    await answer.send()

    # Retrieve the session created at chat start
    session = cl.user_session.get("session")
    user_id, session_id = session["userId"], session["id"]

    # Stream the query to Vertex AI Agent Engine
    response_stream = engine.async_stream_query(
        user_id=user_id, message=message.content, session_id=session_id
    )

    # Stream the tokens back as they arrive
    async for chunk in response_stream:
        for part in chunk.get("content", {}).get("parts", []):
            text = part.get("text", "")
            if text:
                await answer.stream_token(text)

    # Finalize the message once the stream completes
    await answer.update()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
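
&lt;p&gt;The engine handle and session are prepared when the chat starts. The handler below is a hypothetical sketch of that wiring (the environment variable name is mine, and the async_create_session call mirrors the async_stream_query style used above), not a copy of the repository’s code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# app.py (sketch): wiring assumed by the handler above
import os

import chainlit as cl
from vertexai import agent_engines

# Handle to the deployed agent (resource name comes from the environment)
engine = agent_engines.get(os.environ["AGENT_ENGINE_RESOURCE_NAME"])

@cl.on_chat_start
async def on_chat_start():
    # Create a server-side session on Agent Engine and cache it
    user = cl.user_session.get("user")
    session = await engine.async_create_session(user_id=user.identifier)
    cl.user_session.set("session", session)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;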



&lt;h3&gt;
  
  
  User Experience
&lt;/h3&gt;

&lt;p&gt;Despite being a “DIY” solution, the user experience is premium. Chainlit provides features that users expect from modern chat apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rich Chat Interface&lt;/strong&gt; : Supports markdown, code highlighting, and streaming responses out of the box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t81nqszlcx0eo1ocvjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t81nqszlcx0eo1ocvjg.png" width="800" height="435"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Chat interface of Chaigent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication &amp;amp; Persistence&lt;/strong&gt; : Secure login screens and persisted chat history allow users to resume conversations across devices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi3tc6m8zpoqc8415wuk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyi3tc6m8zpoqc8415wuk.png" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Login screen of Chaigent&lt;/em&gt;&lt;/p&gt;
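
&lt;p&gt;On the code side, Chainlit enables OAuth with little more than the provider’s credentials in the environment and a small callback. A minimal sketch (accepting every authenticated user; a real deployment would filter on the profile data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# app.py (sketch): accept any user authenticated by the OAuth provider
from typing import Optional

import chainlit as cl

@cl.oauth_callback
def oauth_callback(
    provider_id: str,      # e.g. "google" or "github"
    token: str,            # OAuth access token
    raw_user_data: dict,   # profile returned by the provider
    default_user: cl.User,
) -&amp;gt; Optional[cl.User]:
    # Returning the user accepts the login; returning None rejects it
    return default_user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;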

&lt;p&gt;&lt;strong&gt;Data Layer&lt;/strong&gt; : All interactions are stored in your own SQL database, giving you full ownership of the data for analytics or fine-tuning later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9istczsweoqlgl289qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9istczsweoqlgl289qd.png" width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Home screen of Chaigent&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Chaigent is an excellent solution when &lt;strong&gt;cost efficiency&lt;/strong&gt; is the primary driver, particularly for large audiences with low individual usage.&lt;/p&gt;

&lt;p&gt;The decision comes down to ROI. At ~$7/month/user for Gemini Enterprise, you need to save each user at least one hour of work per month to break even. For knowledge workers, this is a no-brainer. But for field workers or casual users, a consumption-based “Pay-as-you-go” model like Chaigent might be the smarter financial move.&lt;/p&gt;
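
&lt;p&gt;To make the break-even concrete, here is the back-of-the-envelope calculation (the value assigned to a saved hour is an assumption; plug in your own numbers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Break-even sketch with assumed numbers; adjust to your context
seat_cost = 7.0   # ~$7/user/month (agent user tier)
hour_value = 7.0  # assumed value assigned to one saved hour of work

break_even_hours = seat_cost / hour_value
print(f"Each user must save {break_even_hours:.1f} hour(s)/month to break even")
# With hour_value = 7.0, this is the "one hour per month" rule of thumb above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;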

&lt;p&gt;If you are ready to trade some convenience for control and cost savings, go build your own agents!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqtud4hao3w1r94gzykw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqtud4hao3w1r94gzykw.png" width="800" height="446"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googlecloudplatform</category>
      <category>generativeaitools</category>
      <category>datascience</category>
      <category>agents</category>
    </item>
    <item>
      <title>mAIdAI: Building a Personal Assistant with Google Cloud and Vertex AI</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Thu, 05 Feb 2026 06:05:23 +0000</pubDate>
      <link>https://forem.com/fmind/maidai-building-a-personal-assistant-with-google-cloud-and-vertex-ai-aof</link>
      <guid>https://forem.com/fmind/maidai-building-a-personal-assistant-with-google-cloud-and-vertex-ai-aof</guid>
      <description>&lt;p&gt;As an AI Architect, I spend my days designing AI systems and agents for others. I optimize workflows, fine-tune context windows, and architect serverless solutions to solve complex business problems.&lt;/p&gt;

&lt;p&gt;But recently, I caught myself in a classic “cobbler’s children” scenario. While helpful bots supported my teams, I navigated my own workflow manually — answering the same repetitive questions, digging for the same documentation links, and context-switching constantly.&lt;/p&gt;

&lt;p&gt;I realized I needed something different. Not another generic team bot, but a &lt;strong&gt;Personal AI Assistant&lt;/strong&gt;  — one that knows &lt;em&gt;my&lt;/em&gt; specific context, &lt;em&gt;my&lt;/em&gt; preferred shortcuts, and &lt;em&gt;my&lt;/em&gt; tone.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;mAIdAI&lt;/strong&gt; (My AI Aid): &lt;a href="https://github.com/fmind/maidai" rel="noopener noreferrer"&gt;https://github.com/fmind/maidai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AfokYASaQyFqSm0i4n-tPaQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AfokYASaQyFqSm0i4n-tPaQ.png" width="800" height="446"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;mAIdAI Avatar (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through how I architected this personal agent using &lt;strong&gt;Google Chat&lt;/strong&gt; , &lt;strong&gt;Cloud Run&lt;/strong&gt; , and &lt;strong&gt;Vertex AI&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem: The High Cost of “Quick” Tasks
&lt;/h3&gt;

&lt;p&gt;We often underestimate the micro-friction in our daily work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Where is the design doc for Project X?”&lt;/li&gt;
&lt;li&gt;“What’s the syntax for that specific gcloud command again?”&lt;/li&gt;
&lt;li&gt;“Can you review this snippet?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Standard team bots are great, but they are generic. They lack the &lt;em&gt;specific&lt;/em&gt; context of your personal role and responsibilities. I wanted an agent that acts as a “Second Brain” — grounded in my personal knowledge and capable of executing my specific workflows.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution: mAIdAI Pattern
&lt;/h3&gt;

&lt;p&gt;mAIdAI is designed around three core interaction types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context-Aware Chat&lt;/strong&gt; : A conversational flow grounded in a personal context.md file that serves as effective “system instructions”.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick Commands&lt;/strong&gt; : Instant helpers that return static values (like commonly used links or snippets) without invoking the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slash Commands&lt;/strong&gt; : Specialized triggers that wrap user input in a predefined prompt template (e.g., /fix to debug code).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzZIz8pnehINoHvixMcWeXA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzZIz8pnehINoHvixMcWeXA.png" width="800" height="327"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Demo of mAIdAI on Google Chat (Generic Version)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AndK1QlYRr_Wlr05q2zVUAw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AndK1QlYRr_Wlr05q2zVUAw.png" width="800" height="414"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;About mAIdAI&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;The system follows a lightweight, serverless event-driven architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A95OQreqPYlfHWmN_I2qGBw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A95OQreqPYlfHWmN_I2qGBw.png" width="800" height="259"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture Diagram of mAIdAI (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt; : The &lt;strong&gt;Google Chat&lt;/strong&gt; app interface. No custom UI to build or maintain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport&lt;/strong&gt; : Chat events are delivered via HTTP webhooks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt; : A &lt;strong&gt;Cloud Run&lt;/strong&gt; service hosting a FastAPI application processes the events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligence&lt;/strong&gt; : The backend connects to &lt;strong&gt;Vertex AI&lt;/strong&gt; (Gemini models) for reasoning, grounded by the personal context file.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Deep Dive: The Code
&lt;/h3&gt;

&lt;p&gt;The implementation is surprisingly minimal, thanks to the &lt;strong&gt;Google GenAI SDK&lt;/strong&gt; and &lt;strong&gt;FastAPI&lt;/strong&gt;. The entire core logic resides in a single main.py file.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Setup
&lt;/h3&gt;

&lt;p&gt;We initialize the GenAI client using standard environment variables. This keeps the code portable and secure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py
import os
import pathlib

from google import genai
from google.genai import types

# Assuming context.md sits next to main.py
ROOT_FOLDER = pathlib.Path(__file__).parent

client = genai.Client(
    project=os.environ["GOOGLE_CLOUD_PROJECT"],
    location=os.environ["GOOGLE_CLOUD_LOCATION"],
    vertexai=True,
)

# Loading the Second Brain
MODEL_CONTEXT = (ROOT_FOLDER / "context.md").read_text()
config = types.GenerateContentConfig(
    system_instruction=MODEL_CONTEXT,
    max_output_tokens=5000,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By reading context.md at startup and injecting it as the system_instruction, we ensure every interaction is grounded in my specific reality.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Handling Interaction Types
&lt;/h3&gt;

&lt;p&gt;The core router handles the distinction between simple commands and AI interactions. This is crucial for latency and cost — not every interaction needs a round-trip to an LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.post("/")
async def index(request: Request) -&amp;gt; dict:
    event = await request.json()
    # ... extraction logic ...

    if command_id := app_command_metadata.get("appCommandId"):
        # Handle Slash and Quick Commands
        if command_type == "QUICK_COMMAND":
            return respond(command_text)

        if command_type == "SLASH_COMMAND":
            # Contextualize the prompt
            prompt = f"{command_text}. USER INPUT: {user_input}"
            return respond(await chat(prompt))

    # Fallback to standard chat
    return respond(await chat(user_input))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern allows me to have a /links command that returns immediately (0 latency, 0 cost), while a /rewrite command leverages Gemini 2.0 Flash for creative work.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Asynchrony by Default
&lt;/h3&gt;

&lt;p&gt;Using async def and client.aio.models.generate_content ensures the Cloud Run container can handle multiple concurrent requests efficiently, even with a single instance.&lt;/p&gt;
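
&lt;p&gt;For illustration, the chat helper used by the router boils down to a single awaited call. A minimal sketch, assuming the client and config defined in the setup section (the MODEL_NAME variable and its default are mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py (sketch): non-blocking call to Vertex AI via the GenAI SDK
MODEL = os.environ.get("MODEL_NAME", "gemini-2.0-flash")  # placeholder default

async def chat(prompt: str) -&amp;gt; str:
    # The await keeps the event loop free to serve other webhooks
    response = await client.aio.models.generate_content(
        model=MODEL,
        contents=prompt,
        config=config,  # carries context.md as the system instruction
    )
    return response.text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;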

&lt;h3&gt;
  
  
  Deployment Strategy
&lt;/h3&gt;

&lt;p&gt;Simplicity was the primary constraint. I didn’t want to manage infrastructure for a personal tool.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt; : Cloud Run (fully managed, scales to zero, low-cost serving).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration&lt;/strong&gt; : Environment variables for model selection (gemini-3-flash) and project details.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; : IAM-based authentication ensures only verified chat events reach the service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Build This Locally?
&lt;/h3&gt;

&lt;p&gt;You might ask, “Why not use a standard consumer AI chat?”&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; : Data stays within my Google Cloud project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; : I control the system prompt (context.md) explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Integration&lt;/strong&gt; : It lives where I work — in Google Chat — not a separate browser tab.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;We often accept friction because “it’s just how things are.” But as engineers, we have the tools to change that. mAIdAI is a proof of concept that a highly personalized, context-aware agent doesn’t require a massive engineering team. It just requires a few hundred lines of Python and the right cloud primitives.&lt;/p&gt;

&lt;p&gt;If you find yourself copying the same text or answering the same questions repeatedly, maybe it’s time to build your own assistant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A7Uo4-6ug2-tX4An5lEz4qw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A7Uo4-6ug2-tX4An5lEz4qw.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>agents</category>
      <category>automation</category>
      <category>generativeaitools</category>
    </item>
    <item>
      <title>MLOps Coding Skills: Bridging the Gap Between Specs and Agents</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Wed, 28 Jan 2026 20:50:20 +0000</pubDate>
      <link>https://forem.com/fmind/mlops-coding-skills-bridging-the-gap-between-specs-and-agents-3mn1</link>
      <guid>https://forem.com/fmind/mlops-coding-skills-bridging-the-gap-between-specs-and-agents-3mn1</guid>
      <description>&lt;p&gt;We are entering the golden age of AI Coding. Every day, I see colleagues, both technical and non-technical, marveling at how agents are rewriting the rules of software construction. The promise is intoxicating: describe what you want, and let the machine handle the rest.&lt;/p&gt;

&lt;p&gt;However, when I see my colleagues try to apply these agents to strict engineering standards, they hit a wall. On one side, you have rigorous specification tools like &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;spec-kit&lt;/a&gt; or &lt;a href="https://github.com/gemini-cli-extensions/conductor" rel="noopener noreferrer"&gt;conductor&lt;/a&gt;. They are deterministic and thorough, but setting them up feels like writing a legal contract. On the other side, you have generic tools like the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;. They act as incredible “hands” for the AI — reading databases, calling APIs — but they lack the &lt;em&gt;brain&lt;/em&gt; for your specific context.&lt;/p&gt;

&lt;p&gt;They don’t know that your team enforces &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt; over &lt;a href="https://python-poetry.org/" rel="noopener noreferrer"&gt;poetry&lt;/a&gt;. They don’t know you prefer &lt;a href="https://github.com/casey/just" rel="noopener noreferrer"&gt;just&lt;/a&gt; files for automation. They don’t know your specific flavor of “clean code.”&lt;/p&gt;

&lt;p&gt;Then I discovered &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Skills&lt;/strong&gt;&lt;/a&gt;, and everything clicked.&lt;/p&gt;

&lt;p&gt;I was immediately hooked. They offer the specific trade-off I had been looking for: &lt;strong&gt;lightweight enough to be flexible, yet opinionated enough to be useful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AtAqN0WowISthT-ZFHGj4uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AtAqN0WowISthT-ZFHGj4uw.png" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I want to share how I used Agent Skills to turn the theoretical “MLOps Coding Course” into a practical, actionable library: the &lt;a href="https://github.com/MLOps-Courses/mlops-coding-skills" rel="noopener noreferrer"&gt;&lt;strong&gt;MLOps Coding Skills&lt;/strong&gt;&lt;/a&gt; project.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Challenge: Making References Actionable
&lt;/h3&gt;

&lt;p&gt;For the past few months, I’ve been deep in the trenches writing the &lt;a href="https://github.com/MLOps-Courses/mlops-coding-course" rel="noopener noreferrer"&gt;&lt;strong&gt;MLOps Coding Course&lt;/strong&gt;&lt;/a&gt;. It is a comprehensive curriculum teaching production-grade MLOps, from robust project initialization to advanced observability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2A5Q2LkQqqDJaO_l3f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2A5Q2LkQqqDJaO_l3f.png" width="800" height="825"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But as I wrote the documentation, I felt a friction point. Learning the standards is one thing; remembering to apply them in the heat of coding is another.&lt;/p&gt;

&lt;p&gt;I didn’t just want another wiki page. I wanted to make these best practices &lt;strong&gt;actionable&lt;/strong&gt; for AI agents. I wanted to move from “reading the docs” to “installing the capability.”&lt;/p&gt;
&lt;h3&gt;
  
  
  The Logic: How to “Skillify” Knowledge
&lt;/h3&gt;

&lt;p&gt;The beauty of an Agent Skill lies in its simplicity. It is essentially a markdown file (SKILL.md) that functions as a context injection module. It gives the agent “muscle memory” for a specific topic.&lt;/p&gt;
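
&lt;p&gt;Concretely, a SKILL.md typically opens with a small metadata header that tells the agent when to load it, followed by free-form instructions. A minimal sketch (the wording here is mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: mlops-automation
description: Apply MLOps task automation standards (just, docker, CI/CD) when setting up or reviewing a project.
---

# MLOps Automation

Instructions for the agent go here...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;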

&lt;p&gt;My methodology for building the &lt;strong&gt;MLOps Coding Skills&lt;/strong&gt; repo was straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolate a Chapter&lt;/strong&gt; : Take a specific section of the course (e.g., &lt;em&gt;Automation&lt;/em&gt; or &lt;em&gt;Observability&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract Patterns&lt;/strong&gt; : Use an LLM to distill the generic engineering standards from the educational content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardize&lt;/strong&gt; : Format it into a SKILL.md that an agent can ingest.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  A Concrete Example: Automating Ops
&lt;/h3&gt;

&lt;p&gt;Let’s look at the &lt;a href="https://github.com/MLOps-Courses/mlops-coding-skills/tree/main/mlops-automation" rel="noopener noreferrer"&gt;&lt;strong&gt;mlops-automation&lt;/strong&gt;&lt;/a&gt; skill.&lt;/p&gt;

&lt;p&gt;In our course, we have strong opinions: we use just for command running and &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;docker&lt;/a&gt; for containerization, with very specific layer caching strategies.&lt;/p&gt;

&lt;p&gt;Here is what the skill looks like “on the wire”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# MLOps Automation

## Goal

To elevate the codebase to production standards by adding Task Automation (just), Containerization ([docker](https://www.docker.com/)), CI/CD ([github-actions](https://github.com/features/actions)), and Experiment Tracking ([mlflow](https://mlflow.org/)).

## Instructions

### 1. Task Automation

Replace manual commands with a `justfile`.
1. **Tool** : `just` (modern alternative to Make).
2. **Organization** : Split tasks into `tasks/*.just` modules.
3. **Core Tasks** :
- `check`: Run all linters and tests.
- `package`: Build wheels.

### 2. Containerization

1. **Tool** : `docker`.
2. **Base Image** : Use `ghcr.io/astral-sh/uv:python3.1X-bookworm-slim` for minimal size.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When I load this skill, my agent stops guessing. It doesn’t offer me a Makefile. It doesn’t suggest a bloated Ubuntu image. It acts like a senior engineer who has been on the team for years.&lt;/p&gt;

&lt;h3&gt;
  
  
  The “Senior Engineer” Injection
&lt;/h3&gt;

&lt;p&gt;This is the killer value proposition.&lt;/p&gt;

&lt;p&gt;Most frustrations with AI coding come from a &lt;strong&gt;lack of context&lt;/strong&gt;. We blame the model for being “dumb,” but usually, we just haven’t told it the rules of the house.&lt;/p&gt;

&lt;p&gt;By using Agent Skills, you are effectively &lt;strong&gt;injecting a Senior Engineer into your chat context&lt;/strong&gt;. You are giving the agent a “cheat sheet” that forces it to align with your organization’s reality.&lt;/p&gt;

&lt;p&gt;I now use these skills for every new project I touch. I don’t spend an hour setting up boilerplate. I load or create a skill, and within minutes, I have a structure that matches my most rigorous standards.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Friction Points
&lt;/h3&gt;

&lt;p&gt;Of course, no solution is perfect. There are still rough edges in this workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local-First Friction&lt;/strong&gt; : Currently, skills often sit in a local .agent/skills folder. It works, but copying them around feels archaic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Context Stack&lt;/strong&gt; : We are seeing a fragmentation of context. We have MCP servers for tools, AGENTS.md for persona, and Skills for tasks. Managing this “Context Stack” is becoming a new engineering discipline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration gaps&lt;/strong&gt; : I love how the &lt;strong&gt;Gemini CLI&lt;/strong&gt; handles this via extensions, but I’m eager to see this standardized across VS Code Copilot, Cursor, and other IDEs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Despite the minor friction, Agent Skills are excellent “Low Hanging Fruit” for any engineering team.&lt;/p&gt;

&lt;p&gt;The productivity gain is massive. For a few minutes of setup — writing a markdown file — you save hours of correcting boilerplate code and enforcing standards down the line. It bridges the gap between the &lt;strong&gt;rigidity of a spec&lt;/strong&gt; and the &lt;strong&gt;chaos of a raw LLM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you are tired of fighting your AI to follow your style, stop arguing with it. Give it a Skill.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check out the full&lt;/em&gt; &lt;a href="https://github.com/MLOps-Courses/mlops-coding-skills" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;MLOps Coding Skills repository&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; &lt;em&gt;to see the library in action.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AsSNsSykKaOOfBIom-2bmMw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AsSNsSykKaOOfBIom-2bmMw.png" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>coding</category>
      <category>generativeaitools</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building with A2UI: Extending the Expressiveness of AI Agent Interfaces</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Wed, 28 Jan 2026 05:48:33 +0000</pubDate>
      <link>https://forem.com/fmind/building-with-a2ui-extending-the-expressiveness-of-ai-agent-interfaces-36h3</link>
      <guid>https://forem.com/fmind/building-with-a2ui-extending-the-expressiveness-of-ai-agent-interfaces-36h3</guid>
      <description>&lt;p&gt;In 2026, AI agents have become incredibly smart, yet they are often limited to simple chatbot interfaces. We have engines capable of reasoning, planning, and coding, but we force them to communicate results through text and basic markdown.&lt;/p&gt;

&lt;p&gt;To unlock the full potential of agents, we need a better language for them to express themselves. We need agents that can project rich, dynamic, and interactive user interfaces that adapt to the user’s intent.&lt;/p&gt;

&lt;p&gt;This is the promise of &lt;a href="https://a2ui.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2UI (Agent-to-User Interface)&lt;/strong&gt;&lt;/a&gt;: a protocol that allows agents to “speak” UI natively.&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://fmind.medium.com/finding-the-holy-grail-of-ai-agent-uis-from-ai-orchestrated-development-to-a2ui" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;, I explored the landscape of AI UI solutions and explained why A2UI stands out. Now, I wanted to put it to the test. I built &lt;a href="https://github.com/fmind/featest" rel="noopener noreferrer"&gt;Featest&lt;/a&gt;, a feature request application designed to be “AI-First.” Here is the story of how it was built, the strengths of the protocols I used, and the architectural patterns that emerged.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzIFY81xbX8ot9LukFPvVcQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AzIFY81xbX8ot9LukFPvVcQ.png" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Project: Featest
&lt;/h3&gt;

&lt;p&gt;The origin of this project was a simple request from my Product Manager: &lt;em&gt;“We need a way for users to vote on features.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I could have built a standard CRUD app. But I saw this as an opportunity. A feature request board is dynamic. Users have vague intents: &lt;em&gt;“I want something like dark mode but for audio.”&lt;/em&gt; They want to merge duplicates. They want to see trends, and admins want to automatically tag requests.&lt;/p&gt;

&lt;p&gt;This was the perfect scenario to consider an &lt;strong&gt;AI-First experience&lt;/strong&gt;. I didn’t want a static form; I wanted an agent that users could talk to, which enhances the interactions, and uses the right UI tools when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github Repository&lt;/strong&gt; : &lt;a href="https://github.com/fmind/featest" rel="noopener noreferrer"&gt;Featest&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AGCAe3IiU7n5RhCafd2JBIg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AGCAe3IiU7n5RhCafd2JBIg.png" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Suggest new features with AI (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Power Couple: A2UI and A2A
&lt;/h3&gt;

&lt;p&gt;Before diving into the code, it’s critical to understand the two pillars of this architecture.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. A2UI: Agent-to-User Interface (The Content)
&lt;/h3&gt;

&lt;p&gt;A2UI is a &lt;strong&gt;declarative protocol&lt;/strong&gt;. Instead of an agent writing code (which is risky and error-prone), it streams a structured JSON description of a UI.&lt;/p&gt;

&lt;p&gt;Here is what it looks like on the wire. The agent sends SurfaceUpdate events to render components like a card with a button:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "surfaceUpdate": {
    "surfaceId": "main-surface",
    "components": [
      {
        "id": "welcome-card",
        "component": {
          "Card": {
            "child": "welcome-text"
          }
        }
      },
      {
        "id": "welcome-text",
        "component": {
          "Text": {
            "text": { "literalString": "Welcome to Featest" },
            "usageHint": "h1"
          }
        }
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. A2A: Agent-to-Agent Communication (The Transport)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/a2aproject/A2A" rel="noopener noreferrer"&gt;A2A&lt;/a&gt; is the &lt;strong&gt;transport protocol&lt;/strong&gt;. It standardizes how agents talk to each other and to clients over HTTP. It handles the handshake, the task lifecycle, and the message passing.&lt;/p&gt;

&lt;p&gt;In Featest, the client wraps the user’s intent in an A2A message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /api/agents/feature_request_agent/tasks
Content-Type: application/json

{
  "task_id": "12345",
  "input": {
    "text": "I want to vote for dark mode"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, they create a universal language. A2A carries the envelope, and A2UI ensures the letter inside contains rich, interactive content, not just text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;I designed the system to be modular, using the &lt;a href="https://google.github.io/adk-docs" rel="noopener noreferrer"&gt;Google Agent Development Kit (ADK)&lt;/a&gt; and the A2A protocol effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ANCJHxfYnDp8skDQ8iOzvBA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ANCJHxfYnDp8skDQ8iOzvBA.png" width="800" height="100"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture Diagram of the Featest App (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The flow is bidirectional and relies on robust open-source packages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt; interacts with the &lt;strong&gt;Lit Client&lt;/strong&gt; , which uses the official &lt;a href="https://github.com/a2ui/a2ui/tree/main/packages/lit" rel="noopener noreferrer"&gt;@a2ui/lit&lt;/a&gt; renderer. It acts as a state machine, processing SurfaceUpdate events to patch the DOM efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt; sends intent via &lt;strong&gt;A2A&lt;/strong&gt; to the &lt;strong&gt;Backend&lt;/strong&gt;. The A2UIClient wraps the user’s input (text or events) in a standard JSON-RPC envelope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend Agent&lt;/strong&gt; processes logic and streams back &lt;strong&gt;A2UI&lt;/strong&gt; JSON instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt; renders the UI components dynamically.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The power of this system comes from the &lt;strong&gt;Component Schema&lt;/strong&gt;. Featest supports a rich set of native components defined in schemas.py, ensuring the agent has high-level building blocks rather than raw HTML (a simplified sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layout&lt;/strong&gt; : Row, Column, List, Card, Tabs, Divider, Modal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt; : Button, CheckBox, TextField, DateTimeInput, MultipleChoice, Slider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media&lt;/strong&gt; : Text, Image, Icon, Video, AudioPlayer.&lt;/li&gt;
&lt;/ul&gt;
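
&lt;p&gt;In code, this catalog takes the form of Pydantic models that constrain the View Agent’s output. A simplified sketch of the idea (field names are illustrative, modeled on the SurfaceUpdate example above rather than copied from schemas.py):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# schemas.py (simplified sketch, not the actual definitions)
from pydantic import BaseModel

class Text(BaseModel):
    text: str
    usageHint: str | None = None  # e.g. "h1", "body"

class Button(BaseModel):
    label: str
    action: str  # event name reported back to the agent

class Component(BaseModel):
    id: str
    component: dict  # one catalog entry, keyed by its type name

class A2UI(BaseModel):
    surfaceId: str
    components: list[Component]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
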
&lt;h3&gt;
  
  
  The AVC Pattern: Agent-View-Controller
&lt;/h3&gt;

&lt;p&gt;The most significant discovery I made while building Featest was an architectural one.&lt;/p&gt;

&lt;p&gt;When you start building complex agent apps, you quickly realize that a single agent doing everything (reasoning, database access, UI formatting) is a mess. It’s hard to test and hard to control.&lt;/p&gt;

&lt;p&gt;I adopted what I call the &lt;strong&gt;Agent-View-Controller (AVC) Pattern&lt;/strong&gt; — an evolution of &lt;a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller" rel="noopener noreferrer"&gt;Model-View-Controller (MVC)&lt;/a&gt; for the agent era.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Controller Agent (The Brain)
&lt;/h3&gt;

&lt;p&gt;This agent handles the business logic. It doesn’t care about pixels. It takes a user request as input, decides which tool to use (e.g., vote_feature, add_comment), and outputs structured data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# agent/agent.py
from google.adk import agents  # Google Agent Development Kit

# configs, prompts, and tools are sibling modules of the project
controller_agent = agents.Agent(
    name="controller_agent",
    model=configs.AGENT_MODEL,
    description="Executes application logic.",
    instruction=prompts.CONTROLLER_INSTRUCTION,
    tools=[
        tools.list_features,
        tools.add_feature,
        tools.upvote_feature,
        tools.add_comment,
        tools.get_feature,
        tools.update_feature,
        tools.delete_feature,
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. The View Agent (The Renderer)
&lt;/h3&gt;

&lt;p&gt;This agent is the designer. It takes the data from the Controller and translates it into A2UI JSON. It cares about layout, typography, and hierarchy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;view_agent = agents.Agent(
    name="view_agent",
    model=configs.AGENT_MODEL,
    description="Formats data into A2UI schema.",
    instruction=prompts.VIEW_INSTRUCTION,
    # Constrain the LLM output to the A2UI component schema
    output_schema=schemas.A2UI,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. The Sequential Pipeline
&lt;/h3&gt;

&lt;p&gt;I chained them together using ADK’s SequentialAgent. This simple composition gave me immense flexibility. I could swap the View Agent to change the entire look and feel of the app without touching a single line of business logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root_agent = agents.SequentialAgent(
    name="feature_request_agent",
    description="Handles feature requests from users.",
    sub_agents=[controller_agent, view_agent],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
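&lt;p&gt;To illustrate that flexibility, here is a hypothetical sketch that swaps in an alternative View Agent. The prompt name COMPACT_VIEW_INSTRUCTION is made up for the example; the Controller stays untouched:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: swap the View Agent to restyle the whole app.
# prompts.COMPACT_VIEW_INSTRUCTION is an illustrative name, not Featest code.
compact_view_agent = agents.Agent(
    name="compact_view_agent",
    model=configs.AGENT_MODEL,
    description="Formats data into a denser A2UI layout.",
    instruction=prompts.COMPACT_VIEW_INSTRUCTION,
    output_schema=schemas.A2UI,
)

root_agent = agents.SequentialAgent(
    name="feature_request_agent",
    description="Handles feature requests from users.",
    sub_agents=[controller_agent, compact_view_agent],  # same business logic
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;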



&lt;h3&gt;
  
  
  Strengths of the Protocol
&lt;/h3&gt;

&lt;p&gt;Working with A2UI revealed several advantages over traditional chatbot approaches.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2At76cjUYm8DK6ScKKQ8bJHA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2At76cjUYm8DK6ScKKQ8bJHA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Leaving a comment, with a form rendered dynamically by A2UI (Source: Fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. UI Language
&lt;/h3&gt;

&lt;p&gt;We are used to agents speaking Markdown. Markdown is fantastic for content: paragraphs, lists, and code blocks. But it fails when you need &lt;em&gt;interaction&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If an agent needs to ask for a complex set of preferences, Markdown forces it to ask one question at a time or parse a messy natural language blob. A2UI allows the agent to project a form with validation, sliders for precise values, and date pickers. It elevates the agent from a “writer” to an “interface designer,” matching the right interaction model to the user’s intent.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Security by Design
&lt;/h3&gt;

&lt;p&gt;This is the enterprise killer feature. Because A2UI is &lt;strong&gt;data&lt;/strong&gt;, not code, there is no eval() happening on the client. The agent selects from a catalog of safe, pre-built components. You can’t inject malicious scripts via A2UI, making it safe for production environments where “generated code” is a security nightmare.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Progressive Rendering
&lt;/h3&gt;

&lt;p&gt;A2UI is designed to be streamed. As the LLM generates the JSON tokens, the UI builds itself on the screen.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, the container appears.&lt;/li&gt;
&lt;li&gt;Then, the title.&lt;/li&gt;
&lt;li&gt;Then, the list items one by one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the application feel incredibly responsive, masking some latency with visible progress.&lt;/p&gt;
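&lt;p&gt;To make this tangible, here is an illustrative simulation in Python. The message shapes are simplified stand-ins for A2UI’s JSONL format, not the real protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative only: simulate progressive rendering of streamed UI messages.
# The message shapes are simplified stand-ins for A2UI's JSONL format.
import json

STREAM = [
    '{"component": "Card", "id": "root"}',
    '{"component": "Text", "id": "title", "text": "Feature Requests"}',
    '{"component": "Text", "id": "item-1", "text": "1. Dark mode (42 votes)"}',
]

surface: dict[str, dict] = {}
for line in STREAM:  # in practice, lines arrive as the LLM generates them
    message = json.loads(line)
    surface[message["id"]] = message  # patch the surface incrementally
    print(f"Rendered so far: {list(surface)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;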

&lt;h3&gt;
  
  
  4. Interoperability
&lt;/h3&gt;

&lt;p&gt;Because the UI is just JSON, the exact same agent response can be rendered natively on the web (via &lt;a href="https://lit.dev/" rel="noopener noreferrer"&gt;Lit&lt;/a&gt;), on mobile (via &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;Flutter&lt;/a&gt;), or on iOS (via Swift). You build the agent intelligence once, and it projects natively everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Client Style Control
&lt;/h3&gt;

&lt;p&gt;With A2UI, the agent is responsible for the &lt;em&gt;structure&lt;/em&gt; (intent), but the client is completely in control of the &lt;em&gt;style&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The agent says “I need a primary button.” It doesn’t say “I need a blue button with 4px border radius.” This means your application maintains perfect brand consistency. The same agent response can look like a sleek consumer app on Android or a dense dashboard on the web, simply by changing the client-side theme.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Maturity &amp;amp; Boilerplate
&lt;/h3&gt;

&lt;p&gt;A2UI is a new protocol. Building a custom app from scratch currently requires significant boilerplate code. For now, the best approach is to use packages like Flutter’s &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;GenUI SDK&lt;/a&gt; or wait for higher-level integrations from ADK or Gemini Enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Latency vs. Smartness
&lt;/h3&gt;

&lt;p&gt;Another major challenge is &lt;strong&gt;latency&lt;/strong&gt;. Generating UI tokens takes time and money. While streaming and using “Fast Planners” (as I did in Featest) mitigate this, a pure agentic experience will never beat a hand-optimized native app for core, repetitive tasks. The “smartness” of the dynamic UI must outweigh the latency cost — if it doesn’t, just use a static button.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Complementary Nature
&lt;/h3&gt;

&lt;p&gt;I also found that not every use case benefits from dynamic UI. For the core voting interaction in Featest, a static, predictable UI was simpler and faster. Where A2UI shines is in &lt;em&gt;augmenting&lt;/em&gt; the experience: helping users rationalize features, tag duplicates, or explore trends through conversation. In this project, it’s a powerful &lt;strong&gt;complement&lt;/strong&gt; to a baseline UI, not necessarily a replacement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;A2UI is a fantastic protocol, but it’s not a silver bullet.&lt;/p&gt;

&lt;p&gt;In my case, I initially thought a pure “AI-First” app would be the ideal experience. However, I learned that for basic, repetitive tasks, it’s simply too slow compared to a traditional interface. The latency of generating UI on the fly doesn’t always pay off.&lt;/p&gt;

&lt;p&gt;The ideal approach for this project is a &lt;strong&gt;hybrid model&lt;/strong&gt;: mixing static, highly optimized UIs for core workflows with dynamic, agentic components for complex, intent-driven tasks. It is up to the programmer to find the best trade-off for each specific use case.&lt;/p&gt;

&lt;p&gt;However, for &lt;strong&gt;chatbot-focused applications&lt;/strong&gt;, this solution could be highly valuable. It enables the creation of much richer UIs exactly when needed, allowing the experience to go beyond simple text and platform-specific adapters.&lt;/p&gt;

&lt;p&gt;One thing is sure: there will be more and more agentic features, and A2UI will be a great bridge between agent power and user needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ArfjwvYSIg-kc9N-dyUEA7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ArfjwvYSIg-kc9N-dyUEA7w.png" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>frontend</category>
      <category>artificialintelligen</category>
    </item>
    <item>
      <title>Finding the Holy Grail of AI Agent UIs: From AI-Orchestrated Development to A2UI</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sat, 24 Jan 2026 14:46:32 +0000</pubDate>
      <link>https://forem.com/fmind/finding-the-holy-grail-of-ai-agent-uis-from-ai-orchestrated-development-to-a2ui-52bb</link>
      <guid>https://forem.com/fmind/finding-the-holy-grail-of-ai-agent-uis-from-ai-orchestrated-development-to-a2ui-52bb</guid>
      <description>&lt;p&gt;In my &lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;, I argued that the real bottleneck for AI agents is the User Interface (UI). We are stapling rocket engines to bicycles by forcing advanced agents to communicate through basic markdown chatbots.&lt;/p&gt;

&lt;p&gt;Since then, I’ve been on a journey to find the solution. I didn’t want just a theoretical answer; I wanted to build it. I explored everything from “AI-Orchestrated Development” to Python wrappers, up to new AI protocols, searching for a scalable way to give Agents a native, rich, and dynamic interface.&lt;/p&gt;

&lt;p&gt;I dedicated time to building a concrete implementation to verify my hypotheses. Here is what I found, what failed, and why I believe &lt;a href="https://a2ui.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2UI&lt;/strong&gt;&lt;/a&gt; is the protocol we’ve been waiting for to solve this problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2eajdwt75assbhpup04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2eajdwt75assbhpup04.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Exploration: A Graveyard of “Almost” Solutions
&lt;/h3&gt;

&lt;p&gt;My goal was simple: &lt;strong&gt;Build a custom frontend for an agent application without spending weeks on boilerplate.&lt;/strong&gt; I tried multiple approaches, and most of them hit a wall.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The “Heavy” Approach: Angular &amp;amp; Flutter
&lt;/h3&gt;

&lt;p&gt;My first instinct was to build a real app. I tried both &lt;a href="https://angular.dev/" rel="noopener noreferrer"&gt;Angular&lt;/a&gt; and &lt;a href="https://flutter.dev/" rel="noopener noreferrer"&gt;Flutter&lt;/a&gt;. These are standards for enterprise application development, offering robust ecosystems and pixel-perfect control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt; It works, but at what cost? In 2026, setting up a full frontend project is still painful. You have to configure build tools, set up linters, manage complex state stores (Redux, Bloc), and synchronize data models with your backend. This overhead is acceptable for a static, long-term product like a banking dashboard, but for a dynamic Agent? &lt;strong&gt;It’s overkill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agents need to be able to transmit their UI and adapt on the fly. Hardcoding a heavy client defeats the purpose of an autonomous agent. If every new agent capability requires a sprint of frontend changes, the agent isn’t truly autonomous. It’s just a backend API with a very expensive chat interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. “AI-Orchestrated Development” (AI-Generated UIs)
&lt;/h3&gt;

&lt;p&gt;I tried what I call “&lt;strong&gt;AI-Orchestrated Development&lt;/strong&gt;”: a more structured approach where the AI is front and center in generating application code, popularized in early 2026 by tools like &lt;a href="https://githubnext.com/projects/copilot-workspace" rel="noopener noreferrer"&gt;GitHub Spec Kit&lt;/a&gt;, &lt;a href="https://deepmind.google/technologies/gemini/conductor/" rel="noopener noreferrer"&gt;Gemini Conductor&lt;/a&gt;, or &lt;a href="https://blog.google/technology/google-labs/antigravity/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;. This is distinct from “vibe coding” (using AI intuitively without understanding the output). AI-Orchestrated Development aims for a systematic process where AI handles implementation under developer guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Verdict:&lt;/strong&gt; While promising long-term, it still generates &lt;em&gt;lots&lt;/em&gt; of code. Code that you have to maintain, test, and debug. And I’m not confident in either maintaining AI-generated codebases or letting AI be the sole responsible party for production systems.&lt;/p&gt;

&lt;p&gt;We already spend more time maintaining applications than building them. AI-Orchestrated Development risks accelerating this accumulation. We need to reduce the amount of bespoke code we generate, not increase it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. HTMX: The Backend-Driven UI
&lt;/h3&gt;

&lt;p&gt;I went back to my roots (PHP/AJAX) and tried &lt;a href="https://htmx.org/" rel="noopener noreferrer"&gt;HTMX&lt;/a&gt;. It’s a productive methodology that keeps logic in one place by streaming HTML fragments from the server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; HTMX couples the agent too tightly to a specific visual implementation. If you want to render the same agent response on a mobile app, a web dashboard, and a desktop client, you can’t reuse the HTML stream — you’re locked into one presentation layer.&lt;/p&gt;

&lt;p&gt;More fundamentally, HTML is too low-level for an agent to reason about. An agent shouldn’t be worrying about CSS classes, DOM nesting, or accessibility attributes. It should focus on &lt;em&gt;intent&lt;/em&gt; and &lt;em&gt;logic&lt;/em&gt;, not pixels. Sending declarative data is more efficient, more universal, and can be consumed by different types of clients.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Python Wrappers (Streamlit, Gradio, Chainlit)
&lt;/h3&gt;

&lt;p&gt;These are great for prototypes. Tools like &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt;, &lt;a href="https://gradio.app/" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt;, and &lt;a href="https://chainlit.io/" rel="noopener noreferrer"&gt;Chainlit&lt;/a&gt; offer a small code surface and instant deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Flaw:&lt;/strong&gt; The “Glue Code” Hell. You inevitably hit a wall where the library doesn’t support the specific interaction or component you need. Maybe you need a custom drag-and-drop interface or a specific data visualization. You lose control over style, and you end up writing hacky workarounds (custom HTML injection, iframe bridges) to connect the agent’s state to the UI components.&lt;/p&gt;

&lt;p&gt;They are also not truly dynamic — they are rigid templates filled with data, not fluid interfaces generated by the agent’s needs. You are still building a form; you are just doing it in Python instead of React.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Chat Extensions (Slack/Teams/Workspace)
&lt;/h3&gt;

&lt;p&gt;Building into existing workflows seems smart. Why build a new UI when you can just deploy a bot to Slack or Google Chat?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Limit:&lt;/strong&gt; It doesn’t scale. You end up building a specific adapter for Slack, another for Teams, another for Google Chat. Each platform has its own proprietary UI kit (Block Kit, Adaptive Cards) with different limitations.&lt;/p&gt;

&lt;p&gt;You want to build your agent &lt;em&gt;once&lt;/em&gt; and have it project its UI anywhere, not rewrite the presentation layer for every host app. This fragmentation increases the maintenance burden and prevents you from creating a consistent user experience across platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Epiphany: Separation of Concerns
&lt;/h3&gt;

&lt;p&gt;I realized something fundamental during this process: &lt;strong&gt;Everything is disposable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We shouldn’t be precious about the UI code. We should focus on the &lt;strong&gt;declarative&lt;/strong&gt; side. Just as humans use HTML not because we love drawing pixels, but because we want to say “Here is a link” or “Here is an image,” agents need a high-level language to describe &lt;em&gt;what&lt;/em&gt; needs to be shown, not &lt;em&gt;how&lt;/em&gt; to draw it.&lt;/p&gt;

&lt;p&gt;The Agent should be responsible for the &lt;strong&gt;Data&lt;/strong&gt; and the &lt;strong&gt;Logic&lt;/strong&gt;. The Client should be responsible for the &lt;strong&gt;Style&lt;/strong&gt; and the &lt;strong&gt;Rendering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This separation allows the agent to be “brain-heavy” and “UI-light,” deferring the complex rendering logic to the client, which is what clients are best at.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: A2UI (Agent-to-User Interface)
&lt;/h3&gt;

&lt;p&gt;Enter &lt;a href="https://github.com/google/A2UI" rel="noopener noreferrer"&gt;&lt;strong&gt;A2UI&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I built a demo app using this protocol, and I was genuinely impressed by its elegance. A2UI is a &lt;strong&gt;JSONL-based declarative protocol&lt;/strong&gt; that creates a standard contract between the AI and the user interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Instead of streaming markdown tokens like a traditional LLM, the agent streams structured JSON objects representing UI components.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AG_S9eAr0Y6DQ0kTq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AG_S9eAr0Y6DQ0kTq.png" width="800" height="433"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: &lt;a href="https://a2ui.org/" rel="noopener noreferrer"&gt;https://a2ui.org/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The client can use the &lt;a href="https://github.com/google/A2UI/tree/main/renderers/lit" rel="noopener noreferrer"&gt;Lit renderer&lt;/a&gt;, &lt;a href="https://github.com/google/A2UI/tree/main/renderers/angular" rel="noopener noreferrer"&gt;Angular renderer&lt;/a&gt;, or &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;Flutter renderer&lt;/a&gt; to render native components progressively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production-Ready at Google:&lt;/strong&gt; A2UI isn’t vaporware — it’s already integrated into Google products like &lt;a href="https://labs.google/" rel="noopener noreferrer"&gt;Opal&lt;/a&gt;, &lt;a href="https://cloud.google.com/gemini/enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt;, and the &lt;a href="https://docs.flutter.dev/ai/genui" rel="noopener noreferrer"&gt;Flutter GenUI SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport Agnostic:&lt;/strong&gt; It works over HTTP (via the &lt;a href="https://github.com/google/A2A" rel="noopener noreferrer"&gt;A2A protocol&lt;/a&gt;), WebSockets, or carrier pigeons. The protocol doesn’t care how the JSON gets there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive Rendering:&lt;/strong&gt; The UI appears as the agent “thinks” it. Components stream in one by one, making the interface feel alive and responsive, much like text streaming but for rich UI elements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework Agnostic:&lt;/strong&gt; The client implementation (React, Angular, Lit) decides how a “Card” looks. The agent just says “I need a Card”. This means you can have a “Material Design” client and an “iOS Cupertino” client rendering the exact same agent response natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure:&lt;/strong&gt; No arbitrary JavaScript execution. It’s just declarative data, mitigating injection risks. This is critical for enterprise adoption where security reviews block “dynamic code generation.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-Friendly:&lt;/strong&gt; Flat, streaming JSON structure designed for easy generation. LLMs can build UIs incrementally without having to emit perfect JSON in one shot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A2UI is currently at &lt;a href="https://a2ui.org/specification/v0.8-a2ui/" rel="noopener noreferrer"&gt;v0.8&lt;/a&gt; and still in active development. The protocol has some rough edges; for production use, the best approach is to wait for native integration in tools like &lt;a href="https://cloud.google.com/gemini/enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt; or the &lt;a href="https://google.github.io/adk-docs" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A2UI vs AG-UI: Two Philosophies
&lt;/h3&gt;

&lt;p&gt;I also looked at &lt;a href="https://ag-ui.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;AG-UI&lt;/strong&gt;&lt;/a&gt;, another emerging standard in this space.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AG-UI&lt;/strong&gt; aims to blend the frontend and backend deeply, creating “AI-First” apps from the ground up with a focus on real-time event loops. It’s powerful but requires you to rethink your entire application architecture regarding state synchronization and event handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A2UI&lt;/strong&gt; focuses on &lt;em&gt;extending&lt;/em&gt; the capabilities of chat-based interaction to be richer. It’s a bridge that lets agents “speak UI” using standard components. It feels more like an evolution of the chat interface into a command center rather than a complete replacement of the application stack.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I believe &lt;strong&gt;A2UI&lt;/strong&gt; is the scalable path forward for most agent implementations. It respects the separation of concerns and integrates seamlessly with existing systems via protocols like A2A (Agent-to-Agent).&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion: The 2026 Shift
&lt;/h3&gt;

&lt;p&gt;We are moving towards a schism in frontend technology, and it’s happening faster than we think:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Static Apps (the stock):&lt;/strong&gt; Dashboards, retail sites, and specialized tools. These will still be built with efficient frameworks for speed, precise control, and specific user journeys where the path is known. They represent the bulk of existing applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Agent Interfaces (the flow):&lt;/strong&gt; Powered by new protocols like &lt;strong&gt;A2UI&lt;/strong&gt;. These will replace the “Chatbot” with something far more powerful — interactive, component-based, and generated on the fly. This is where the new growth is happening. These interfaces will emerge when the user’s intent is ambiguous or highly variable, like in &lt;a href="https://cloud.google.com/transform/a-new-era-agentic-commerce-retail-ai" rel="noopener noreferrer"&gt;Agentic Commerce&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I am convinced that 2026 is the year we stop building UIs &lt;em&gt;for&lt;/em&gt; agents and start letting agents &lt;em&gt;project&lt;/em&gt; their UIs to us. We shouldn’t spend too much time on UI. Let it be personalized by the agent so we can focus on what truly matters: integration and instruction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In the next article, I will share the source code and a full demo of the application I built using A2UI. Stay tuned!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ASyT4vdEjI7fJIoj-U9B8VA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ASyT4vdEjI7fJIoj-U9B8VA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>artificialintelligen</category>
      <category>softwaredevelopment</category>
      <category>ui</category>
    </item>
    <item>
      <title>Architecting the AI Agent Platform: A Definitive Guide</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Wed, 10 Dec 2025 07:20:36 +0000</pubDate>
      <link>https://forem.com/fmind/architecting-the-ai-agent-platform-a-definitive-guide-59oo</link>
      <guid>https://forem.com/fmind/architecting-the-ai-agent-platform-a-definitive-guide-59oo</guid>
      <description>&lt;p&gt;The velocity of Generative AI has been nothing short of relentless. In the span of just 24 months, the industry has shifted paradigms three times. We started with the raw capability of &lt;strong&gt;LLMs&lt;/strong&gt; (the “prompt engineering” era). We quickly moved to &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation) to ground those models in enterprise data. Now, we are at the era of &lt;strong&gt;AI Agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We are no longer asking models to simply talk or retrieve; we are asking them to &lt;em&gt;do&lt;/em&gt;. We are building systems capable of reasoning, planning, and executing actions to change the state of the world.&lt;/p&gt;

&lt;p&gt;Building a single agent in a notebook is easy. Building a system that serves, secures, and monitors thousands of autonomous agents across an enterprise is an entirely different engineering challenge. To deliver robust solutions with tangible ROI, you cannot rely on scattered Proofs of Concept. You need a factory. You need an &lt;strong&gt;AI Agent Platform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this guide, I will deconstruct the architecture of a production-grade AI Agent Platform, breaking it down into its system context, containers, and component layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5grlrxi093lt2vfdyywb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5grlrxi093lt2vfdyywb.png" width="800" height="427"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini (Nano Banana Pro)&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System Context: The PaaS Approach
&lt;/h3&gt;

&lt;p&gt;At its core, the AI Agent Platform is a &lt;strong&gt;Platform-as-a-Service (PaaS)&lt;/strong&gt; designed to build, serve, and expose AI agents.&lt;/p&gt;

&lt;p&gt;Unlike &lt;strong&gt;AI Agent SaaS&lt;/strong&gt; solutions — which lock you into a closed ecosystem and a predefined set of integrations — an AI Agent Platform is designed for &lt;strong&gt;extensibility&lt;/strong&gt; and &lt;strong&gt;control&lt;/strong&gt;. SaaS solutions are excellent for quick wins, but they often lack the ability to support custom logic or complex enterprise workflows.&lt;/p&gt;

&lt;p&gt;Crucially, an internal AI Agent Platform allows you to enforce &lt;strong&gt;SRE (Site Reliability Engineering)&lt;/strong&gt; practices. If an agent fails, your Ops team can intervene. If an agent attempts an unauthorized action, your Security team has the audit trails to investigate and harden the perimeter.&lt;/p&gt;

&lt;p&gt;The platform serves two distinct types of builders:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Programmer (Code-Based):&lt;/strong&gt; Engineers requiring power and flexibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Integrator (No/Low-Code):&lt;/strong&gt; Business analysts requiring speed and ease of configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It must also be accessible to &lt;strong&gt;External Systems&lt;/strong&gt; (Machine-to-Machine) via standard APIs like REST or gRPC. This allows other systems to offload cognitive tasks — like “analyze this log file” or “classify this ticket” — to your agent fleet programmatically.&lt;/p&gt;
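&lt;p&gt;As a minimal sketch, a machine-to-machine call could look like this. The endpoint, payload shape, and token are hypothetical, assuming the platform exposes a REST gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch of a machine-to-machine call to an agent endpoint.
# The URL, payload shape, and token are placeholders, not a real API.
import requests

response = requests.post(
    "https://agents.example.com/v1/agents/triage:run",  # hypothetical endpoint
    headers={"Authorization": "Bearer &amp;lt;service-account-token&amp;gt;"},
    json={"task": "classify this ticket", "ticket_id": "T-123"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;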

&lt;p&gt;To function, the AI Agent Platform relies on five high-level systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity &amp;amp; Access:&lt;/strong&gt; The gatekeeper for users, agents, and data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundation Models:&lt;/strong&gt; The cognitive “brain” (reasoning, planning, and instruction following).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Apps &amp;amp; APIs:&lt;/strong&gt; The “hands” of the agent (e.g., Jira, Salesforce, SAP, SQL, …).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information Systems:&lt;/strong&gt; The context providers (Operational DBs, Data Lakes, Knowledge Bases).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Infrastructure:&lt;/strong&gt; The bedrock providing compute and reliability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A6XhH0m-926J2kKvIonWuOw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A6XhH0m-926J2kKvIonWuOw.png" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Container Architecture
&lt;/h3&gt;

&lt;p&gt;To manage complexity, we divide the AI Agent Platform into &lt;strong&gt;7 Logical Containers&lt;/strong&gt;. This separation of concerns is vital for security auditing and independent scaling.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Interaction:&lt;/strong&gt; The frontend where users meet agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development:&lt;/strong&gt; The workbench for building and deploying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; The runtime engine that executes logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundation:&lt;/strong&gt; The infrastructure abstraction for models and compute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information:&lt;/strong&gt; The data layer managing context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; The monitoring and evaluation stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust:&lt;/strong&gt; The security and governance control plane.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AIu6ICfpuQv0QsGx1mXGPUA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AIu6ICfpuQv0QsGx1mXGPUA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s hack through these layers one by one.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Interaction
&lt;/h3&gt;

&lt;p&gt;The Interaction layer is the portal. It is where the carbon lifeforms (us) communicate with the silicon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ABSCdwkBwsOSm_AqXsC-Pqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2ABSCdwkBwsOSm_AqXsC-Pqg.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are three primary ways to expose your agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard Chatbot:&lt;/strong&gt; The familiar conversational interface. It is fast to ship and often requires zero frontend skills. However, it is a generic instrument; chat is not always the best interface for complex user experiences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom User Interface:&lt;/strong&gt; Bespoke web or mobile apps. This is where the power lies. As I’ve argued before, &lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;the UI is often the real bottleneck for agents&lt;/a&gt;. Custom UIs allow for rich interactions, but they come with a “frontend tax” — they are time-consuming to build.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External Channels:&lt;/strong&gt; Extending the platform to meet users where they are — SMS, Email, Voice, or Slack. This is critical for field workers or remote teams who don’t sit in front of a dashboard all day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the future, I expect &lt;strong&gt;Generative UI&lt;/strong&gt; to take over by 2026. This is where the agent generates dynamic interface elements on the fly based on user intent (see &lt;a href="https://research.google/blog/generative-ui-a-rich-custom-visual-interactive-user-experience-for-any-prompt/" rel="noopener noreferrer"&gt;Google Research&lt;/a&gt;). In the meantime, we must make trade-offs between these options.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Development
&lt;/h3&gt;

&lt;p&gt;This is the factory floor. My experience shows a &lt;strong&gt;50/50 split&lt;/strong&gt; between developers (code-based) and integrators (no/low code), so your platform must support both paths to avoid limiting speed or flexibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AI7PcI6Aiw9P54cM3gBEiQA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AI7PcI6Aiw9P54cM3gBEiQA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Code-Based (The Developer Path)
&lt;/h4&gt;

&lt;p&gt;This path is for engineers using frameworks like &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, or &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google ADK&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Stack:&lt;/strong&gt; Code is versioned in SCM (Git), tested via CI/CD, and deployed as software artifacts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Cost:&lt;/strong&gt; Surprisingly low. With “Model-as-a-Service,” developers can build robust agents on a laptop or Cloud Workstation for pennies per day. You don’t need a local H100 cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  No-Code (The Integrator Path)
&lt;/h4&gt;

&lt;p&gt;This path is for business analysts using Visual Builders and iPaaS (&lt;a href="https://cloud.google.com/application-integration" rel="noopener noreferrer"&gt;Integration Platform as a Service&lt;/a&gt;) tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Stack:&lt;/strong&gt; Visual designers, drag-and-drop workflows, and pre-built connectors for building AI Agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Trade-off:&lt;/strong&gt; Speed vs. Flexibility. It is the fastest way to prototype and connect to enterprise apps, but visual design can be less robust and more limiting than pure code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Core
&lt;/h3&gt;

&lt;p&gt;The Core is the heartbeat. It houses the &lt;strong&gt;Execution Engine&lt;/strong&gt;, the runtime responsible for the agent’s cognitive loop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Awkd6kVpE8_5mtvBdVACmrA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Awkd6kVpE8_5mtvBdVACmrA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The Execution Engine
&lt;/h4&gt;

&lt;p&gt;To be truly autonomous, the runtime needs specific capabilities that ease development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session Management:&lt;/strong&gt; Persisting state across conversational turns (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Bank:&lt;/strong&gt; Handling short-term context and long-term recall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Sandbox:&lt;/strong&gt; A secure environment (like a micro VM) where the agent can write and execute code safely to solve math or data problems.&lt;/li&gt;
&lt;/ul&gt;
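&lt;p&gt;As a minimal sketch (assuming the Google ADK), a tool can persist data across turns through the session state exposed by the runtime. The tool name and state keys below are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal sketch, assuming Google ADK: a tool persisting data across turns.
# The tool name and state keys are hypothetical examples.
from google.adk.tools import ToolContext

def remember_preference(name: str, value: str, tool_context: ToolContext) -&amp;gt; dict:
    """Store a user preference in the session state."""
    tool_context.state[f"pref:{name}"] = value  # persisted by the session service
    return {"status": "saved", "preference": name}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;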

&lt;h4&gt;
  
  
  Gateways &amp;amp; Orchestration
&lt;/h4&gt;

&lt;p&gt;You don’t always need a heavy Airflow setup with DAGs, but you do need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task Schedulers / Event Buses:&lt;/strong&gt; To trigger agents asynchronously (e.g., “New Ticket Created” -&amp;gt; “Wake up Triage Agent”).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Management:&lt;/strong&gt; Exposing agents via standard Gateways like &lt;a href="https://cloud.google.com/apigee?authuser=1" rel="noopener noreferrer"&gt;Apigee&lt;/a&gt; or &lt;a href="https://www.gravitee.io/" rel="noopener noreferrer"&gt;Gravitee&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Standardization is Key:&lt;/strong&gt; Practitioners are heavily encouraged to adopt standards like &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://a2aprotocol.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;A2A (Agent-to-Agent)&lt;/strong&gt;&lt;/a&gt; interfaces. Your platform cannot be an island; it must act as a network where your agents can call tools or even &lt;em&gt;other&lt;/em&gt; agents to complete complex tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Foundation
&lt;/h3&gt;

&lt;p&gt;The Foundation layer is the bedrock of the AI Agent Platform, providing both Foundation Models and Infrastructure solutions to the agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AOettEDYxiTAHN1A4zJ85MQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AOettEDYxiTAHN1A4zJ85MQ.png" width="800" height="171"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Model Strategy
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serving:&lt;/strong&gt; You will likely mix &lt;strong&gt;Model-as-a-Service&lt;/strong&gt; (Vertex AI, Bedrock, …) for ease of use and scalability, and &lt;strong&gt;Custom Model Hosting&lt;/strong&gt; for specific, fine-tuned, or private models that require more operational effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Routing:&lt;/strong&gt; Don’t default to the most expensive model. Use a router to dispatch simple queries to cheaper/faster models and complex reasoning to “smart” models (e.g., Gemini 1.5 Pro, Claude 3.5 Sonnet, GPT-4), as sketched after this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Caching:&lt;/strong&gt; A massive cost saver. Cache system instructions and heavy documents so you aren’t paying to re-tokenize your company handbook on every request.&lt;/li&gt;
&lt;/ul&gt;
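&lt;p&gt;As a minimal illustration, here is a hypothetical keyword-based router. Real routers typically rely on a classifier model, and the model names below are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch of a model router: cheap model by default,
# escalate to a "smart" model only when the query looks complex.
COMPLEX_HINTS = ("plan", "analyze", "compare", "multi-step", "why")

def route_model(query: str) -&amp;gt; str:
    """Return an illustrative model name based on query complexity."""
    is_complex = len(query) &amp;gt; 500 or any(h in query.lower() for h in COMPLEX_HINTS)
    return "smart-reasoning-model" if is_complex else "fast-cheap-model"

print(route_model("What is our refund policy?"))         # fast-cheap-model
print(route_model("Analyze Q3 churn and plan actions"))  # smart-reasoning-model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;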

&lt;h4&gt;
  
  
  Infrastructure
&lt;/h4&gt;

&lt;p&gt;Standard cloud primitives apply here. Compute, Blob Storage, and &lt;strong&gt;Artifact Management&lt;/strong&gt; (abstracting how agents store input and output files) are essential. Treat your Agent Infrastructure as Code (IaC) to ensure reproducibility across environments (AWS, GCP, Azure, or on-premise).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Information
&lt;/h3&gt;

&lt;p&gt;An agent without data is a hallucination machine. The Information layer feeds the context required for decision-making.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AAkYWeo8wQsZ17sSyrLSowQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AAkYWeo8wQsZ17sSyrLSowQ.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge (Unstructured):&lt;/strong&gt; Documentation and guidelines stored in shared drives or online websites. These are typically indexed by a &lt;strong&gt;RAG Engine&lt;/strong&gt; or &lt;strong&gt;Search Engine&lt;/strong&gt; to explain &lt;em&gt;how&lt;/em&gt; the company works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational (Structured):&lt;/strong&gt; Transactional data (SQL DBs) required to &lt;em&gt;do&lt;/em&gt; work (e.g., update a CRM record). Builders should favor APIs over direct DB access here to ensure business logic integrity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Lake (Analytical):&lt;/strong&gt; Historical data for insights and decision making. Requires a Semantic Layer and Data Catalog so the agent understands what “Revenue” actually means before running a query.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Sync Problem:&lt;/strong&gt; Syncing these systems is painful. Each sync risks data duplication and inconsistency. We are moving toward a convergence of OLAP and OLTP with systems like &lt;a href="https://cloud.google.com/products/alloydb?authuser=1" rel="noopener noreferrer"&gt;Google AlloyDB&lt;/a&gt; or &lt;a href="https://www.databricks.com/product/lakebase" rel="noopener noreferrer"&gt;Databricks Lakebase&lt;/a&gt; to eliminate the copy/desync nightmare.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Observability
&lt;/h3&gt;

&lt;p&gt;If there is one thing humans must remain in control of, it is &lt;strong&gt;supervising the agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Aq5noL2yr9jrvr7WqMbAgvA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Aq5noL2yr9jrvr7WqMbAgvA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supervision:&lt;/strong&gt; The entry point. You need to collect logs, traces, and audit trails. Alerts should notify operators immediately when an agent loops or fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation:&lt;/strong&gt; The hardest part. You need pipelines where Foundation Models (or humans) review agent traces to score them on metrics such as &lt;strong&gt;Factuality, Relevance, and Accuracy&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing:&lt;/strong&gt; FinOps for AI. Track token usage per department. This is especially important for new architectures with less familiar cost sinks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics:&lt;/strong&gt; Tracking adoption. Is the agent actually solving tickets, or are people ignoring it? This is key to reporting ROI to stakeholders.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Trust
&lt;/h3&gt;

&lt;p&gt;Finally, the Trust layer. Agents are high-leverage tools; without governance, they are a liability that can wreak havoc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Ah7XffR1NKmMduJsTNcymBQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Ah7XffR1NKmMduJsTNcymBQ.png" width="800" height="557"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Fmind.dev&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IAM (Identity &amp;amp; Access Management):&lt;/strong&gt; RBAC is mandatory. Furthermore, &lt;strong&gt;Tool Authentication&lt;/strong&gt; (OIDC/OAuth) ensures the agent only takes actions the &lt;em&gt;user&lt;/em&gt; is authorized to take (acting on behalf of the user).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Guardrails. You need to filter content, prevent &lt;strong&gt;Prompt Injection&lt;/strong&gt; (jailbreaks), and detect malicious content &lt;em&gt;before&lt;/em&gt; the LLM sees it. &lt;strong&gt;Secret Management&lt;/strong&gt; is also critical to protect secrets like API keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance:&lt;/strong&gt; The Registry. You need a central catalog of authorized agents, models, and tools. You don’t want to be hunting through the org chart to find out who built the “Payroll Bot” or who is responsible for a rogue agent. This can extend to a &lt;strong&gt;Marketplace&lt;/strong&gt; for buying assets from other vendors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Building an AI Agent Platform is not just about stringing together a few API calls. It is about building a scalable, secure, and observable ecosystem where code and reasoning merge to drive real business impact. I’m really excited to build these powerhouses of automation and intelligence!&lt;/p&gt;

&lt;p&gt;Whether you are a developer writing complex orchestration logic or an integrator dragging and dropping workflows, the platform provides the stability you need to move from “demo” to “production”. The challenge will be immense, but if you have the right vision, roadmap and architecture, solutions will appear layer by layer to start addressing your use cases.&lt;/p&gt;

&lt;p&gt;Start with the core, secure the trust layer, and never underestimate the importance of observability. The agents are coming — make sure you have the platform to manage them and give them both power and control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A1Q_m_WP8O_PFLomdGaYGqQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2A1Q_m_WP8O_PFLomdGaYGqQ.png" width="800" height="427"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini (Nano Banana Pro)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>cloudcomputing</category>
      <category>artificialintelligen</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Powering Up your Agent in Production with ADK, OAuth and Gemini Enterprise</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sat, 01 Nov 2025 17:37:03 +0000</pubDate>
      <link>https://forem.com/fmind/powering-up-your-agent-in-production-with-adk-oauth-and-gemini-enterprise-2mi1</link>
      <guid>https://forem.com/fmind/powering-up-your-agent-in-production-with-adk-oauth-and-gemini-enterprise-2mi1</guid>
      <description>&lt;p&gt;The promise of AI agents is immense productivity gains. But putting them into production can be a tale of two extremes: surprisingly fast or painfully slow.&lt;/p&gt;

&lt;p&gt;The difference often hinges on the infrastructure and tooling you choose. If you attempt to build everything from scratch — creating a custom UI, managing complex authentication flows, and setting up observability — development slows down significantly. You spend more time on infrastructure than on the agent logic itself. I recently argued this point in &lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;“The Real AI Agent Bottleneck is the Damn UI”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, with the right tools, deploying an agent can be remarkably quick.&lt;/p&gt;

&lt;p&gt;To demonstrate how to achieve this fast route, we need a practical example. A while ago, I shared a &lt;a href="https://fmind.medium.com/slides-to-translate-when-it-says-no-build-a-0-04-solution-on-your-lunch-break-3afa8bd9f6bb" rel="noopener noreferrer"&gt;notebook built over lunch to translate Google Slides&lt;/a&gt;. It was effective but stuck in a notebook, inaccessible to my teammates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd5bpjj2st7tuht07ns3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd5bpjj2st7tuht07ns3.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A hacker tinkering a robot (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This article details the journey of taking that “Slides Translator” and pushing it into production as a secure, scalable agent, leveraging the right stack to bypass the usual bottlenecks. We will focus on using the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;, &lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;OAuth&lt;/a&gt;, and &lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The full code for this project is available on GitHub: &lt;a href="https://github.com/fmind/slides-translator-agent" rel="noopener noreferrer"&gt;https://github.com/fmind/slides-translator-agent&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Agentic Architecture
&lt;/h3&gt;

&lt;p&gt;To move from a notebook to a production agent, we need an architecture that handles security, execution, and user access robustly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Af5__WLm_DfbCtPXn43bbzA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Af5__WLm_DfbCtPXn43bbzA.png" width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of the Slide Translator Agent from local development to production deployment (Source: fmind.dev)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The workflow is structured as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Local Development:&lt;/strong&gt; The agent logic is developed using the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; and tested locally via the &lt;a href="https://github.com/google/adk-web" rel="noopener noreferrer"&gt;ADK Web UI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; The agent is deployed to the &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt; on Google Cloud Platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Access:&lt;/strong&gt; Users interact with the agent through the &lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Enterprise Web UI&lt;/strong&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution and Security:&lt;/strong&gt; The Agent Engine manages the execution. It uses &lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;&lt;strong&gt;OAuth&lt;/strong&gt;&lt;/a&gt; for secure authorization, interacts with &lt;a href="https://developers.google.com/apis-explorer" rel="noopener noreferrer"&gt;&lt;strong&gt;Google APIs&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Drive and Slides)&lt;/strong&gt; on the user’s behalf, and utilizes &lt;a href="https://ai.google.dev/gemini-api/docs/models" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Models&lt;/strong&gt;&lt;/a&gt; for the translation.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  ADK and the Power of OAuth
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/GoogleCloudPlatform/agent-development-kit" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; provides a great set of features to handle everything you need for building agents. In this specific use case, I focused on its ability to handle &lt;strong&gt;OAuth&lt;/strong&gt;, letting users grant access to their Slides and Drive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AT_KtkXo8SK0SPTEL.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AT_KtkXo8SK0SPTEL.png" width="800" height="353"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Overview of Google ADK and Vertex AI Agent Engine (Source: &lt;a href="https://cloud.google.com/agent-builder/agent-engine/overview" rel="noopener noreferrer"&gt;https://cloud.google.com/agent-builder/agent-engine/overview&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the notebook prototype, authentication relied on local credentials. This is not suitable for a production agent that needs to access the &lt;em&gt;user’s&lt;/em&gt; specific files. The agent must act on behalf of the user, requiring their explicit permission.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why OAuth?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://oauth.net/2/" rel="noopener noreferrer"&gt;OAuth 2.0&lt;/a&gt; provides excellent security guarantees and granularity. It allows users to grant specific permissions (scopes) without sharing their passwords with the agent. In this case, we need access to the &lt;a href="https://developers.google.com/drive/api?authuser=1" rel="noopener noreferrer"&gt;Google Drive API&lt;/a&gt; (to copy the presentation) and the &lt;a href="https://developers.google.com/slides/api?authuser=1" rel="noopener noreferrer"&gt;Google Slides API&lt;/a&gt; (to read and write slide content).&lt;/p&gt;

&lt;p&gt;While OAuth is not an easy concept for newcomers to grasp, it is a key component for securing enterprise applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AdQUie6caJY37ewPiXb315g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AdQUie6caJY37ewPiXb315g.png" width="800" height="585"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;OAuth flow for a tool with Google ADK (Source: &lt;a href="https://google.github.io/adk-docs/tools/authentication/" rel="noopener noreferrer"&gt;https://google.github.io/adk-docs/tools/authentication/&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;To make this work, an OAuth Client ID must be configured in the Google Cloud Console: &lt;a href="https://console.cloud.google.com/auth/clients" rel="noopener noreferrer"&gt;https://console.cloud.google.com/auth/clients&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AHtOzRd8iG-ervVtnJQ5dug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AHtOzRd8iG-ervVtnJQ5dug.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Configuration of the OAuth Credentials on Google Cloud: &lt;a href="https://console.cloud.google.com/auth/clients" rel="noopener noreferrer"&gt;https://console.cloud.google.com/auth/clients&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Crucially, we need to define the “Authorized redirect URIs”. The localhost URI is used during local development with the ADK Web UI, and the &lt;a href="https://vertexaisearch.cloud.google.com/oauth-redirect" rel="noopener noreferrer"&gt;https://vertexaisearch.cloud.google.com/oauth-redirect&lt;/a&gt; URI is used by the Vertex AI Agent Engine in production to securely handle the callback after the user grants consent.&lt;/p&gt;
&lt;h3&gt;
  
  
  Implementation in ADK
&lt;/h3&gt;

&lt;p&gt;ADK simplifies the OAuth flow significantly. We define the authentication configuration once and guard the tools that require user credentials with a helper function.&lt;/p&gt;

&lt;p&gt;Here is a snippet demonstrating the core authentication mechanism in the agent code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Authentication for the tools."""

# %% IMPORTS

from fastapi.openapi.models import OAuth2, OAuthFlowAuthorizationCode, OAuthFlows
from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, OAuth2Auth
from google.adk.auth.auth_tool import AuthConfig

from slides_translator_agent import configs

# %% CONFIGS

AUTHORIZATION_URL = "https://accounts.google.com/o/oauth2/auth"
TOKEN_URL = "https://oauth2.googleapis.com/token"
SCOPES = {
    "https://www.googleapis.com/auth/drive": "Google Drive API",
    "https://www.googleapis.com/auth/presentations": "Google Slides API",
}

# %% AUTHENTICATIONS

AUTH_SCHEME = OAuth2(
    flows=OAuthFlows(
        authorizationCode=OAuthFlowAuthorizationCode(
            authorizationUrl=AUTHORIZATION_URL,
            tokenUrl=TOKEN_URL,
            scopes=SCOPES,
        )
    )
)
AUTH_CREDENTIAL = AuthCredential(
    auth_type=AuthCredentialTypes.OAUTH2,
    oauth2=OAuth2Auth(
        client_id=configs.AUTHENTICATION_CLIENT_ID,
        client_secret=configs.AUTHENTICATION_CLIENT_SECRET,
    ),
)
AUTH_CONFIG = AuthConfig(
    auth_scheme=AUTH_SCHEME,
    raw_auth_credential=AUTH_CREDENTIAL,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the translate_presentation tool is invoked, the negotiate_creds function ensures that a valid token exists. If not, ADK automatically pauses the agent execution and initiates the OAuth flow with the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Tools for the agents."""

import json

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

from slides_translator_agent import auths

def negotiate_creds(tool_context: ToolContext) -&amp;gt; Credentials | dict:
    """Handle the OAuth 2.0 flow to get valid credentials."""
    logger.info("Negotiating credentials using oauth 2.0")
    # Check for cached credentials in the tool state
    if cached_token := tool_context.state.get(configs.TOKEN_CACHE_KEY):
        logger.debug("Found cached token in tool context state")
        if isinstance(cached_token, dict):
            logger.debug("Cached token is a dictionary, treating as AuthCredential.")
            try:
                creds = Credentials.from_authorized_user_info(
                    cached_token, list(auths.SCOPES.keys())
                )
                if creds.valid:
                    logger.debug("Cached credentials are valid, returning credentials")
                    return creds
                if creds.expired and creds.refresh_token:
                    logger.debug("Cached credentials expired, attempting refresh")
                    creds.refresh(Request())
                    tool_context.state[configs.TOKEN_CACHE_KEY] = json.loads(creds.to_json())
                    logger.debug("Credentials refreshed and cached successfully")
                    return creds
            except Exception as error:
                logger.error(f"Error loading/refreshing cached credentials: {error}")
                tool_context.state[configs.TOKEN_CACHE_KEY] = None # reset cache
        elif isinstance(cached_token, str):
            logger.debug("Found raw access token in tool context state.")
            # This creates a temporary credential object from the token
            # Note: This credential will not be refreshed if it expires
            return Credentials(token=cached_token)
        else:
            raise ValueError(
                f"Invalid cached token type. Expected dict or str, got {type(cached_token)}"
            )
    # If no valid cached credentials, check for auth response
    logger.debug("No valid cached token. Checking for auth response")
    if exchanged_creds := tool_context.get_auth_response(auths.AUTH_CONFIG):
        logger.debug("Received auth response, creating credentials")
        auth_scheme = auths.AUTH_CONFIG.auth_scheme
        auth_credential = auths.AUTH_CONFIG.raw_auth_credential
        creds = Credentials(
            token=exchanged_creds.oauth2.access_token,
            refresh_token=exchanged_creds.oauth2.refresh_token,
            token_uri=auth_scheme.flows.authorizationCode.tokenUrl,
            client_id=auth_credential.oauth2.client_id,
            client_secret=auth_credential.oauth2.client_secret,
            scopes=list(auth_scheme.flows.authorizationCode.scopes.keys()),
        )
        tool_context.state[configs.TOKEN_CACHE_KEY] = json.loads(creds.to_json())
        logger.debug("New credentials created and cached successfully")
        return creds
    # If no auth response, initiate auth request
    logger.debug("No credentials available. Requesting user authentication")
    tool_context.request_credential(auths.AUTH_CONFIG)
    logger.info("Awaiting user authentication")
    return {"pending": True, "message": "Awaiting user authentication"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
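
&lt;p&gt;For completeness, here is a minimal sketch of how a tool can build on negotiate_creds. The translate_presentation signature and the returned payload are assumptions for illustration; the key point is that the tool simply returns the pending status while the OAuth flow is in progress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch of a tool gated by negotiate_creds (assumed signature).

from googleapiclient.discovery import build

def translate_presentation(
    presentation_id: str, language: str, tool_context: ToolContext
) -&amp;gt; dict:
    """Translate a presentation, pausing for the OAuth flow if needed."""
    creds = negotiate_creds(tool_context)
    if isinstance(creds, dict):
        # No valid credentials yet: surface the pending status to the agent
        return creds
    # Delegated access: every API call below is made on behalf of the user
    slides = build("slides", "v1", credentials=creds)
    deck = slides.presentations().get(presentationId=presentation_id).execute()
    return {"status": "ok", "title": deck.get("title"), "language": language}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;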



&lt;p&gt;This ensures the user explicitly consents to the agent accessing their files before any action is taken.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AQUMzCNHNvZ7n99wBXF4ocQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AQUMzCNHNvZ7n99wBXF4ocQ.png" width="800" height="335"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;OAuth is supported natively on Google ADK: when needed, ADK will prompt the user to grant more access to the agent tools&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Deploying with Gemini Enterprise
&lt;/h3&gt;

&lt;p&gt;Once the agent is developed and tested, the next step is deploying it to production.&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuring Production Authentication
&lt;/h3&gt;

&lt;p&gt;Before deploying the agent code, we need to register the OAuth configuration with the production environment. I used the following script to set this up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./as.py create-auth \
  --auth-id slides-translator-auth \
  --client-id ... \
  --client-secret ... \
  --auth-uri "https://accounts.google.com/o/oauth2/auth?include_granted_scopes=true&amp;amp;response_type=code&amp;amp;access_type=offline&amp;amp;prompt=consent" \
  --token-uri "https://oauth2.googleapis.com/token" \
  --scope "https://www.googleapis.com/auth/drive" \
  --scope "https://www.googleapis.com/auth/presentations"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command links the slides-translator-auth ID (referenced in the Python code above as configs.TOKEN_CACHE_KEY) with the actual Client ID, Client Secret, and the required scopes.&lt;/p&gt;
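
&lt;p&gt;For reference, here is a hypothetical sketch of the configs module used by the snippets above. The names match the code, but the values and environment variables are assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Hypothetical sketch of the configs module (values are assumptions)."""

import os

# The cache key doubles as the auth-id registered with create-auth
TOKEN_CACHE_KEY = "slides-translator-auth"

# OAuth client credentials, injected through environment variables
AUTHENTICATION_CLIENT_ID = os.environ["AUTHENTICATION_CLIENT_ID"]
AUTHENTICATION_CLIENT_SECRET = os.environ["AUTHENTICATION_CLIENT_SECRET"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;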

&lt;p&gt;Note: As the Gemini Enterprise exposition API is still in private preview, I can’t share more details or the deployment script yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Seamless Exposition
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/gemini/enterprise?authuser=1" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt; gives you a quick way to expose your agent securely and conveniently. This directly addresses the “&lt;a href="https://fmind.medium.com/the-real-ai-agent-bottleneck-is-the-damn-ui-90e90ee369e0" rel="noopener noreferrer"&gt;UI bottleneck&lt;/a&gt;” mentioned earlier.&lt;/p&gt;

&lt;p&gt;This approach has significant advantages over deploying a separate UI (like Streamlit):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Effort UI:&lt;/strong&gt; No need to design, host, or secure a separate frontend application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Thanks to the underlying Agent Engine, agent activity is traced and logged automatically, providing essential observability for production monitoring and debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Services:&lt;/strong&gt; It provides more core services and integrates seamlessly within the Google Cloud security perimeter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The end result is a clean, integrated experience. Users can interact with the “Slides Translator Agent” directly within the Gemini interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AayV7S1xxgRSUraWii75ggA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AayV7S1xxgRSUraWii75ggA.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Slides Translator Agent deployed on Gemini Enterprise&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This journey from a simple notebook to a production-ready agent was a great way to experience what this stack provides out of the box. The combination of ADK for development, OAuth for security, and Gemini Enterprise for deployment streamlines the entire lifecycle of an enterprise agent, allowing us to deploy quickly without compromising on security or usability.&lt;/p&gt;

&lt;p&gt;I’m eager to explore more ways to build agents. While this is a new paradigm that requires upskilling our teammates and adapting our development practices, we already see the potential in the use cases we encounter. The ability to rapidly deploy secure, specialized tools that act on behalf of users is a significant step forward.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Acd0pXOH1MIMOs6fkqEyU3A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2Acd0pXOH1MIMOs6fkqEyU3A.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Human and Agent merging to accomplish their tasks (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>generativeaitools</category>
      <category>agents</category>
      <category>googlecloudplatform</category>
    </item>
    <item>
      <title>The Real AI Agent Bottleneck is the Damn UI</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sun, 12 Oct 2025 16:12:41 +0000</pubDate>
      <link>https://forem.com/fmind/the-real-ai-agent-bottleneck-is-the-damn-ui-5277</link>
      <guid>https://forem.com/fmind/the-real-ai-agent-bottleneck-is-the-damn-ui-5277</guid>
      <description>&lt;p&gt;We’re living in the golden age of AI agent development. The backend infrastructure is finally catching up to the hype. If you’ve followed my previous work on &lt;a href="https://fmind.medium.com/deploying-ai-agents-in-the-enterprise-using-adk-and-google-cloud-b49e7eda3b41" rel="noopener noreferrer"&gt;deploying agents using ADK and Google Cloud&lt;/a&gt;, you know that the heavy lifting — the orchestration, the tool integration, the deployment pipelines — is becoming standardized.&lt;/p&gt;

&lt;p&gt;The major players are all in. Whether you’re using Google Cloud’s &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview" rel="noopener noreferrer"&gt;Vertex AI Agent Engine&lt;/a&gt; powered by the &lt;a href="https://github.com/google/adk-python" rel="noopener noreferrer"&gt;ADK&lt;/a&gt;, &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;AWS AgentCore&lt;/a&gt; with &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-frameworks/strands-agents.html" rel="noopener noreferrer"&gt;Strands&lt;/a&gt;, or &lt;a href="https://docs.databricks.com/aws/en/generative-ai/agent-bricks/" rel="noopener noreferrer"&gt;Databricks’ AgentBricks&lt;/a&gt;, building the &lt;em&gt;brain&lt;/em&gt; of the agent is easier than ever. But here’s the dirty secret the hype cycle isn’t talking about: &lt;strong&gt;The User Interface (UI) is the real bottleneck for industrializing AI agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can have the most sophisticated, multi-step reasoning agent on the planet, but if your users can’t interact with it intuitively, securely, and effectively, it will never be deployed at scale. The last mile — exposing the agent to maximize impact — is where projects go to die. In this article, we are going to explore this problem and find the best trade-off to remove the bottlenecks and adopt AI agents full-throttle in your company!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvzmt03ex6exmwtvwdad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvzmt03ex6exmwtvwdad.png" width="800" height="457"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Google AI Studio&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottleneck: Why UI is the Hardest Part
&lt;/h3&gt;

&lt;p&gt;Building an agent requires a specific skill set: LLM understanding, backend engineering, and prompt whispering. Building a &lt;em&gt;good&lt;/em&gt; UI requires a completely different one: frontend development, UX design, and product sense. The engineers hacking together these agents are rarely UI experts. And frankly, they shouldn’t have to be.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Tiresome Process of Building UIs
&lt;/h4&gt;

&lt;p&gt;Having to spin up a new React app every time you deploy an agent is soul-crushing. It’s tedious, time-consuming, and completely unscalable. We need generalized interfaces that adapt to specific workflows, not custom code for every use case. This includes generalizing how we evaluate agent performance and collect user feedback — critical components that are often treated as an afterthought.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Identity Crisis
&lt;/h4&gt;

&lt;p&gt;User and Agent Identity is paramount. If an agent needs to access a database or pull a file from Google Drive, it must do so &lt;em&gt;on the user’s behalf&lt;/em&gt;, from the UI. We can’t have agents authenticating with god-mode service accounts, nor can we force users to re-authenticate with every single tool during an interaction. The UI must seamlessly handle delegated authority.&lt;/p&gt;

&lt;h4&gt;
  
  
  Security and Governance: The Enterprise Non-Negotiables
&lt;/h4&gt;

&lt;p&gt;This isn’t a weekend hackathon project. In an enterprise setting, security is everything. You cannot allow loose access controls. The nightmare scenario? An agent with access to your entire data lake &lt;em&gt;and&lt;/em&gt; the ability to send emails externally. The risk of data leakage is massive.&lt;/p&gt;

&lt;p&gt;Governance requires auditing every operation, ensuring data usage is controlled, and verifying that tool access is restricted. The UI is the gateway for all of this. This balancing act requires both admins and users to be familiar with the environment, but their interfaces must be optimized for their respective roles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Symptoms of a Broken System
&lt;/h3&gt;

&lt;p&gt;When the UI layer fails, the organization feels the pain.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Rise of Shadow IT:&lt;/strong&gt; When official tools are too hard to use or deploy, users find workarounds. We see a proliferation of quick-and-dirty solutions, like rogue &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; instances deployed under someone’s desk, creating massive security vulnerabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Silos:&lt;/strong&gt; Agents should be collaborative. They need to interact with each other, leveraging protocols like the emerging &lt;a href="https://a2a-protocol.org/" rel="noopener noreferrer"&gt;A2A (Agent-to-Agent) standard&lt;/a&gt;. But when agents live in isolation, collaboration is impossible. They become siloed tools rather than a cohesive intelligence layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The “90% Done” Fallacy:&lt;/strong&gt; This is the classic trap. You hack together a &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; web app, deploy it, and declare the project 90% complete. Wrong. The real project — adoption, integration, security hardening, and UI refinement — is just beginning.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Exploring the Approaches: The Good, The Bad, and The Ugly
&lt;/h3&gt;

&lt;p&gt;How are we currently trying to solve this UI challenge? Let’s break down the dominant paradigms.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. The Pure Chatbot (The Terminal Approach)
&lt;/h4&gt;

&lt;p&gt;The idea here is that the chatbot is the &lt;em&gt;only&lt;/em&gt; interface. We see this clearly with the recent &lt;a href="https://openai.com/index/introducing-apps-in-chatgpt/" rel="noopener noreferrer"&gt;OpenAI Apps&lt;/a&gt; or &lt;a href="https://support.google.com/gemini/answer/14959807?hl=en&amp;amp;co=GENIE.Platform%3DAndroid" rel="noopener noreferrer"&gt;Gemini Extensions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AoeUeNTKBlnxDyYr2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AoeUeNTKBlnxDyYr2" width="1024" height="576"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;OpenAI Apps. Source: &lt;a href="https://www.axios.com/2025/10/06/openai-chatgpt-app-devday" rel="noopener noreferrer"&gt;https://www.axios.com/2025/10/06/openai-chatgpt-app-devday&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Simple, universal interface for everything. Low development overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Incredibly restrictive. Markdown syntax is &lt;em&gt;not&lt;/em&gt; a UI framework. You can’t easily implement sliders, interactive maps, complex data visualizations, or rich editing tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; This is like a developer terminal, but using natural language instead of Linux commands. It’s powerful for certain tasks but hits a wall quickly when complexity increases.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. The Co-Pilot (The Sidecar Approach)
&lt;/h4&gt;

&lt;p&gt;The chatbot controls another, existing interface. The prime example is &lt;a href="https://workspace.google.com/solutions/ai/" rel="noopener noreferrer"&gt;Gemini for Workspace&lt;/a&gt;, where the chatbot sits as a widget on the right side of Docs, Sheets, or Gmail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fln7fghlj01gt2j0d4pol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fln7fghlj01gt2j0d4pol.png" width="800" height="397"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Gemini for Workspace with Chat Sidecar on Google Sheets&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Meets the user where they already are. Keeps the familiar interface of the host application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Limited to the capabilities of the host app. Cross-application workflows (e.g., “Analyze this spreadsheet and draft a presentation based on the findings”) are difficult or impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; A great enhancement for existing tools, but not a solution for complex, multi-tool agentic workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Generic Static UI (The Visual Workflow)
&lt;/h4&gt;

&lt;p&gt;This involves using a predefined visual interface, like &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; or specialized agent builders.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipqt4kej3ik973v5t6wy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fipqt4kej3ik973v5t6wy.png" width="800" height="546"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Example of Visual Workflow on n8n. Source: &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;https://n8n.io/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Generic yet adaptable to specific workflows. Fast to develop and easy to interpret.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Visual workflows are often legacy techniques poorly suited for Generative AI. They are too rigid. How do you easily put a human in the loop? How do you give the agent more autonomy when the path is predefined?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; Good for traditional automation, but stifles the potential of true AI agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. Specific Static UI (The Artisanal Approach)
&lt;/h4&gt;

&lt;p&gt;Building a custom, bespoke UI for every agent, often using frameworks like &lt;a href="https://genkit.dev/" rel="noopener noreferrer"&gt;Genkit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzfakqwth9qyqds429qs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzfakqwth9qyqds429qs.png" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: &lt;a href="https://developers.googleblog.com/en/how-firebase-genkit-helped-add-ai-to-our-compass-app/" rel="noopener noreferrer"&gt;https://developers.googleblog.com/en/how-firebase-genkit-helped-add-ai-to-our-compass-app/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; The absolute best adaptation to the specific use case. Maximum control over the user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Slow to develop, expensive, and completely unscalable, especially for quick agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; Necessary for flagship products, but impossible for the rapid deployment of specialized agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. Dynamic UI (The Shape-Shifter)
&lt;/h4&gt;

&lt;p&gt;The UI is fluid and generated on the spot by the AI itself. We see this with Claude generating artifacts, or experimental concepts like Google’s &lt;a href="https://opal.withgoogle.com/landing/" rel="noopener noreferrer"&gt;Opal&lt;/a&gt; and the &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;AG-UI&lt;/a&gt; protocol.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55e47bihhiipvzxcqbbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55e47bihhiipvzxcqbbe.png" width="800" height="504"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Dynamic UI with Opal (green block). Source: &lt;a href="https://blog.google/technology/google-labs/opal-expansion/" rel="noopener noreferrer"&gt;https://blog.google/technology/google-labs/opal-expansion/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; No need to code UI anymore. Maximum adaptation to the desired workflow. Incredibly fast development cycle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Unpredictable and inconsistent. Not efficient — it feels like “vibe coding.” It’s feasible for small apps, but is it robust enough for large-scale enterprise applications?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Verdict:&lt;/strong&gt; The holy grail, but the technology isn’t mature enough for mission-critical applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmclwccfagenmf6zzdmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmclwccfagenmf6zzdmm.png" width="800" height="367"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Summary of the UI Approaches for AI Agents&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Agent Hub Imperative
&lt;/h3&gt;

&lt;p&gt;Regardless of the UI paradigm we choose, one thing is clear: we need an &lt;strong&gt;Agent Hub&lt;/strong&gt;. Organizations need a centralized location to discover available agents, manage their access, orchestrate their interactions (both human-to-agent and agent-to-agent), and provide governance oversight.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Current Landscape: Evaluating the Options
&lt;/h3&gt;

&lt;p&gt;Where do today’s solutions fit in?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;/&lt;/strong&gt; &lt;a href="https://openai.com/index/introducing-agentkit/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAI Agent Builder&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Visual Workflow):&lt;/strong&gt; Familiar with organizations, which aids adoption. However, they are fundamentally restrictive and don’t allow for the autonomy and human-in-the-loop interaction that GenAI agents can leverage.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openai.com/index/introducing-apps-in-chatgpt/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAI Apps&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;/&lt;/strong&gt; &lt;a href="https://support.google.com/gemini/answer/13695044?hl=en&amp;amp;co=GENIE.Platform%3DAndroid" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Extensions&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Chat-First):&lt;/strong&gt; The easy fix, but they lack expressiveness. If we limit agents to simple chat interfaces, we risk repeating the failures of Alexa — useful for timers, but not for complex work.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.google/technology/google-labs/opal-expansion/" rel="noopener noreferrer"&gt;&lt;strong&gt;Opal&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;/&lt;/strong&gt; &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;&lt;strong&gt;AG-UI&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Dynamic UI):&lt;/strong&gt; Great for small, isolated apps and user autonomy, but not scalable for large, complex systems. They are hard to edit, maintain, and ensure consistency.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/fr/blogs/aws/reimagine-the-way-you-work-with-ai-agents-in-amazon-quick-suite/" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS QuickSuite&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Hybrid):&lt;/strong&gt; A pragmatic, conservative middle ground. &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/what-is.html" rel="noopener noreferrer"&gt;QuickSuite&lt;/a&gt; offers a toolset of GenAI variants with UIs tailored for specific tasks like data analysis, deep research, or conversation. A solid, choice, especially if you are using AWS services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz56u4ofkh729xml7bm07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz56u4ofkh729xml7bm07.png" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AWS Quick Suite with several experiences: Chat Agents, Flows, and Research&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini Enterprise&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Agent Hub Focus):&lt;/strong&gt; &lt;a href="https://cloud.google.com/gemini-enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise&lt;/a&gt; shows potential as a central hub, but it needs to deliver richer expressiveness beyond the standard chat interface to truly unlock agent potential. One solution is to control other UI (e.g., Google Sheets) from the chat app.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2APOyQWrG5DLVE3VevhZCeog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2APOyQWrG5DLVE3VevhZCeog.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Agent Gallery and Usage from Gemini Enterprise. Source: &lt;a href="https://cloud.google.com/gemini-enterprise?hl=en" rel="noopener noreferrer"&gt;https://cloud.google.com/gemini-enterprise?hl=en&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  My Bet on the Future
&lt;/h3&gt;

&lt;p&gt;The UI bottleneck won’t be solved overnight. Here’s where I see things heading.&lt;/p&gt;

&lt;h4&gt;
  
  
  Short/Medium Term: The “Hacker Terminal” Wins
&lt;/h4&gt;

&lt;p&gt;For the immediate future, the &lt;strong&gt;Chatbot UI&lt;/strong&gt; will dominate. It’s the easiest to develop and gets you 80% of the way there. It’s the “hacker terminal” approach — using natural language to orchestrate complex systems — but easier to use. In addition, visual workflows will be used for deterministic applications (i.e., &lt;a href="https://www.youtube.com/watch?v=Qd6anWv0mv0" rel="noopener noreferrer"&gt;agentic workflows&lt;/a&gt;) as a complementary solution.&lt;/p&gt;

&lt;p&gt;The key to making this work won’t be richer UIs, but better &lt;em&gt;backend&lt;/em&gt; collaboration. Agents need to be able to seamlessly call other agents (&lt;a href="https://a2a-protocol.org/" rel="noopener noreferrer"&gt;A2A&lt;/a&gt;) behind the scenes, using the chat interface purely as the command and control layer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Long Term: Ambient Computing and Voice
&lt;/h4&gt;

&lt;p&gt;In the long term, the best UI is no UI. We will move towards &lt;strong&gt;voice and ambient computing&lt;/strong&gt;. We will keep our existing human applications (our spreadsheets, our design tools, our CRMs), and agents will pilot them intelligently on our behalf.&lt;/p&gt;

&lt;p&gt;This is both easier to develop (no new UIs needed) and easier to adopt (users keep their existing workflows). However, this requires incredibly robust models and rigorous testing. We only adopt transformative interfaces when they are near-perfect. Think about voice translation — it only became truly useful when it crossed the 95% accuracy threshold. Ambient computing will require the same level of reliability.&lt;/p&gt;

&lt;p&gt;Until then, we need to stop treating the UI as an afterthought. It’s a critical component for unlocking the value of AI agents in the enterprise. It’s time we started engineering it with the same rigor we apply to the agents themselves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2ApF9052Gy-cRPBuUS" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2ApF9052Gy-cRPBuUS" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by Anton Filatov on Unsplash&lt;/em&gt;&lt;/p&gt;

</description>
      <category>generativeaitools</category>
      <category>userexperience</category>
      <category>datascience</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>Da2a: The Future of Data Platforms is Agentic, Distributed, and Collaborative</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sat, 27 Sep 2025 13:36:40 +0000</pubDate>
      <link>https://forem.com/fmind/da2a-the-future-of-data-platforms-is-agentic-distributed-and-collaborative-4ikd</link>
      <guid>https://forem.com/fmind/da2a-the-future-of-data-platforms-is-agentic-distributed-and-collaborative-4ikd</guid>
      <description>&lt;p&gt;For decades, the story of data platforms has been one of centralization and heavy engineering. We built massive data warehouses and data lakes, but accessing their insights required deep technical expertise. Business users couldn’t simply ask questions; they had to navigate a complex process involving specialized data engineers to build painstaking ETL pipelines, optimized queries, and specific dashboards. This highly technical approach created a rigid, monolithic source of truth that, while powerful, was slow to adapt and created significant bottlenecks. It left decision-makers waiting days or even weeks for answers, completely dependent on an over-burdened engineering team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fileeo7z7i8h7x929dez3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fileeo7z7i8h7x929dez3.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Illustration of the complexity of data platforms (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if we flipped the model on its head?&lt;/p&gt;

&lt;p&gt;Instead of a single, all-knowing monolith, imagine a collaborative ecosystem where domain experts describe their data in natural language, providing context that empowers a network of intelligent, autonomous agents. Each agent becomes an expert in its domain — sales, marketing, logistics, finance — managing its own data by combining human-provided descriptions with its own skills to answer questions. This is the future of data platforms: a system that is &lt;strong&gt;agentic, distributed, and truly collaborative&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I created a new open-source project, &lt;a href="https://github.com/fmind/da2a" rel="noopener noreferrer"&gt;&lt;strong&gt;Da2a&lt;/strong&gt;&lt;/a&gt;, to explore this paradigm. It’s a prototype that demonstrates how a multi-agent system can tackle complex data analysis by working together.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Old Way vs. The New Paradigm
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The traditional data platform is&lt;/strong&gt;  &lt;strong&gt;engineering-focused&lt;/strong&gt;. The primary challenge is moving, storing, and modeling data. Answering a simple business question like, &lt;em&gt;“What’s the ROI on our latest social media campaign?”&lt;/em&gt; could involve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Filing a ticket with the data engineering team.&lt;/li&gt;
&lt;li&gt;Waiting for them to build a new pipeline to join marketing spend data with sales data.&lt;/li&gt;
&lt;li&gt;Having an analyst write a complex SQL query across multiple massive tables.&lt;/li&gt;
&lt;li&gt;Finally, getting a report back, hoping the initial question hasn’t become irrelevant.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The agentic approach is&lt;/strong&gt;  &lt;strong&gt;insight-focused&lt;/strong&gt;. Instead of a centralized database, you have specialized agents. For instance, the &lt;strong&gt;Marketing Agent&lt;/strong&gt; knows everything about campaign spending and lead acquisition. On the other hand, the &lt;strong&gt;E-commerce Agent&lt;/strong&gt; is an expert on orders, products, and revenue.&lt;/p&gt;

&lt;p&gt;To answer that same question, you simply ask a root “Orchestrator Agent.” The orchestrator understands the goal, formulates a plan, and collaborates with the specialist agents to get the answer. The focus shifts from the &lt;em&gt;how&lt;/em&gt; (engineering) to the &lt;em&gt;what&lt;/em&gt; (the business question).&lt;/p&gt;
&lt;h3&gt;
  
  
  Meet Da2a: An Agentic Platform in Action
&lt;/h3&gt;

&lt;p&gt;Da2a implements this vision with a root orchestrator and two specialized agents: one for an &lt;strong&gt;e-commerce&lt;/strong&gt; dataset and another for a &lt;strong&gt;marketing&lt;/strong&gt; dataset, both based on real-world data from the &lt;a href="https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce" rel="noopener noreferrer"&gt;Olist store in Brazil&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/fmind/da2a" rel="noopener noreferrer"&gt;https://github.com/fmind/da2a&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Demos:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Root Orchestrator: &lt;a href="https://da2a.fmind.dev/" rel="noopener noreferrer"&gt;https://da2a.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Marketing Agent: &lt;a href="https://da2a-marketing.fmind.dev/" rel="noopener noreferrer"&gt;https://da2a-marketing.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;E-commerce Agent: &lt;a href="https://da2a-ecommerce.fmind.dev/" rel="noopener noreferrer"&gt;https://da2a-ecommerce.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can ask the e-commerce agent, “How many orders were placed in São Paulo?” or the marketing agent, “What were our top lead sources last year?”. Better yet, you can ask the root orchestrator a question that requires both, like, &lt;em&gt;“What is the total sales revenue from sellers who were acquired via ‘Display’ advertising?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllapdgh3q85bw73i691u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllapdgh3q85bw73i691u.png" width="800" height="417"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Screenshot of the Da2a User Interface: &lt;a href="http://da2a.fmind.dev/" rel="noopener noreferrer"&gt;http://da2a.fmind.dev/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The root agent intelligently delegates the work: first asking the marketing agent to identify the sellers from the ‘Display’ channel, then passing that list to the e-commerce agent to calculate their total sales.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Architecture: Collaboration via the A2A Protocol
&lt;/h3&gt;

&lt;p&gt;The magic that makes this collaboration possible is the &lt;a href="https://a2a-protocol.org/latest/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent-to-Agent (A2A) protocol&lt;/strong&gt;&lt;/a&gt;. A2A provides a standardized way for agents to communicate their capabilities and call upon each other’s skills over a network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnk8wn7vjw8tmdmg542l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnk8wn7vjw8tmdmg542l.png" width="800" height="282"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of the Da2a Application, with Marketing and E-Commerce agents collaborating with the Root Agent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The architecture consists of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://github.com/fmind/da2a/tree/main/da2a" rel="noopener noreferrer"&gt;&lt;strong&gt;A Root Agent&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;:&lt;/strong&gt; The orchestrator that receives user requests, plans the execution, and delegates tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Agents:&lt;/strong&gt; The &lt;a href="https://github.com/fmind/da2a/tree/main/ecommerce" rel="noopener noreferrer"&gt;ecommerce_agent&lt;/a&gt; and &lt;a href="https://github.com/fmind/da2a/tree/main/marketing" rel="noopener noreferrer"&gt;marketing_agent&lt;/a&gt;, each running as an independent service with its own database.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://a2a-protocol.org/latest/tutorials/python/3-agent-skills-and-card/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Cards&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;:&lt;/strong&gt; Each domain agent exposes a JSON “agent card” that acts like a digital business card, describing its name, capabilities, and how to communicate with it.&lt;/li&gt;
&lt;/ol&gt;
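
&lt;p&gt;As a quick illustration of the third point, here is a minimal sketch that fetches and inspects one of the live agent cards. The name, description, and skills fields follow the A2A agent card format; treat the exact structure as an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Sketch: inspect a domain agent's card from its well-known URL."""

import json
import urllib.request

CARD_URL = "https://da2a-ecommerce.fmind.dev/a2a/ecommerce/.well-known/agent-card.json"

# Fetch the JSON card that the A2A server publishes at a well-known path
with urllib.request.urlopen(CARD_URL) as response:
    card = json.load(response)

# The card advertises the agent's identity and skills to other agents
print(card["name"], "-", card.get("description", ""))
for skill in card.get("skills", []):
    print("skill:", skill.get("name"), "-", skill.get("description", ""))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;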

&lt;p&gt;The root agent is configured to know about these remote agents. Here is a simplified look at the code from the da2a &lt;a href="https://github.com/fmind/da2a/blob/main/da2a/agent.py" rel="noopener noreferrer"&gt;agent.py&lt;/a&gt; file, which sets up the connection to the remote agents using their "agent cards."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import google.adk.agents.remote_a2a_agent as a2a
import google.adk.tools.agent_tool as at

# The URL points to the 'agent card' of the remote agent
AGENT_CARD_ECOMMERCE = "https://da2a-ecommerce.fmind.dev/a2a/ecommerce/.well-known/agent-card.json"
AGENT_CARD_MARKETING = "https://da2a-marketing.fmind.dev/a2a/marketing/.well-known/agent-card.json"

# Create local proxy objects for the remote agents
ecommerce_agent = a2a.RemoteA2aAgent(
    name="ecommerce_agent",
    agent_card=AGENT_CARD_ECOMMERCE,
    description="Answers questions about e-commerce data..."
)
marketing_agent = a2a.RemoteA2aAgent(
    name="marketing_agent",
    agent_card=AGENT_CARD_MARKETING,
    description="Answers questions about marketing data..."
)

# The root agent uses these agents as 'tools' to solve problems
root_agent = LlmAgent(
    ...
    tools=[at.AgentTool(ecommerce_agent), at.AgentTool(marketing_agent)],
    ...
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each domain agent is served via the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit’s&lt;/a&gt; (ADK) web server, which automatically exposes the A2A endpoints and the agent card.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Command to serve an agent and enable A2A communication
adk web --a2a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simple, powerful mechanism allows us to build a distributed system where components can be developed, deployed, and scaled independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Benefits of Thinking Agentically
&lt;/h3&gt;

&lt;p&gt;This approach unlocks several powerful advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Human-Like Task Handling:&lt;/strong&gt; Agents can tackle complex, multi-step tasks that require synthesizing information from different domains, much like a human analyst would.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability and Extensibility:&lt;/strong&gt; Adding a new data domain is as simple as building and deploying a new agent. No need to re-architect the entire platform. The system grows organically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Focus on High-Level Value:&lt;/strong&gt; It abstracts the underlying engineering complexity. Data consumers and developers can focus on defining business logic and asking high-level questions, not on writing SQL or managing data pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous and Collaborative:&lt;/strong&gt; Each agent is a valuable tool on its own, but their true power is unlocked when they collaborate through an orchestrator to solve problems that no single agent could handle alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Road Ahead: Limitations and Future Work
&lt;/h3&gt;

&lt;p&gt;Da2a is a prototype, and building an industrial-grade agentic data platform requires solving some interesting challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Data Transfer:&lt;/strong&gt; A2A is excellent for orchestrating tasks and passing small payloads of text or JSON. It is not designed for transferring gigabytes of data between agents. For that, we’d need to integrate mechanisms that point agents to shared data storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Agent Discovery:&lt;/strong&gt; Currently, the root agent’s knowledge of other agents is hardcoded. A production system would need a discovery service or a registry where agents can dynamically register themselves and their skills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory and Learning:&lt;/strong&gt; The agents in this prototype are stateless. The next frontier is to give them memory, allowing them to learn from past interactions, recall previous results, and improve their planning and execution over time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion: A New Frontier for Data
&lt;/h3&gt;

&lt;p&gt;The agentic paradigm represents a fundamental shift in how we think about data architecture. We are moving from rigid, centralized systems to dynamic, decentralized ecosystems of intelligent specialists. This approach promises to create data platforms that are more flexible, more powerful, and more aligned with the way businesses actually work.&lt;/p&gt;

&lt;p&gt;There is still much to build, but the potential is immense. The future of data isn’t just about bigger databases or faster queries; it’s about collaboration, intelligence, and a network of agents working together to turn data into insight.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy35cyrmqbjlsu1j9oay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy35cyrmqbjlsu1j9oay.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The future is with Agentic Data Platforms (Source: Gemini App)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>artificialintelligen</category>
      <category>datascience</category>
      <category>generativeaitools</category>
    </item>
    <item>
      <title>Ackgent: Rapid Agent Development on GCP with ADK and Agent Config</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Sun, 14 Sep 2025 13:13:04 +0000</pubDate>
      <link>https://forem.com/fmind/ackgent-rapid-agent-development-on-gcp-with-adk-and-agent-config-lb9</link>
      <guid>https://forem.com/fmind/ackgent-rapid-agent-development-on-gcp-with-adk-and-agent-config-lb9</guid>
      <description>&lt;p&gt;The AI agent landscape is exploding, but development speed is hitting a wall. We need a faster, more accessible way to build and iterate. In my current role, I spend my days optimizing the experience of building and deploying AI agents. I’ve witnessed firsthand the incredible use cases my customers’ developers — agents that streamline complex workflows, automate intricate decision-making, and unlock new data insights. The potential is massive, but the reality of development is often friction-filled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Despite the advancements in foundational models, the process of taking an agent from concept to production remains too slow and overly complex.&lt;/strong&gt; Developers get bogged down in boilerplate code, infrastructure wrangling, and the mechanics of tool integration, rather than focusing on the actual logic and value the agent provides. We urgently need to improve the speed at which we can iterate, while simultaneously making the whole process more accessible to a broader range of builders. Enter &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;&lt;strong&gt;Ackgent&lt;/strong&gt;&lt;/a&gt;, a demonstration of how &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google ADK&lt;/a&gt; and &lt;a href="https://google.github.io/adk-docs/agents/config/" rel="noopener noreferrer"&gt;Agent Config&lt;/a&gt; can be used to quickly build and deploy AI agents with a declarative approach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfmm1hkft1fxmxkek3vu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfmm1hkft1fxmxkek3vu.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Shift from Imperative to Declarative
&lt;/h3&gt;

&lt;p&gt;The traditional approach to building agents is largely &lt;em&gt;imperative&lt;/em&gt;. You write Python (or similar) code detailing exactly &lt;em&gt;how&lt;/em&gt; the agent should execute tasks, manage state, call tools, and handle errors. This offers maximum control but comes at the cost of speed and simplicity.&lt;/p&gt;

&lt;p&gt;What if we could shift to a &lt;em&gt;declarative&lt;/em&gt; approach? What if we could define &lt;em&gt;what&lt;/em&gt; the agent should do, and let a robust framework handle the execution? This is the promise of &lt;a href="https://google.github.io/adk-docs/agents/config/" rel="noopener noreferrer"&gt;Agent Config&lt;/a&gt;, a new feature of the &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; introduced in &lt;a href="https://github.com/google/adk-python/releases/tag/v1.12.0" rel="noopener noreferrer"&gt;release v1.12.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://google.github.io/adk-docs/agents/config/" rel="noopener noreferrer"&gt;Agent Config&lt;/a&gt; allows developers to define the entire behavior of an agent — its goals, instructions, tools, and integrations — using a structured configuration file written in YAML. This addresses both the need for speed and the need for accessibility.&lt;/p&gt;

&lt;p&gt;The central insight here is that &lt;strong&gt;config helps you focus on the use case, not the code.&lt;/strong&gt; By abstracting the underlying mechanics, developers, prompt engineers, and product managers can rapidly prototype and test different agent behaviors simply by editing a YAML file.&lt;/p&gt;
&lt;h3&gt;
  
  
  Flexibility Without the Boilerplate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Crucially, adopting a declarative approach doesn’t mean sacrificing power or flexibility&lt;/strong&gt;. Agent Config is designed to be extensible. While the core orchestration is handled by the framework, it provides clear pathways for integrating essential components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;External Tools:&lt;/strong&gt; You can easily connect your agents to real-world APIs, databases, and services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Callbacks:&lt;/strong&gt; Hooks are available to inject custom Python logic at specific points in the agent lifecycle (e.g., for pre-processing input, validating output, logging, or monitoring).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP (Model Context Protocol) Servers:&lt;/strong&gt; Agent Config supports integration with MCP servers, enabling sophisticated communication, governance, and orchestration in complex multi-agent systems.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
agent_class: LlmAgent
model: gemini-2.5-flash
name: prime_agent
description: Handles checking if numbers are prime.
instruction: |
  You are responsible for checking whether numbers are prime.
  When asked to check primes, you must call the check_prime tool with a list of integers.
  Never attempt to determine prime numbers manually.
  Return the prime number results to the root agent.
tools:
  - name: ma_llm.check_prime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
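
&lt;p&gt;The ma_llm.check_prime tool referenced in the YAML lives in regular Python code. Here is a minimal sketch of what such a tool could look like; the trial-division implementation is an assumption for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Hypothetical sketch of the check_prime tool referenced by the YAML above."""

def check_prime(numbers: list[int]) -&amp;gt; dict[int, bool]:
    """Check the primality of each integer using trial division."""

    def is_prime(n: int) -&amp;gt; bool:
        if n &amp;lt; 2:
            return False
        if n % 2 == 0:
            return n == 2  # 2 is the only even prime
        divisor = 3
        while divisor * divisor &amp;lt;= n:
            if n % divisor == 0:
                return False
            divisor += 2
        return True

    return {n: is_prime(n) for n in numbers}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;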

&lt;h3&gt;
  
  
  Introducing Ackgent: The Agent Config Starter Kit
&lt;/h3&gt;

&lt;p&gt;To help teams adopt this powerful paradigm, I’ve created a new GitHub repository: &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;&lt;strong&gt;Ackgent&lt;/strong&gt;&lt;/a&gt;. This repository is a demonstration of how to leverage ADK Agent Config within a modern, production-ready Python environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wuqhs3j864wgqxhm613.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wuqhs3j864wgqxhm613.png" width="800" height="422"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Web Interface of Ackgent with the Internet and Datetime agents: &lt;a href="https://ackgent.fmind.dev/" rel="noopener noreferrer"&gt;https://ackgent.fmind.dev/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This template encapsulates best practices for structuring a project where configuration is the core, supported by a suite of modern development tools.&lt;/p&gt;
&lt;h4&gt;
  
  
  Repository Features: A Modern Stack
&lt;/h4&gt;

&lt;p&gt;The Ackgent repository is built with efficiency, robustness, and Developer Experience (DX) in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modern Python Management with &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt;:&lt;/strong&gt; We leverage uv (the blazing-fast Python package manager written in Rust) to streamline dependency resolution and virtual environment management, significantly speeding up setup and CI/CD pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Execution with &lt;a href="https://github.com/casey/just" rel="noopener noreferrer"&gt;just&lt;/a&gt;:&lt;/strong&gt; just serves as a convenient command runner, simplifying common tasks such as installing dependencies with just project or deploying to the cloud with just deploy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Quality Tooling:&lt;/strong&gt; Integrated &lt;a href="https://pre-commit.com/" rel="noopener noreferrer"&gt;pre-commit&lt;/a&gt; hooks ensure code quality and consistency from the start, with checks such as check-toml, check-yaml, and check-json.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;evalset Configuration:&lt;/strong&gt; The repository includes ADK's capability for defining and running evaluation datasets (&lt;a href="https://google.github.io/adk-docs/evaluate/" rel="noopener noreferrer"&gt;evalset&lt;/a&gt;), which is crucial for rigorously and iteratively testing and benchmarking agent performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customizable Cloud Run Deployment:&lt;/strong&gt; Designed for scalability, the example includes configurations and just recipes for deploying the agents as serverless containers on &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;Google Cloud Run&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Agent Capabilities:&lt;/strong&gt; Demonstrations of &lt;strong&gt;Tools&lt;/strong&gt;, &lt;strong&gt;Callbacks&lt;/strong&gt;, and &lt;strong&gt;MCP&lt;/strong&gt; integration within the Agent Config framework.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnux6ms1yu3q2bpp5n80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnux6ms1yu3q2bpp5n80.png" width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Ackgent on Cloud Run gives you access to key metrics, logs, SLOs, and error reporting to better observe your agent in production&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;The Ackgent example utilizes a modular architecture centered around the ADK framework. The core concept is the separation of concerns: the agent behavior (the “what”) is defined in YAML files, while the implementations (the “how” — tools, callbacks, and MCP connections) are written in Python.&lt;/p&gt;

&lt;p&gt;The ADK framework acts as the runtime engine. It parses the Agent Config YAML, initializes the specified LLM, and orchestrates the flow of conversation. When a request is received (e.g., via the Cloud Run endpoint), the runtime identifies the target agent. When the LLM decides to use a tool or delegate to another agent, ADK handles the execution via the implementation references provided in the configuration.&lt;/p&gt;

&lt;p&gt;This separation of concerns — behavior in YAML, execution handled by ADK, and specialized logic in Python — is what enables rapid iteration.&lt;/p&gt;
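
&lt;p&gt;In practice, iterating on an agent is as simple as editing the YAML and reloading. As a rough sketch (assuming the standard ADK command-line interface; the exact just recipes in the Ackgent repository may differ), a local development loop looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install the project dependencies (Ackgent manages them with uv).
uv sync

# Start the ADK development UI and chat with the agents locally.
adk web

# Alternatively, interact with a single agent from the terminal.
adk run agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;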

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkco8itp0yij9iwywdyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkco8itp0yij9iwywdyp.png" width="800" height="1036"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of the Ackgent example with 3 Agents: Root (Dispatcher), Datetime (Tools), and Internet (MCP)&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Under the Hood: Defining Agents with YAML
&lt;/h3&gt;

&lt;p&gt;Let’s look at how this works in practice. The Ackgent repository showcases three distinct agents, demonstrating the core capabilities of the Agent Config approach. Notice how minimal the Python code is, focusing mainly on tool implementation, while the behavior is entirely in YAML.&lt;/p&gt;
&lt;h4&gt;
  
  
  1. The Datetime Agent (Custom Tools)
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/datetime_agent.yaml" rel="noopener noreferrer"&gt;datetime agent&lt;/a&gt; demonstrates how to extend an agent with external tools. The agent can access the current date and time through the simple functions defined in the repository's &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/tools.py" rel="noopener noreferrer"&gt;tools.py&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: datetime_agent
model: gemini-2.5-flash
description: A helpful assistant for datetime questions.
instruction: Return the current date or time based on the user's request.
generate_content_config:
  temperature: 0.0
tools:
  - name: agent.tools.now
  - name: agent.tools.today

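# Below: agent/tools.py, the tool implementations referenced above.
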
"""Tools for agents."""

# %% IMPORTS

import datetime

# %% TOOLS

def now() -&amp;gt; str:
    """Returns the current time.

    Returns:
        str: The current time in 'HH:MM' format.
    """
    return datetime.datetime.now().strftime("%H:%M")

def today() -&amp;gt; str:
    """Returns the current date.

    Returns:
        str: The current date in 'YYYY-MM-DD' format.
    """
    return str(datetime.date.today())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. The Internet Agent (Search Tools)
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/internet_agent.yaml" rel="noopener noreferrer"&gt;Internet agent&lt;/a&gt; is configured to access an external MCP server. In this case, we are using &lt;a href="https://github.com/microsoft/markitdown?tab=readme-ov-file" rel="noopener noreferrer"&gt;markitdown-mcp&lt;/a&gt;, a server developed by Microsoft that converts almost any source, including external links, into Markdown. The MCP server is started over STDIO with a timeout of 10 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: internet_agent
model: gemini-2.5-flash
description: A helpful assistant for answering questions from the Internet.
instruction: Return the answer to questions using the user provided link.
generate_content_config:
  temperature: 0.0
tools:
- name: MCPToolset
  args:
    stdio_connection_params:
      server_params:
        command: "markitdown-mcp"
      timeout: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
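
&lt;p&gt;Note that the markitdown-mcp command must be available in the runtime environment for this configuration to work; it can typically be installed with pip or declared as a project dependency.&lt;/p&gt;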



&lt;h4&gt;
  
  
  3. The Root Agent (Coordination and Routing)
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/fmind/ackgent/blob/main/agent/root_agent.yaml" rel="noopener noreferrer"&gt;root agent&lt;/a&gt; acts as the main entry point. It doesn't perform tasks itself; instead, its primary function is orchestration. It analyzes the user's intent and intelligently delegates the task to the most appropriate specialized agent using the sub_agents configuration. This pattern enables a scalable and modular multi-agent system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: root_agent
model: gemini-2.5-flash
description: A helpful assistant for user questions.
instruction: |
  You are a helpful assistant that can answer questions about anything.
  Use the following sub-agents to answer questions: `datetime_agent` and `internet_agent`.
generate_content_config:
  temperature: 0.0
after_model_callbacks:
  - name: agent.callbacks.after_model_callback
sub_agents:
  - config_path: datetime_agent.yaml
  - config_path: internet_agent.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
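
&lt;p&gt;The after_model_callbacks entry points to a plain Python function in the repository's agent/callbacks.py module. As a minimal sketch (assuming the standard ADK callback signature, not the repository's exact implementation), such a hook can log or post-process every model response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Callbacks for agents (illustrative sketch, not the repository code)."""

import logging
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmResponse

def after_model_callback(
    callback_context: CallbackContext,
    llm_response: LlmResponse,
) -&amp;gt; Optional[LlmResponse]:
    """Log each model response; returning None keeps the response unchanged."""
    logging.info("Agent %s produced a model response.", callback_context.agent_name)
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;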



&lt;h3&gt;
  
  
  Current Limitations
&lt;/h3&gt;

&lt;p&gt;While Agent Config is a great helper for quickly building agents, it’s important to be aware of the current constraints within the ADK framework as it evolves.&lt;/p&gt;

&lt;p&gt;One notable limitation today involves mixing different types of capabilities within a single agent definition. Currently, you cannot configure an agent that simultaneously uses built-in search tools (google_search or VertexAiSearchTool) alongside non-search tools (like the custom Python functions) or sub-agents (like the root agent uses).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools: # multiple tools are supported only when they are all search tools!
  - name: google_search
  - name: VertexAiSearchTool
   args:
     data_store_id: "projects/ackgent/locations/us/collections/default_collection/dataStores/reports_123..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ADK team is actively working on enhancing this flexibility. For now, the recommended architecture, as demonstrated in the Ackgent repository, is either to separate concerns into specialized agents or to wrap search capabilities in custom tools (such as the markitdown-mcp server).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future: Democratizing Agent Creation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ADK Agent Config is more than just a feature; it’s a foundational shift in agent development from imperative to declarative.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Based on Agent Config, &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;Ackgent&lt;/a&gt; offers an immediate boost in productivity. It streamlines the development lifecycle, reduces boilerplate, and makes testing and deployment significantly faster. This template repository provides a concrete starting point to leverage these benefits today.&lt;/p&gt;

&lt;p&gt;But the long-term vision is even more exciting. Because the agent’s behavior is defined declaratively in a structured, human-readable format (YAML), it opens the door for non-technical users — what we might call “digital users” — to build their own agents. Imagine a future where a UI allows business analysts or domain experts to visually construct complex agents by defining instructions and plugging in tools — all powered by Agent Config under the hood.&lt;/p&gt;

&lt;p&gt;We are moving towards a future where the ability to create AI agents is truly democratized. I encourage you to explore the repository, try out the examples, and experience the speed and simplicity of declarative agent development.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Link to GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/fmind/ackgent" rel="noopener noreferrer"&gt;https://github.com/fmind/ackgent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Link to Web Demo:&lt;/strong&gt; &lt;a href="https://ackgent.fmind.dev/" rel="noopener noreferrer"&gt;https://ackgent.fmind.dev/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oqcikt0hqf2w7xcqyrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oqcikt0hqf2w7xcqyrh.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Gemini App&lt;/em&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>generativeaitools</category>
      <category>googleadk</category>
      <category>agents</category>
    </item>
    <item>
      <title>Combo-Banana: Building Custom Image Workflows in Record Time</title>
      <dc:creator>Médéric Hurier (Fmind)</dc:creator>
      <pubDate>Mon, 08 Sep 2025 19:03:39 +0000</pubDate>
      <link>https://forem.com/fmind/combo-banana-building-custom-image-workflows-in-record-time-4cl0</link>
      <guid>https://forem.com/fmind/combo-banana-building-custom-image-workflows-in-record-time-4cl0</guid>
      <description>&lt;p&gt;In the fast-paced world of product retail, agility is crucial for the teams bringing products to market. Product designers at my customer handle a massive volume of images daily. Ensuring every product looks perfect across the website, mobile apps, and marketing campaigns often involves tedious, multi-step editing processes — background removal, resizing, color correction, and optimization.&lt;/p&gt;

&lt;p&gt;While essential, these repetitive tasks can consume hours, diverting designers from the creative work they do best. What if designers could automate these specific workflows themselves, without wrestling with complex software or waiting for engineering resources?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbyi3nmo8upy8lt1zdvs0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbyi3nmo8upy8lt1zdvs0.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Nano Banana&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This challenge inspired a recent project: &lt;a href="https://github.com/fmind/combo-banana" rel="noopener noreferrer"&gt;&lt;strong&gt;Combo-Banana&lt;/strong&gt;&lt;/a&gt;, a simple open-source prototype based on &lt;a href="https://blog.google/products/gemini/updated-image-editing-model/" rel="noopener noreferrer"&gt;Google's Nano Banana&lt;/a&gt; designed to demonstrate just how quickly we can build applications that deliver immediate value to our teammates on the field. This project is about empowering designers to create their own multi-step image editing pipelines.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Use Case: Beyond Manual Editing
&lt;/h3&gt;

&lt;p&gt;Imagine a designer preparing images for a new product line. The workflow is predictable but labor-intensive:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive raw photos from the studio.&lt;/li&gt;
&lt;li&gt;Manually isolate the product from the background.&lt;/li&gt;
&lt;li&gt;Adjust the lighting and contrast to meet brand guidelines.&lt;/li&gt;
&lt;li&gt;Resize and crop for the product detail page (high resolution).&lt;/li&gt;
&lt;li&gt;Integrate the products in several situations (e.g., on a user, in a store).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When done manually across hundreds of SKUs, this process is slow and prone to inconsistencies.&lt;/p&gt;

&lt;p&gt;This prototype reimagines that process. Instead of a series of manual actions across different tools, the designer defines a “combo” — a sequence of operations executed automatically by the application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "name": "Social Media Ad Creation",
    "steps": [
        {
            "title": "Place Item in Landscape",
            "prompt": "Integrate the product or item seamlessly into a visually stunning and appropriate landscape background, ensuring realistic lighting and perspective."
        },
        {
            "title": "Add Catchy Slogan",
            "prompt": "Overlay a concise and catchy slogan onto the image, using a font and placement that enhances readability and visual appeal for a social media ad."
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Experience: Flexibility Meets Simplicity
&lt;/h3&gt;

&lt;p&gt;The prototype focuses on a streamlined experience. A user can upload an image and stack the desired operations. They define the recipe once — e.g., Step 1: Isolate Product; Step 2: Improve the Shadows; Step 3: Add a Slogan — and the application handles the rest.&lt;/p&gt;

&lt;p&gt;This transforms a 15-minute manual task into a 30-second automated process, ensuring pixel-perfect consistency across the entire product catalog and freeing up time for more creative work.&lt;/p&gt;
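
&lt;p&gt;Under the hood, this kind of pipeline can be expressed in a few lines of Python. The sketch below is illustrative only (it assumes the google-genai SDK and the Nano Banana image model; the helper names are not the repository's actual code) and matches the combo JSON structure shown earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"""Illustrative sketch of a sequential image combo (not the repository code)."""

from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

def apply_step(image: Image.Image, prompt: str) -&amp;gt; Image.Image:
    """Apply a single editing step and return the resulting image."""
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # "Nano Banana"; the name may evolve
        contents=[prompt, image],
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return Image.open(BytesIO(part.inline_data.data))
    raise RuntimeError("The model did not return an image.")

def run_combo(image: Image.Image, combo: dict) -&amp;gt; Image.Image:
    """Run every step of a combo; each output feeds the next step."""
    for step in combo["steps"]:
        image = apply_step(image, step["prompt"])
    return image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;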

&lt;h4&gt;
  
  
  See it in Action
&lt;/h4&gt;

&lt;p&gt;The prototype illustrates how an intuitive interface can abstract away the complexity running in the background.&lt;/p&gt;

&lt;p&gt;You can explore the live demo here: &lt;a href="https://combo-banana.fmind.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;https://combo-banana.fmind.dev/&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8gy3ktps683nhuk49py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8gy3ktps683nhuk49py.png" width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Combo-Banana: Workflow Definition Tab&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On the left, the user defines the workflow with a chatbot interface based on &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;. The chatbot extracts prompts into a series of steps that are stacked sequentially. In this example, we start with a “Place the item in a landscape” step, followed by an “Add Catchy Slogan” step, powered by &lt;a href="https://ai.google.dev/gemini-api/docs/image-generation" rel="noopener noreferrer"&gt;Nano Banana&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjf2sjpuv67gbtei9y3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjf2sjpuv67gbtei9y3u.png" width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Combo-Banana: Workflow Execution Tab&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once the desired “combo” is configured, the user simply uploads the source image on the top left side of the second tab. The application processes the image through the defined pipeline — the output of the first step becomes the input for the next. The final result is displayed on the right, ready for download. This visual feedback loop allows designers to quickly iterate on their workflows before applying them to large batches of images.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj45lhnnc8aamsuvzruud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj45lhnnc8aamsuvzruud.png" width="800" height="801"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Final Result of the User Combo&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Under the Hood: The Tech Stack
&lt;/h3&gt;

&lt;p&gt;The speed of development was made possible by a modern, efficient tech stack. We focused on rapid prototyping, leveraging powerful AI, and ensuring scalability:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mceeg1cp6px30xu8jbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mceeg1cp6px30xu8jbb.png" width="800" height="808"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture of Combo-Banana&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Interface:&lt;/strong&gt; &lt;a href="https://gradio.app/" rel="noopener noreferrer"&gt;&lt;strong&gt;Gradio&lt;/strong&gt;&lt;/a&gt;. Used to build the interactive web UI entirely in Python, avoiding the need for complex front-end development and significantly speeding up iteration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Backend:&lt;/strong&gt; &lt;a href="https://www.python.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/a&gt;. The backbone of the application, handling core logic and orchestrating the sequence of image processing steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Engine:&lt;/strong&gt; &lt;a href="https://ai.google.dev/gemini-api/docs/image-generation" rel="noopener noreferrer"&gt;&lt;strong&gt;Nano Banana&lt;/strong&gt;&lt;/a&gt;. The AI powerhouse driving complex tasks like high-fidelity background removal and segmentation. This project was a fantastic opportunity to leverage its impressive capabilities. In future releases, other models could be combined with Nano Banana.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; &lt;a href="https://cloud.google.com/run" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Run&lt;/strong&gt;&lt;/a&gt;. A serverless platform ensuring the tool is accessible, cost-effective (scales to zero), and scalable on demand within an organization’s infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Road Ahead: From Prototype to Platform
&lt;/h3&gt;

&lt;p&gt;This prototype is just the beginning. The goal is to evolve it into a robust platform that can handle the complexity of real-world production environments. Key opportunities for evolution include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Workflows (DAGs):&lt;/strong&gt; Moving beyond simple sequential pipelines (Step A -&amp;gt; Step B -&amp;gt; Step C) to support Directed Acyclic Graphs (DAGs). This would allow for parallel processing — for example, generating five different resolutions simultaneously after the background has been removed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Granular Configuration:&lt;/strong&gt; Providing deeper configuration options within each processing block (e.g., setting specific compression levels, defining padding for auto-crops, or choosing different AI models for specific tasks and selecting which previous image to use as input).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem Integration:&lt;/strong&gt; Integrating directly with existing asset management tools. This includes pulling source files from &lt;strong&gt;Google Drive&lt;/strong&gt; and automatically exporting the results to designated folders or downstream systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Sessions and Workflow Management:&lt;/strong&gt; Implementing user authentication to allow teammates to save, name, share, and reuse their custom workflows, eliminating the need to rebuild them for every session.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Bigger Picture: Bridging the Gap
&lt;/h3&gt;

&lt;p&gt;Building this prototype underscored a critical insight. We are living in a time with access to incredibly powerful technology like Nano Banana. The technology is here, and it works.&lt;/p&gt;

&lt;p&gt;However, the existence of a powerful model is not enough. The key challenge now is to &lt;strong&gt;bridge the gap&lt;/strong&gt; between these technological capabilities and the real-world, day-to-day needs of our colleagues on the field.&lt;/p&gt;

&lt;p&gt;As this project demonstrates, we don’t need massive engineering teams or long development cycles to deliver significant value. By identifying specific pain points and leveraging modern tools like Gradio and Cloud Run, we can rapidly prototype solutions that make a difference.&lt;/p&gt;

&lt;p&gt;This is a phenomenal opportunity for builders and entrepreneurs within any organization. &lt;strong&gt;The tools are ready. It’s time to build!&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/fmind/combo-banana" rel="noopener noreferrer"&gt;https://github.com/fmind/combo-banana&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcn0iogwkbjhgqa1132l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcn0iogwkbjhgqa1132l.png" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Combo-Banana&lt;/em&gt;&lt;/p&gt;

</description>
      <category>artificialintelligen</category>
      <category>opensource</category>
      <category>python</category>
      <category>generativeaitools</category>
    </item>
  </channel>
</rss>
