<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Saurabh Mishra</title>
    <description>The latest articles on Forem by Saurabh Mishra (@saurabhmi).</description>
    <link>https://forem.com/saurabhmi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1150046%2F7b2a3640-5eec-4bd7-a008-61950bda1fdc.png</url>
      <title>Forem: Saurabh Mishra</title>
      <link>https://forem.com/saurabhmi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/saurabhmi"/>
    <language>en</language>
    <item>
      <title>Running Agentic AI at Scale on Google Kubernetes Engine</title>
      <dc:creator>Saurabh Mishra</dc:creator>
      <pubDate>Wed, 08 Apr 2026 04:15:15 +0000</pubDate>
      <link>https://forem.com/gde/running-agentic-ai-at-scale-on-google-kubernetes-engine-2540</link>
      <guid>https://forem.com/gde/running-agentic-ai-at-scale-on-google-kubernetes-engine-2540</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6a4nj4y6y0hrn65ck1j2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6a4nj4y6y0hrn65ck1j2.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AI industry has crossed an inflection point. We stopped asking "can the model answer my question?" and started asking "can the system complete my goal?" That shift from inference to agency changes everything about how we build, deploy, and scale AI in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Kubernetes Engine (GKE)&lt;/strong&gt; has quietly become the platform of choice for teams running production AI workloads. Its elastic compute, GPU node pools, and rich ecosystem of observability tools make it uniquely suited not just for model serving but for the orchestration challenges that agentic AI introduces.&lt;/p&gt;

&lt;p&gt;This blog walks through the full landscape: what kinds of AI systems exist today, how agentic architectures differ, and what it actually looks like to run them reliably on GKE.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI Taxonomy: From Reactive to Autonomous&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before diving into infrastructure, it's worth establishing what we mean by the different modes of AI deployment. Not all AI is "agentic," and the architecture you choose should match the behavior you need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reactive / Inference&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stateless prompt-response. One request, one LLM call, one answer. The model has no memory between turns. Examples: text classifiers, summarizers, one-shot code generators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversational AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-turn dialog with session state. The model remembers context within a conversation window. Examples: customer support bots, document Q&amp;amp;A, coding assistants.&lt;/p&gt;
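&lt;p&gt;Session state is mostly a matter of deciding what stays in the context window. A minimal sketch of a trimming policy, where whitespace-separated word counts stand in for a real tokenizer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import deque

class SessionMemory:
    """Keeps only the most recent turns that fit a token budget (illustrative)."""

    def __init__(self, max_tokens=200):
        self.max_tokens = max_tokens
        self.turns = deque()

    def add(self, role, text):
        self.turns.append((role, text))
        self._trim()

    def _estimate(self):
        # Naive estimate: whitespace-separated words stand in for real tokens.
        return sum(len(text.split()) for _, text in self.turns)

    def _trim(self):
        # Drop the oldest turns while the estimated total exceeds the budget,
        # always keeping at least the newest turn.
        while max(0, self._estimate() - self.max_tokens) and len(self.turns) != 1:
            self.turns.popleft()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Production assistants use real tokenizers and often summarize evicted turns instead of dropping them, but the window-management problem is the same.&lt;/p&gt;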

&lt;p&gt;&lt;strong&gt;Retrieval-Augmented (RAG)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model can query external knowledge at runtime before generating a response. This introduces a retrieval step: vector DBs, semantic search, and tool calls to databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model plans, takes actions, observes results, and loops until a goal is reached. It can call tools, spawn subagents, and make decisions across many steps autonomously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Agent Systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A network of specialized agents collaborating: an orchestrator decomposes a task and delegates to researcher, writer, and executor agents that work in parallel or in sequence.&lt;/p&gt;

&lt;p&gt;Each mode up the stack introduces new infrastructure requirements: more state to manage, longer-lived processes, more concurrent workloads, harder failure modes, and deeper observability needs.&lt;/p&gt;
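&lt;p&gt;The plan-act-observe loop at the core of an agentic system can be sketched in a few lines; the planner and tools below are stand-ins for real LLM and tool calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def run_agent(goal, plan_step, tools, max_steps=10):
    """Minimal agent loop: plan, act, observe, repeat until done (illustrative)."""
    observations = []
    for step in range(max_steps):
        # 'plan_step' stands in for an LLM call that picks the next action.
        action = plan_step(goal, observations)
        if action["tool"] == "finish":
            return action["answer"]
        # Execute the chosen tool and feed the result back as an observation.
        result = tools[action["tool"]](action["input"])
        observations.append({"step": step, "tool": action["tool"], "result": result})
    return None  # goal not reached within the step budget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Real frameworks add retries, token budgets, and tracing around this loop, but the control flow is the same.&lt;/p&gt;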

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbzyxhkg7jgrmemr43df.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbzyxhkg7jgrmemr43df.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why GKE for AI Workloads?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes is table stakes for any modern distributed system. But GKE specifically brings several features that make it exceptional for AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GKE Capabilities for AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPU and TPU Node Pools&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To handle the heavy lifting of Agentic AI, GKE offers specialized Accelerator Node Pools. This infrastructure allows you to dynamically attach high-end compute resources, such as NVIDIA A100, H100, or L4 GPUs and Google TPUs, exactly when your agents need them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workload Identity &amp;amp; Secret Management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agentic systems touch many external APIs (databases, external services, third-party tools). Workload Identity Federation lets pods authenticate to Google Cloud services without storing long-lived credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Horizontal Pod Autoscaling with Custom Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scale agent runner replicas based on queue depth (Pub/Sub backlog, Redis list length) rather than CPU. This allows demand-driven scaling that matches agent workload patterns precisely.&lt;/p&gt;
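&lt;p&gt;The scaling decision itself is simple arithmetic. A sketch of the replica calculation a queue-driven autoscaler effectively performs (the throughput figure and bounds are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def desired_replicas(queue_depth, per_replica_throughput,
                     min_replicas=1, max_replicas=50):
    """Target replica count from queue backlog, clamped to a range (illustrative)."""
    if queue_depth == 0:
        return min_replicas
    target = math.ceil(queue_depth / per_replica_throughput)
    return max(min_replicas, min(max_replicas, target))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An external-metrics HPA expresses the same intent declaratively by targeting an average backlog per replica.&lt;/p&gt;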

&lt;p&gt;&lt;strong&gt;GKE Autopilot &amp;amp; Standard Modes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Autopilot mode handles node management entirely, ideal for teams wanting to focus on agent logic. Standard mode gives full control when you need custom kernel modules or specialized hardware affinity rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud Run on GKE for Burst Workloads&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Short-lived tool execution steps in an agent pipeline can be offloaded to Cloud Run, which scales to zero between invocations, avoiding the overhead of always-on Kubernetes pods for infrequent tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anatomy of an Agentic AI System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An agentic AI system isn't a single process; it's a distributed workflow. Understanding its components is essential before mapping it onto Kubernetes primitives.&lt;br&gt;
"An agent is an LLM that can observe the world, decide what to do next, and take actions - in a loop, until a goal is satisfied."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rds8i4jaksxqlfw6qe0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rds8i4jaksxqlfw6qe0.png" alt=" " width="800" height="682"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Popular Agentic Frameworks on GKE&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Several frameworks have emerged to help teams build agentic systems without reinventing the orchestration wheel. Each has a different philosophy and maps to GKE differently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14blngtf6gnzngjm6m6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14blngtf6gnzngjm6m6c.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Development Kit (ADK)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google's native framework for building multi-agent systems on Vertex AI. First-class GKE support, tight Gemini integration, built-in evaluation tools. Best choice for teams already on Google Cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Graph-based agent orchestration with explicit state machines. Excellent for complex branching workflows. Containerizes cleanly. LangSmith provides tracing that integrates with GKE logging pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Defines agents as role-playing entities (Researcher, Writer, Editor) with goals and backstories. Simple to model complex human workflows. Ideal for content, analysis, and research pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google ADK on GKE: A Native Fit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Google Agent Development Kit (ADK) is architected to treat Kubernetes as its primary "home," creating a seamless integration where the framework and the platform operate as one. Because ADK is built with a Kubernetes-native philosophy, it transforms GKE from a simple hosting environment into a specialized runtime for autonomous systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgewr3889vkrog78ogj1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgewr3889vkrog78ogj1w.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability: The Hard Part&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agentic systems fail in non-obvious ways. An agent might produce a response - but the response could be hallucinated, based on a failed tool call, or the result of an unintended plan branch. Standard HTTP error monitoring doesn't catch this.&lt;/p&gt;

&lt;p&gt;The recommended observability stack for GKE-based agentic systems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability Stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenTelemetry Instrumentation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instrument each agent with OpenTelemetry. Emit spans for every LLM call, tool invocation, and planning step. Export to Google Cloud Trace for full distributed trace visualization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured Logging to Cloud Logging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Log each reasoning step as a structured JSON event: task ID, agent ID, step number, prompt hash, tool name, tool result summary, token counts. Query across traces in BigQuery for post-hoc analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Metrics via Cloud Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Track agent-specific metrics: tasks completed per minute, average steps per task, tool call success rate, LLM latency P50/P95/P99, and hallucination rate from your eval pipeline.&lt;/p&gt;
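&lt;p&gt;One structured reasoning-step event of the kind described above might be assembled like this; the field names are an illustrative schema, not a required one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import hashlib
import json

def reasoning_step_event(task_id, agent_id, step, prompt,
                         tool_name, tool_summary, tokens_in, tokens_out):
    """Build a structured JSON log line for one agent reasoning step (illustrative schema)."""
    event = {
        "task_id": task_id,
        "agent_id": agent_id,
        "step": step,
        # Hash rather than log the raw prompt: dedupes repeats, avoids leaking content.
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest()[:16],
        "tool": tool_name,
        "tool_result_summary": tool_summary,
        "tokens": {"input": tokens_in, "output": tokens_out},
    }
    return json.dumps(event)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Emitting these as one JSON object per line lets Cloud Logging parse the fields automatically, and the same schema queries cleanly once exported to BigQuery.&lt;/p&gt;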

&lt;p&gt;&lt;strong&gt;LLM-specific Tracing (LangSmith / Vertex AI Eval)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Leverage LangSmith or Vertex AI's built-in evaluation capabilities to capture complete prompt–response interactions along with semantic quality metrics. These insights can then be fed back into your continuous improvement cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Considerations for Agentic AI on GKE&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents with tool use are a new attack surface. An agent that can execute code, send emails, or write to a database is a powerful actor - and must be treated like one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpg3bsmcrua52mq5stzgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpg3bsmcrua52mq5stzgr.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Injection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Malicious content in retrieved documents can instruct the agent to deviate from its goal. Sanitize all retrieved content before insertion into prompts. Use system-level guardrails in your LLM configuration.&lt;/p&gt;
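&lt;p&gt;No static filter is complete, but a first-pass screen for instruction-like patterns in retrieved text is cheap to run before prompt assembly. The patterns below are illustrative, not exhaustive:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re

# Illustrative patterns only: real guardrails layer model-based and policy checks on top.
SUSPICIOUS = [
    r"ignore (all )?previous instructions",
    r"disregard your (system )?prompt",
    r"you are now",
]

def screen_retrieved_text(text):
    """Flag retrieved content that looks like an injected instruction (first pass)."""
    lowered = text.lower()
    hits = [p for p in SUSPICIOUS if re.search(p, lowered)]
    return {"suspicious": bool(hits), "matched": hits}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Flagged documents can be quarantined or wrapped in delimiters with an explicit "treat as data, not instructions" framing before they reach the model.&lt;/p&gt;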

&lt;p&gt;&lt;strong&gt;Privilege Escalation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each agent should operate with the minimum IAM permissions needed for its specific tools. Use Workload Identity with role-specific service accounts, never a single all-powerful SA for all agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human-in-the-Loop Gates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For irreversible actions (sending emails, deploying code, database writes), require a human approval step before execution. Implement approval workflows via Pub/Sub pause + Cloud Tasks callback.&lt;/p&gt;
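&lt;p&gt;The pause-for-approval pattern reduces to holding an action until a reviewer signs off. A minimal in-memory sketch; a production system would persist pending actions in Pub/Sub or a database:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import uuid

class ApprovalGate:
    """Holds irreversible actions until a human approves them (in-memory sketch)."""

    def __init__(self):
        self.pending = {}

    def request(self, action, payload):
        ticket = str(uuid.uuid4())
        self.pending[ticket] = {"action": action, "payload": payload, "approved": False}
        return ticket  # surface this ticket to a human reviewer

    def approve(self, ticket):
        self.pending[ticket]["approved"] = True

    def execute(self, ticket, handler):
        entry = self.pending[ticket]
        if not entry["approved"]:
            raise PermissionError("action requires human approval before execution")
        return handler(entry["action"], entry["payload"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The key property is that execution is structurally impossible without the approval bit flipping, rather than relying on the agent to ask nicely.&lt;/p&gt;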

&lt;p&gt;&lt;strong&gt;Network Policies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use GKE Network Policies to restrict which agent pods can talk to which services. A researcher agent has no reason to reach the database writer service directly - enforce this in the cluster, not just in code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next: The Agentic Platform&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The direction of travel is clear. GKE is evolving from an application runtime into an agentic platform - a place where autonomous AI systems can be deployed, composed, monitored, and governed with the same rigor we apply to microservices today.&lt;/p&gt;

&lt;p&gt;Several emerging capabilities are worth tracking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-to-Agent Communication (A2A Protocol)&lt;/strong&gt; - Google's emerging standard for cross-agent RPC, allowing agents built with different frameworks to interoperate. GKE provides the network fabric for this via internal load balancers and service mesh.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP) on Kubernetes&lt;/strong&gt; - MCP is becoming the standard way for agents to discover and call tools. Running MCP servers as sidecar containers or standalone Deployments in GKE makes tool registries cluster-native.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt; - Google's fully managed orchestration layer for agents that sits above GKE, handling session management, tool routing, and evaluation out of the box. The boundary between GKE and managed agent infrastructure will continue to blur.&lt;/p&gt;

&lt;p&gt;"Kubernetes wasn't built for AI. But it turns out the problems of distributed systems - scale, failure, state, observability - are exactly the problems agentic AI inherits."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Reference Documentation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/integrations/ai-infra" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/kubernetes-engine/docs/integrations/ai-infra&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/GoogleCloudPlatform/accelerated-platforms/blob/main/docs/platforms/gke/base/use-cases/inference-ref-arch/README.md" rel="noopener noreferrer"&gt;https://github.com/GoogleCloudPlatform/accelerated-platforms/blob/main/docs/platforms/gke/base/use-cases/inference-ref-arch/README.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.cloud.google.com/agent-builder/agent-development-kit/overview" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/agent-builder/agent-development-kit/overview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hands-on Tutorials&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codelabs.developers.google.com/devsite/codelabs/build-agents-with-adk-foundation" rel="noopener noreferrer"&gt;https://codelabs.developers.google.com/devsite/codelabs/build-agents-with-adk-foundation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/build-a-multi-agent-system-for-expert-content-with-google-adk-mcp-and-cloud-run-part-1" rel="noopener noreferrer"&gt;https://cloud.google.com/blog/topics/developers-practitioners/build-a-multi-agent-system-for-expert-content-with-google-adk-mcp-and-cloud-run-part-1&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>cloud</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Hooking up CrewAI with Google Gemini for Multi-Agent Automation Systems</title>
      <dc:creator>Saurabh Mishra</dc:creator>
      <pubDate>Mon, 16 Feb 2026 15:55:24 +0000</pubDate>
      <link>https://forem.com/gde/hooking-up-crewai-with-google-gemini-for-multi-agent-automation-systems-4eh3</link>
      <guid>https://forem.com/gde/hooking-up-crewai-with-google-gemini-for-multi-agent-automation-systems-4eh3</guid>
      <description>&lt;p&gt;Google’s AI ecosystem is vast and powerful, featuring &lt;strong&gt;Google Gemini models&lt;/strong&gt; (accessible via API) and &lt;strong&gt;Google AI Studio&lt;/strong&gt; (a brilliant web IDE for experimenting with and deploying generative AI apps). But what happens when you combine that raw reasoning capability with an autonomous orchestration framework?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CrewAI is an open-source Python framework that lets you build and orchestrate multiple AI agents that collaborate to accomplish complex tasks, like a virtual team of specialists. It organizes agents, assigns them roles, and lets them delegate and share tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Gemini + CrewAI?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CrewAI allows you to define agents with highly specific roles, goals and backstories. Under the hood, it uses LiteLLM (or LangChain wrappers) to route calls to the language model of your choice.&lt;/p&gt;

&lt;p&gt;By hooking CrewAI into Google’s Gemini models (such as gemini-2.5-flash), we get:&lt;/p&gt;

&lt;p&gt;Lightning-fast reasoning required for agentic loops.&lt;br&gt;
Massive context windows for analyzing huge codebases, logs, or documentation.&lt;br&gt;
Natively integrated Google Search grounding, perfect for agents that need to research complex code, real-time data, or modern architecture patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Setup and Authentication&lt;/strong&gt;&lt;br&gt;
To get started, we need to configure CrewAI to use Gemini models.&lt;/p&gt;

&lt;p&gt;Get Gemini API Key:&lt;/p&gt;

&lt;p&gt;Go to Google AI Studio or the Google Cloud console.&lt;br&gt;
Create an API key for Gemini.&lt;br&gt;
Save this API key; we’ll need it to authenticate the LLM in CrewAI.&lt;br&gt;
Install Dependencies: install the required packages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install crewai
python3.11 -m pip install langchain-google-genai

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NOTE: langchain-google-genai requires Python 3.9+&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: The Scenario &amp;amp; Initializing the Brain&lt;/strong&gt;&lt;br&gt;
Let’s build a highly relevant, real-world scenario: An Automated Cloud Infrastructure Design Team. We will create a two-agent crew:&lt;/p&gt;

&lt;p&gt;A Principal Cloud Architect to design the system.&lt;br&gt;
A Lead DevSecOps Engineer to tear it apart and review it for vulnerabilities.&lt;br&gt;
First, let’s set up our script and initialize the Gemini “brain” using LangChain’s wrapper.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from crewai import Agent, Task, Crew, Process
from langchain_google_genai import ChatGoogleGenerativeAI

# ==========================================
# 1. Configuration &amp;amp; Setup
# ==========================================
# Replace 'YOUR_API_KEY' with your actual Gemini API key, 
# or set it in your environment variables before running the script.
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY", "YOUR_API_KEY")

# Initialize the Gemini model
# Using gemini-2.5-flash for complex reasoning and architecture design
gemini_llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.4 # Slightly creative, but grounded in technical reality
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Defining the Agents&lt;/strong&gt;&lt;br&gt;
Agents need a clear identity to function properly. In CrewAI, we define their role, goal, and backstory to give the LLM strict boundaries and deep, specialized context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ==========================================
# 2. Define the Agents
# ==========================================
cloud_architect = Agent(
    role='Principal Cloud Architect',
    goal='Design highly scalable, resilient, and cost-effective cloud infrastructures based on user requirements.',
    backstory=(
        "You are a seasoned cloud architect with 15+ years of experience across AWS, GCP, and Azure. "
        "You excel at designing modern microservices, serverless architectures, and event-driven systems. "
        "Your primary focus is ensuring the system can handle massive scale while keeping latency low."
    ),
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm
)

devsecops_engineer = Agent(
    role='Lead DevSecOps Engineer',
    goal='Rigorously review cloud architectures to identify vulnerabilities, ensure compliance, and enforce zero-trust security.',
    backstory=(
        "You are a paranoid but brilliant cybersecurity veteran. You specialize in cloud security posture management, "
        "IAM least-privilege policies, network isolation, and data encryption. You view every architecture through "
        "the lens of a potential attacker and fix flaws before deployment."
    ),
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Defining the Tasks&lt;/strong&gt;&lt;br&gt;
Agents are useless without clear instructions. Tasks in CrewAI define what needs to be done, the expected output, and who is responsible for executing it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ==========================================
# 3. Define the Tasks
# ==========================================
project_scenario = (
    "A global e-commerce platform transitioning from a monolith to microservices. "
    "It requires secure user authentication, a high-throughput inventory management system, "
    "and seamless integration with third-party payment gateways. It anticipates massive traffic spikes during holiday sales."
)

design_task = Task(
    description=(
        f"Analyze the following project scenario: '{project_scenario}'.\n"
        "Create a comprehensive cloud architecture design. You must specify the cloud provider (or multi-cloud), "
        "compute resources, databases, caching layers, message queues, and content delivery networks. "
        "Justify why you chose these specific services."
    ),
    expected_output="A detailed Architectural Design Document outlining services, data flow, and scaling strategies.",
    agent=cloud_architect
)

security_review_task = Task(
    description=(
        "Critically review the Architectural Design Document produced by the Principal Cloud Architect. "
        "Identify at least 3 potential security vulnerabilities or single points of failure. "
        "Provide concrete, actionable remediations for each vulnerability (e.g., adding WAF, adjusting VPC peering, enforcing KMS encryption)."
    ),
    expected_output="A Security Audit Report listing vulnerabilities found, risk severity, and mandatory architecture modifications.",
    agent=devsecops_engineer
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 5: Form the Crew and Execute!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ==========================================
# 4. Form the Crew and Execute
# ==========================================
cloud_engineering_crew = Crew(
    agents=[cloud_architect, devsecops_engineer],
    tasks=[design_task, security_review_task],
    process=Process.sequential, # The DevSecOps engineer waits for the Architect
    verbose=True
)

if __name__ == "__main__":
    print("Booting up the Automated Cloud Infrastructure Design Team...")
    print("Initiating CrewAI sequence. Please wait while the agents collaborate...\n")

    # Kickoff the process
    result = cloud_engineering_crew.kickoff()

    print("\n" + "="*50)
    print("FINAL DEVSECOPS REVIEW &amp;amp; SECURED ARCHITECTURE")
    print("="*50 + "\n")
    print(result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complete code&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from crewai import Agent, Task, Crew, Process
from langchain_google_genai import ChatGoogleGenerativeAI

# ==========================================
# 1. Configuration &amp;amp; Setup
# ==========================================
# Replace 'YOUR_API_KEY' with your actual Gemini API key, 
# or set it in your environment variables before running the script.
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY", "YOUR_API_KEY")

# Initialize the Gemini model
# Using gemini-2.5-flash for complex reasoning and architecture design
gemini_llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",

    temperature=0.4 # Slightly creative, but grounded in technical reality
)

# ==========================================
# 2. Define the Agents
# ==========================================
cloud_architect = Agent(
    role='Principal Cloud Architect',
    goal='Design highly scalable, resilient, and cost-effective cloud infrastructures based on user requirements.',
    backstory=(
        "You are a seasoned cloud architect with 15+ years of experience across AWS, GCP, and Azure. "
        "You excel at designing modern microservices, serverless architectures, and event-driven systems. "
        "Your primary focus is ensuring the system can handle massive scale while keeping latency low."
    ),
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm
)

devsecops_engineer = Agent(
    role='Lead DevSecOps Engineer',
    goal='Rigorously review cloud architectures to identify vulnerabilities, ensure compliance, and enforce zero-trust security.',
    backstory=(
        "You are a paranoid but brilliant cybersecurity veteran. You specialize in cloud security posture management, "
        "IAM least-privilege policies, network isolation, and data encryption. You view every architecture through "
        "the lens of a potential attacker and fix flaws before deployment."
    ),
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm
)

# ==========================================
# 3. Define the Tasks
# ==========================================
# The scenario we want them to work on
project_scenario = (
    "A global e-commerce platform transitioning from a monolith to microservices. "
    "It requires secure user authentication, a high-throughput inventory management system, "
    "and seamless integration with third-party payment gateways. It anticipates massive traffic spikes during holiday sales."
)

design_task = Task(
    description=(
        f"Analyze the following project scenario: '{project_scenario}'.\n"
        "Create a comprehensive cloud architecture design. You must specify the cloud provider (or multi-cloud), "
        "compute resources, databases, caching layers, message queues, and content delivery networks. "
        "Justify why you chose these specific services."
    ),
    expected_output="A detailed Architectural Design Document outlining services, data flow, and scaling strategies.",
    agent=cloud_architect
)

security_review_task = Task(
    description=(
        "Critically review the Architectural Design Document produced by the Principal Cloud Architect. "
        "Identify at least 3 potential security vulnerabilities or single points of failure. "
        "Provide concrete, actionable remediations for each vulnerability (e.g., adding WAF, adjusting VPC peering, enforcing KMS encryption)."
    ),
    expected_output="A Security Audit Report listing vulnerabilities found, risk severity, and mandatory architecture modifications.",
    agent=devsecops_engineer
)

# ==========================================
# 4. Form the Crew and Execute
# ==========================================
cloud_engineering_crew = Crew(
    agents=[cloud_architect, devsecops_engineer],
    tasks=[design_task, security_review_task],
    process=Process.sequential, # The DevSecOps engineer waits for the Architect to finish
    verbose=True
)

if __name__ == "__main__":
    print("🚀 Booting up the Automated Cloud Infrastructure Design Team...")
    print("Initiating CrewAI sequence. Please wait while the agents collaborate...\n")

    # Kickoff the process
    result = cloud_engineering_crew.kickoff()

    print("\n" + "="*50)
    print("FINAL DEVSECOPS REVIEW &amp;amp; SECURED ARCHITECTURE")
    print("="*50 + "\n")
    print(result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;br&gt;
Run this script in a terminal and watch Gemini stream its thought process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe74xdclrloo4g9qk4f4o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe74xdclrloo4g9qk4f4o.png" alt=" " width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbamv0hfmcav4e7kisoev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbamv0hfmcav4e7kisoev.png" alt=" " width="800" height="106"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2hyxs7cyzvt3xxt2339.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2hyxs7cyzvt3xxt2339.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2soh4tu12cnxo44olvz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2soh4tu12cnxo44olvz.png" alt=" " width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5z1xhb4f1g3z16yh49k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5z1xhb4f1g3z16yh49k.png" alt=" " width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integrate Other Google Tools (Optional)&lt;/strong&gt;&lt;br&gt;
Want to take this to the enterprise level? CrewAI supports robust integrations with Google’s Workspace apps via its enterprise platform/tools ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Drive&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can allow agents to upload/download files to Drive — useful for storing outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Docs&lt;/strong&gt;&lt;br&gt;
Create, read, and edit Google Docs documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Sheets&lt;/strong&gt;&lt;br&gt;
Create, read, and update Google Sheets spreadsheets and manage worksheet data.&lt;/p&gt;

&lt;p&gt;To enable these, you connect your Google account via OAuth in CrewAI’s integrations dashboard, then grant permissions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvv8p80lug2b95i7iathf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvv8p80lug2b95i7iathf.png" alt=" " width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation References&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.crewai.com/en/introduction" rel="noopener noreferrer"&gt;https://docs.crewai.com/en/introduction&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/crewai-example" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/crewai-example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.googleblog.com/building-agents-google-gemini-open-source-frameworks/" rel="noopener noreferrer"&gt;https://developers.googleblog.com/building-agents-google-gemini-open-source-frameworks/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>crewai</category>
      <category>googlecloud</category>
      <category>antigravity</category>
    </item>
  </channel>
</rss>
