<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Karthigayan Devan</title>
    <description>The latest articles on Forem by Karthigayan Devan (@karthidec).</description>
    <link>https://forem.com/karthidec</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3519305%2F935f2435-d048-4185-8806-300f7e170082.png</url>
      <title>Forem: Karthigayan Devan</title>
      <link>https://forem.com/karthidec</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/karthidec"/>
    <language>en</language>
    <item>
      <title>Building Aura: A Multimodal Smart Home Operated by Gemini Live 🌌</title>
      <dc:creator>Karthigayan Devan</dc:creator>
      <pubDate>Mon, 16 Mar 2026 01:42:26 +0000</pubDate>
      <link>https://forem.com/gde/building-aura-a-multimodal-smart-home-operated-by-gemini-live-2m31</link>
      <guid>https://forem.com/gde/building-aura-a-multimodal-smart-home-operated-by-gemini-live-2m31</guid>
      <description>&lt;h2&gt;
  
  
  💡 The Problem with Smart Homes
&lt;/h2&gt;

&lt;p&gt;Smart homes today are often fragmented and reactive. You speak into a puck on the wall, and it toggles a light on a screen. There is no continuous awareness. &lt;/p&gt;

&lt;p&gt;For the &lt;strong&gt;Gemini Live Agent Challenge 2026&lt;/strong&gt;, I wanted to build something that feels &lt;strong&gt;alive&lt;/strong&gt;. Inspired by futuristic sci-fi interfaces, I built &lt;strong&gt;Aura&lt;/strong&gt; — a central AI copilot that doesn't just hear you, but &lt;strong&gt;sees your environment in real time&lt;/strong&gt; and reflects that awareness in a living, responsive ambient dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 What is Aura?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aura&lt;/strong&gt; is a fully multimodal smart home operating system built on &lt;strong&gt;bidirectional WebSockets&lt;/strong&gt; that stream continuous, low-latency audio and video. &lt;/p&gt;

&lt;p&gt;Unlike previous generations of voice assistants that rely on turn-taking (Speech-to-Text ➔ LLM ➔ Text-to-Speech), Aura streams &lt;strong&gt;continuous raw audio and webcam frames&lt;/strong&gt; concurrently using the &lt;code&gt;@google/genai&lt;/code&gt; Node SDK. &lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ The Architecture
&lt;/h2&gt;

&lt;p&gt;I engineered a decoupled, containerized &lt;strong&gt;streaming pipeline&lt;/strong&gt; deployed on &lt;strong&gt;Google Cloud Run&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1kifhxh682a8iszbdy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1kifhxh682a8iszbdy9.png" alt=" " width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Secret Sauce: Native Visual Concurrency
&lt;/h2&gt;

&lt;p&gt;The biggest challenge I ran into was fitting standard 16:9 webcam frames into a square visual grid without distorting the aspect ratio. The model can hallucinate if you squash the frame!&lt;/p&gt;

&lt;p&gt;I fixed this by letterboxing every frame on the canvas before it is sent: compute a uniform scale, then center the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Quick glimpse at the frontend scaling preserving 1:1 ratios&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoWidth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoHeight&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoWidth&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoHeight&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drawImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoWidth&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoHeight&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚨 Visual Ambient States (The "Wow" Factor)
&lt;/h2&gt;

&lt;p&gt;Dashboard views shouldn't just list data. When Aura triggers a smart decision, the entire browser viewport adapts using CSS variable overrides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💡 &lt;code&gt;.lights-off&lt;/code&gt; (Ambient Dimming): the viewport dims to a deep &lt;code&gt;#06080E&lt;/code&gt; with neon-glowing frame edges.&lt;/li&gt;
&lt;li&gt;🚨 &lt;code&gt;.emergency-global&lt;/code&gt; (Strobe Alerting): the background flashes red and white to demand immediate attention.&lt;/li&gt;
&lt;li&gt;🌡️ Thermal Card Shading: thermostat cards pulse with amber overlays that reflect the current temperature reading.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;🎥 Check out the Demo Video!&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=Vm2iGpAuexQ" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=Vm2iGpAuexQ&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📂 Source Code
&lt;/h2&gt;

&lt;p&gt;The code is 100% open source and available on GitHub: 👉 &lt;a href="https://github.com/karthidec/gemini-agent-challenge.git" rel="noopener noreferrer"&gt;https://github.com/karthidec/gemini-agent-challenge.git&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ Contest Disclaimer
&lt;/h2&gt;

&lt;p&gt;This project is an entry for the Google Gemini Live Agent Challenge 2026, built on the bidirectional streaming support in @google/genai.&lt;/p&gt;

&lt;p&gt;What do you think of this continuous audio/vision ambient approach for smart environments? Let me know in the comments below! 🌌✨&lt;/p&gt;

</description>
      <category>geminiliveagentchallenge</category>
      <category>googlecloud</category>
      <category>gemini</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Beyond Dashboards: Architecting a GenAI FinOps Analyst using BigQuery Native MCP</title>
      <dc:creator>Karthigayan Devan</dc:creator>
      <pubDate>Sat, 21 Feb 2026 15:55:36 +0000</pubDate>
      <link>https://forem.com/gde/beyond-dashboards-architecting-a-genai-finops-analyst-using-bigquery-native-mcp-48jc</link>
      <guid>https://forem.com/gde/beyond-dashboards-architecting-a-genai-finops-analyst-using-bigquery-native-mcp-48jc</guid>
      <description>&lt;p&gt;Google Cloud bills are larger and more complex to understand. You get a PDF summary that says "Compute Engine: $5,000", but when you ask &lt;em&gt;why&lt;/em&gt;, you're tasked with downloading a massive CSV or wrestling with the Google Cloud Billing Console's filters (at SKU's level).&lt;/p&gt;

&lt;p&gt;For true FinOps visibility, most engineering teams turn to the &lt;strong&gt;Cloud Billing Export&lt;/strong&gt;. This feature dumps every line item of your usage down to the SKU and timestamp into a BigQuery dataset. It is the single source of truth.&lt;/p&gt;

&lt;p&gt;But here is the catch: querying that data requires complex SQL. You need to know that &lt;code&gt;cost&lt;/code&gt; is in one column, &lt;code&gt;credits&lt;/code&gt; are nested in a JSON array, and &lt;code&gt;project.labels&lt;/code&gt; requires unnesting.&lt;/p&gt;

&lt;p&gt;We need a better way. A user can just ask: &lt;em&gt;"Why is the dev environment costing 20% more this week?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To build this, we do not write 1,000 lines of SQL-generation code. Instead, we can use &lt;strong&gt;Google's Native BigQuery MCP Server&lt;/strong&gt; with Gemini. Here is how I built a "FinOps for Everyone" agent (architecture design) that lets you chat directly with your raw billing data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Native MCP:
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is becoming the standard for connecting LLMs to data. Usually, you have to build an "MCP Server", a small app that sits between the LLM and your database.&lt;/p&gt;

&lt;p&gt;But Google has done something different. They published a &lt;strong&gt;Native BigQuery MCP Server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You don't deploy this server. You don't manage it. It is a public endpoint (&lt;code&gt;https://bigquery.googleapis.com/mcp&lt;/code&gt;) that your agent connects to. It exposes BigQuery's capabilities (schema inspection, querying, job management) directly to the LLM as tools.&lt;/p&gt;

&lt;p&gt;This changes everything. It means our FinOps agent is just a lightweight Python script. The heavy lifting of understanding the database structure is handled by the native protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrvkk8fz0b8yzi4hig3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrvkk8fz0b8yzi4hig3e.png" alt=" " width="409" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: The Data Foundation (Billing Export)
&lt;/h2&gt;

&lt;p&gt;Before writing code, you need the data.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to the &lt;strong&gt;Google Cloud Console &amp;gt; Billing &amp;gt; Billing Export&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Enable &lt;strong&gt;BigQuery Export&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; This creates a dataset (e.g., &lt;code&gt;billing_export_v1_XXXX&lt;/code&gt;) that fills with raw usage data every few hours.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;This is your gold mine. It contains every CPU cycle and storage byte you are paying for.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: The Agent Code
&lt;/h2&gt;

&lt;p&gt;I used the Google Gen AI Agent Development Kit (&lt;code&gt;google-adk&lt;/code&gt;). The critical piece is connecting to the native MCP URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting to the Native Server
&lt;/h3&gt;

&lt;p&gt;We don't need to define tools like &lt;code&gt;execute_sql&lt;/code&gt;. We just tell the ADK to talk to the BigQuery MCP URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Anti-Hallucination" Prompt
&lt;/h3&gt;

&lt;p&gt;The Billing Export schema is huge. If you ask Gemini to "show me costs," it might guess a column name like &lt;code&gt;total_cost&lt;/code&gt; when the actual column is &lt;code&gt;cost&lt;/code&gt; or &lt;code&gt;usage.amount_in_pricing_units&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To fix this, I set strict instructions in the agent's system prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;System Instruction:&lt;/strong&gt;&lt;br&gt;
You are a FinOps Analyst. You have access to the BigQuery MCP tools.&lt;br&gt;
&lt;strong&gt;Rule 1:&lt;/strong&gt; Never guess column names.&lt;br&gt;
&lt;strong&gt;Rule 2:&lt;/strong&gt; Before answering a question, use &lt;code&gt;list_tables&lt;/code&gt; to find the billing table, then use &lt;code&gt;get_table_schema&lt;/code&gt; to see the actual columns.&lt;br&gt;
&lt;strong&gt;Rule 3:&lt;/strong&gt; Only then, write and execute the SQL query.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This "Look before you Leap" pattern makes the agent incredibly robust. If Google updates the export schema tomorrow, my agent adapts instantly because it reads the schema at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Deployment (Local &amp;amp; Prod)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Running Locally (The "Analyst" Mode)
&lt;/h3&gt;

&lt;p&gt;For ad-hoc analysis, I run this script on my laptop.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;gcloud auth application-default login&lt;/code&gt; (This gives the script my user permissions).&lt;/li&gt;
&lt;li&gt; &lt;code&gt;python agent.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; I chat with it: &lt;em&gt;"Break down the cost of our AI/ML projects for the last 10 days by SKU."&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Deploying to Prod (The "Team" Mode)
&lt;/h3&gt;

&lt;p&gt;To let the whole team use it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Wrap the script in a Docker container.&lt;/li&gt;
&lt;li&gt; Deploy to &lt;strong&gt;Cloud Run&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Expose the Cloud Run endpoint as a remote MCP server (alternatively, the Agent Engine pattern can be used for this deployment).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, anyone in the internal Slack/chat portal can ask cost questions without needing BigQuery IAM access; the agent acts as the secure gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This is the Future of FinOps
&lt;/h2&gt;

&lt;p&gt;We are moving past static dashboards. Dashboards answer questions you asked &lt;em&gt;yesterday&lt;/em&gt;. Agents answer the questions you have &lt;em&gt;today&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;By using the &lt;strong&gt;Native BigQuery MCP Server&lt;/strong&gt;, we get:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Security:&lt;/strong&gt; No database credentials stored in the app. It uses standard OAuth/IAM.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Maintainability:&lt;/strong&gt; Zero SQL parsing code. The MCP protocol handles the tool definitions.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Depth:&lt;/strong&gt; You aren't limited to pre-aggregated views. You are querying the raw export. If you want to know how much you spent on "Network Egress to Australia" at 2 AM on a Sunday, the data is there, and the agent can write the SQL to find it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is FinOps for everyone—democratizing cost data so engineers can own their cloud spend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Google Cloud Platform (GCP)&lt;/li&gt;
&lt;li&gt;BigQuery (BQ): Data warehouse for billing exports.&lt;/li&gt;
&lt;li&gt;Google Cloud Billing Export: Source of raw financial data.&lt;/li&gt;
&lt;li&gt;Model Context Protocol (MCP): Standard for LLM-tool interaction.&lt;/li&gt;
&lt;li&gt;Native BigQuery MCP Server: Google-managed endpoint exposing BigQuery capabilities.&lt;/li&gt;
&lt;li&gt;Gemini (e.g., Gemini 2.5 Flash): The Large Language Model powering the agent.&lt;/li&gt;
&lt;li&gt;Google ADK (Agent Development Kit): Python library for building agents and MCP client interactions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/use-bigquery-mcp" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/bigquery/docs/use-bigquery-mcp&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>discuss</category>
      <category>mcp</category>
      <category>devops</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building a “Local-First” AI FinOps Agent with Gemini CLI &amp; MCP: Ending the Google Cloud Cost Puzzle</title>
      <dc:creator>Karthigayan Devan</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:23:25 +0000</pubDate>
      <link>https://forem.com/gde/building-a-local-first-ai-finops-agent-with-gemini-cli-mcp-ending-the-google-cloud-cost-puzzle-5863</link>
      <guid>https://forem.com/gde/building-a-local-first-ai-finops-agent-with-gemini-cli-mcp-ending-the-google-cloud-cost-puzzle-5863</guid>
      <description>&lt;p&gt;If you’ve ever tried to get a quick answer on &lt;strong&gt;“Why did our cloud spend spike yesterday?”&lt;/strong&gt; and found yourself tangled in slow dashboards, expensive queries, or pricey SaaS licenses, welcome to the club. FinOps is hard, but ironically, &lt;code&gt;analyzing&lt;/code&gt; cloud costs often feels more expensive and cumbersome than the costs themselves.&lt;/p&gt;

&lt;p&gt;In this article, I want to share a fresh architectural approach that flips the script entirely, a &lt;strong&gt;“Local-First” AI FinOps Agent&lt;/strong&gt; that lives right on your laptop, powered by Google’s Gemini CLI and the Model Context Protocol (MCP). The result? Instant, natural-language answers about your cloud billing data, zero cloud query charges, and absolutely no dashboard lag. Here’s how.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Paying to Understand What We Pay For
&lt;/h2&gt;

&lt;p&gt;When monitoring Google Cloud costs, we face a strange paradox, a classic &lt;code&gt;“Cost of Cost Analysis”&lt;/code&gt;. Let’s break down the pain points I see every day:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The BigQuery Scan Tax 💸
&lt;/h3&gt;

&lt;p&gt;Your billing data lives in BigQuery, which charges based on data scanned, about &lt;code&gt;$6.25 per terabyte&lt;/code&gt;. &lt;br&gt;
That means one careless query by someone who just wants to “see all logs from last week” can cost you &lt;code&gt;tens of dollars&lt;/code&gt;, and that’s before the productivity cost of waiting for results. Ouch.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. The Licensing Barrier 🚪
&lt;/h3&gt;

&lt;p&gt;Natural language querying tools like Gemini for Google Cloud can automatically turn a casual question into an SQL query. Sounds perfect, right? Except they come at a per-seat price (~$19/user/month), which balloons quickly when rolled out organization-wide (some tools start free and later switch to a paid model).&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Dashboard Latency and Rigidity 🕰️
&lt;/h3&gt;

&lt;p&gt;BI dashboards hide complexity behind clicks and charts, but often feel clunky for deep-dive or ad-hoc questions. They force you to navigate predefined views, not exactly conversational or fast when you’re racing a fire drill.&lt;/p&gt;


&lt;h2&gt;
  
  
  Our Solution: The Local-First Architecture 🏠✨
&lt;/h2&gt;

&lt;p&gt;What if the “heavy lifting” didn’t happen in the cloud? What if every engineer had &lt;strong&gt;instant access&lt;/strong&gt; to their project’s billing data, answering natural questions &lt;code&gt;locally&lt;/code&gt; without costing a dime more?&lt;/p&gt;

&lt;p&gt;Here’s the core idea:&lt;/p&gt;
&lt;h3&gt;
  
  
  Shift the query compute from the cloud to your laptop.
&lt;/h3&gt;

&lt;p&gt;We leverage three ingredients to pull this off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt;: Our natural language interface that transforms plain English into SQL queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;: A lightweight local server running on the user's machine that orchestrates and executes SQL queries against local databases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decentralized data sync&lt;/strong&gt;: Syncing compact, optimized billing datasets to engineers’ devices using existing corporate storage tools (OneDrive / SharePoint).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Core Design Principles:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query Once, Distribute Many:&lt;/strong&gt; Execute one optimized aggregation query per day in the cloud.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Cost Local Queries:&lt;/strong&gt; All ad-hoc analysis happens on the user's laptop using local storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural Language Interface:&lt;/strong&gt; Use Generative AI (Gemini) to translate user intent into database queries locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbihf3z9b6v3gbbauzi2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbihf3z9b6v3gbbauzi2.png" alt="Building a “Local-First” AI FinOps Agent with Gemini CLI &amp;amp; MCP" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Works: A Day in the Life of Your Local-First FinOps Agent
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1: Ingest &amp;amp; Optimize
&lt;/h3&gt;

&lt;p&gt;Instead of running thousands of raw billing queries, a single &lt;em&gt;optimized aggregation&lt;/em&gt; runs once a day in the cloud. This job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarizes raw billing logs into &lt;strong&gt;partitioned, compressed SQLite databases&lt;/strong&gt; for efficient local querying.&lt;/li&gt;
&lt;li&gt;Data is partitioned by dimensions like &lt;code&gt;Project&lt;/code&gt;, &lt;code&gt;Service&lt;/code&gt;, and &lt;code&gt;Date&lt;/code&gt; for quick filtering.&lt;/li&gt;
&lt;li&gt;Scans only a fraction of the data compared to raw logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This yields &lt;strong&gt;1 query/day instead of 1,000 queries/day&lt;/strong&gt;, dramatically cutting costs.&lt;/p&gt;
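&lt;p&gt;As a rough sketch of what the daily job materializes, here is a minimal summary table built with Python's stdlib &lt;code&gt;sqlite3&lt;/code&gt;. The schema, column names, and sample rows are illustrative assumptions, not taken from the actual pipeline:&lt;/p&gt;

```python
import sqlite3

# In practice this would be a file later synced via OneDrive/SharePoint;
# :memory: keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE billing_summary (
        usage_date TEXT,
        project    TEXT,
        service    TEXT,
        cost       REAL
    )"""
)

# The cloud-side daily aggregation would INSERT pre-summarized rows like these.
rows = [
    ("2024-12-15", "shop-prod", "Checkout Service", 120.50),
    ("2024-12-15", "shop-prod", "Cloud Spanner", 310.00),
    ("2024-12-16", "shop-dev", "Checkout Service", 42.10),
]
conn.executemany("INSERT INTO billing_summary VALUES (?, ?, ?, ?)", rows)

# An index on the partition-style dimensions keeps local filters fast.
conn.execute("CREATE INDEX idx_dims ON billing_summary (project, service, usage_date)")

total = conn.execute(
    "SELECT SUM(cost) FROM billing_summary WHERE project = 'shop-prod'"
).fetchone()[0]
print(total)  # 430.5
```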
&lt;h3&gt;
  
  
  Step 2: Sync
&lt;/h3&gt;

&lt;p&gt;The aggregated SQLite database files are synced to every engineer’s laptop through &lt;strong&gt;OneDrive or SharePoint sync clients&lt;/strong&gt; — no new infrastructure, no added cloud storage cost.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sync happens incrementally.&lt;/li&gt;
&lt;li&gt;Files remain small (a few hundred MB, optimized by partitioning and compression).&lt;/li&gt;
&lt;li&gt;Data privacy is controlled by existing SharePoint permissions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 3: Query Locally with MCP &amp;amp; Gemini
&lt;/h3&gt;

&lt;p&gt;Here’s where the magic happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;local MCP agent&lt;/strong&gt; runs as a lightweight server on your machine.&lt;/li&gt;
&lt;li&gt;Gemini CLI takes your natural language query and passes an SQL prompt to MCP.&lt;/li&gt;
&lt;li&gt;MCP uses a &lt;strong&gt;SQLite engine&lt;/strong&gt; locally to run queries &lt;em&gt;within milliseconds&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Results are returned to Gemini to synthesize human-readable answers by leveraging large language model reasoning on local computation context.&lt;/li&gt;
&lt;/ul&gt;
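&lt;p&gt;For reference, wiring a local MCP server into Gemini CLI is typically a small &lt;code&gt;mcpServers&lt;/code&gt; entry in &lt;code&gt;settings.json&lt;/code&gt;. The server name, command, and database path below are hypothetical placeholders:&lt;/p&gt;

```json
{
  "mcpServers": {
    "finops-local": {
      "command": "python",
      "args": ["mcp_server.py", "--db", "/path/to/billing_summary.sqlite"]
    }
  }
}
```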


&lt;h2&gt;
  
  
  Example User Interaction
&lt;/h2&gt;

&lt;p&gt;Open a Gemini CLI session and enter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;why is the checkout service 20% over budget this month?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Under the hood:
&lt;/h2&gt;

&lt;p&gt;Gemini translates this to something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT service, project, SUM(cost) as total_cost
FROM billing_summary
WHERE service = 'Checkout Service' AND usage_date BETWEEN '2024-12-01' AND '2024-12-31'
GROUP BY service, project;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;The MCP agent runs this query locally on the SQLite file synced to the laptop.&lt;/li&gt;
&lt;li&gt;Raw costs for this service are fetched instantly.&lt;/li&gt;
&lt;li&gt;Gemini’s natural language model synthesizes the insight:
“The increase is driven by a new Spanner instance checkout-db-prod provisioned on the 15th.”&lt;/li&gt;
&lt;/ol&gt;
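&lt;p&gt;A minimal sketch of the local execution step, again using stdlib &lt;code&gt;sqlite3&lt;/code&gt; with an assumed &lt;code&gt;billing_summary&lt;/code&gt; schema and made-up sample rows:&lt;/p&gt;

```python
import sqlite3

# :memory: stands in for the SQLite file synced to the laptop.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing_summary (service TEXT, project TEXT, usage_date TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO billing_summary VALUES (?, ?, ?, ?)",
    [
        ("Checkout Service", "shop-prod", "2024-12-10", 100.0),
        ("Checkout Service", "shop-prod", "2024-12-16", 250.0),  # the spike
        ("Other Service", "shop-prod", "2024-12-16", 999.0),     # filtered out below
    ],
)

# The SQL below is the query Gemini generated above; MCP just executes it.
query = """
    SELECT service, project, SUM(cost) AS total_cost
    FROM billing_summary
    WHERE service = 'Checkout Service'
      AND usage_date BETWEEN '2024-12-01' AND '2024-12-31'
    GROUP BY service, project
"""
for service, project, total_cost in conn.execute(query):
    print(service, project, total_cost)  # Checkout Service shop-prod 350.0
```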

&lt;p&gt;No cloud queries. No expensive SaaS fees. Instant answers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security &amp;amp; Governance: Keeping Data Safe &amp;amp; Relevant
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Data Residency: All billing data resides only on the local machines of authorized users. No outbound data is sent to 3rd-party AI API endpoints, preserving confidentiality.&lt;/li&gt;
&lt;li&gt;Role-Based Access: The local MCP agent can implement filters based on user role or project membership, ensuring users only query relevant data.&lt;/li&gt;
&lt;li&gt;Auditability: Query logs remain local, avoiding centralized data exposure while enabling traceability on the user’s machine.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Comparative Analysis
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Feature        | Direct BigQuery           | BI Dashboards              | Proposed Local Agent                |
|----------------|---------------------------|----------------------------|-----------------------------------|
| Cost Per Query | High ($5+ / TB)           | Med (Hidden Refresh Costs) | Zero ($0.00)                      |
| Speed          | Variable (Queue times)    | Slow (Load times)          | Instant                          |
| Flexibility    | High (Full SQL)           | Low (Fixed Views)          | High (Natural Language)           |
| Accessibility  | Low (Requires SQL skills) | Med (Requires Access)      | High (Chat Interface)             |
| Data Freshness | Real-time                 | Delayed                   | Daily Sync (Sufficient for FinOps) |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Combining Gemini CLI, MCP, and a smart decentralized sync strategy unlocks a new kind of FinOps, one where cost visibility is effortless, inexpensive, and immediate.&lt;/p&gt;

&lt;p&gt;The cloud should never charge you for asking about your bills. By shifting the compute closer to users and blending in natural language AI, we finally solve the paradox of cloud cost analysis.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>mcp</category>
      <category>ai</category>
      <category>cli</category>
    </item>
    <item>
      <title>Building an AI-Native Data Interface with Google ADK, MCP, and BigQuery</title>
      <dc:creator>Karthigayan Devan</dc:creator>
      <pubDate>Sat, 17 Jan 2026 12:39:09 +0000</pubDate>
      <link>https://forem.com/gde/building-an-ai-native-data-interface-with-google-adk-mcp-and-bigquery-4704</link>
      <guid>https://forem.com/gde/building-an-ai-native-data-interface-with-google-adk-mcp-and-bigquery-4704</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For many years, enterprise data interaction has followed a predictable pattern. Engineers write SQL, teams build dashboards, and organizations rely on BI tools to understand system behavior, business performance, and costs. While these approaches remain useful, they are increasingly insufficient in modern cloud environments where scale, velocity, and operational complexity demand faster, more intelligent decision-making.&lt;/p&gt;

&lt;p&gt;From an &lt;strong&gt;SRE, platform engineering, and FinOps&lt;/strong&gt; perspective, the challenge is no longer just accessing data. The challenge is enabling safe, governed, and intelligent interaction with data that supports reliability, cost optimization, and continuous cloud transformation.&lt;/p&gt;

&lt;p&gt;To address this, I built a &lt;strong&gt;fully working proof of concept (PoC) using Google ADK, Model Context Protocol (MCP), and BigQuery&lt;/strong&gt;, based strictly on Google’s official documentation and extended with production-grade engineering considerations. This was not a conceptual exercise or a demo-only prototype. The system runs end-to-end and reflects architectural patterns suitable for real enterprise platforms.&lt;/p&gt;

&lt;p&gt;In this article, I describe what I built, why this architecture matters for modern cloud organizations, and how Google ADK and MCP fundamentally change how AI systems can support SRE, platform, and FinOps workflows at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This PoC Is Really About
&lt;/h2&gt;

&lt;p&gt;At its core, this PoC explores a simple but powerful idea:&lt;br&gt;
    &lt;em&gt;What if AI agents interacted with enterprise data the same way production-grade cloud systems are expected to?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Rather than embedding SQL logic into prompts or granting broad database access, this PoC demonstrates how an AI agent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reason about operational, financial, or analytical questions&lt;/li&gt;
&lt;li&gt;Discover approved tools dynamically&lt;/li&gt;
&lt;li&gt;Access BigQuery through a governed, auditable interface&lt;/li&gt;
&lt;li&gt;Receive structured results suitable for reliable downstream reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent never improvises data access. Every interaction is explicit, policy-aligned, and traceable, which is essential in SRE- and FinOps-driven environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Problem Is Worth Solving
&lt;/h2&gt;

&lt;p&gt;Many AI + data examples appear impressive but fail under real operational constraints. Common issues include hardcoded SQL in prompts, excessive permissions, and no separation between reasoning logic and execution logic.&lt;/p&gt;

&lt;p&gt;From an SRE and platform engineering standpoint, these patterns introduce unacceptable risk. From a FinOps standpoint, they obscure cost attribution, accountability, and governance.&lt;/p&gt;

&lt;p&gt;This PoC takes a different approach. Data access is treated as a platform capability, not a prompt-level shortcut. This distinction is critical for organizations focused on reliability, security, cost efficiency, and sustainable cloud transformation.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Quick Introduction to Google ADK
&lt;/h2&gt;

&lt;p&gt;Google ADK (Agent Development Kit) provides a structured framework for building agentic systems that align well with cloud-native engineering principles. Instead of focusing solely on prompts, ADK formalizes agents, tools, reasoning loops, and context boundaries.&lt;/p&gt;

&lt;p&gt;For senior engineers and platform architects, ADK feels intuitive. It mirrors how reliable systems are designed: with explicit contracts, modular components, and controlled execution paths. You are not instructing a model to respond once; you are defining how it reasons, when it acts, and what platform capabilities it may invoke.&lt;/p&gt;

&lt;p&gt;This makes ADK particularly relevant for production systems supporting SRE automation, FinOps analysis, and large-scale cloud operations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding MCP (Model Context Protocol)
&lt;/h2&gt;

&lt;p&gt;MCP is a critical architectural component of this solution.&lt;/p&gt;

&lt;p&gt;Rather than allowing AI models to directly manipulate external systems, MCP introduces a formal protocol for tool-based interaction. Tools expose schemas, models discover capabilities, and all inputs and outputs are structured and validated.&lt;/p&gt;

&lt;p&gt;In practice, the model does not need to understand BigQuery's internals. It only needs to understand the operational contract defined by the MCP tool.&lt;/p&gt;

&lt;p&gt;This design closely aligns with platform engineering best practices and enables AI systems to operate within the same governance boundaries as other production services.&lt;/p&gt;
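&lt;p&gt;To make the contract idea concrete, here is an illustrative sketch in plain Python (not the actual MCP SDK, and not the real BigQuery tool schema): a tool publishes an input schema, and every call is validated against that schema before anything is executed:&lt;/p&gt;

```python
# Illustrative tool contract -- the field names are hypothetical.
# The key point: the model only ever sees this contract, never BigQuery itself.
EXECUTE_SQL_TOOL = {
    "name": "execute_sql",
    "description": "Run a read-only SQL query against an approved dataset.",
    "input_schema": {
        "type": "object",
        "properties": {
            "project_id": {"type": "string"},
            "query": {"type": "string"},
        },
        "required": ["project_id", "query"],
    },
}


def validate_call(tool, arguments):
    """Reject any call that does not match the tool's published schema."""
    schema = tool["input_schema"]
    for field in schema["required"]:
        if field not in arguments:
            raise ValueError(f"missing required field: {field}")
    for field, value in arguments.items():
        if field not in schema["properties"]:
            raise ValueError(f"unexpected field: {field}")
        if schema["properties"][field]["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"field {field} must be a string")
    return True


validate_call(EXECUTE_SQL_TOOL, {"project_id": "my-project", "query": "SELECT 1"})
```

&lt;p&gt;A malformed or over-reaching call fails validation at the boundary, before it can touch the data platform.&lt;/p&gt;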
&lt;h2&gt;
  
  
  MCP and BigQuery: Why This Combination Matters
&lt;/h2&gt;

&lt;p&gt;Google’s official MCP support for BigQuery is especially impactful because BigQuery is often the system of record for analytics, operational metrics, and cost data.&lt;/p&gt;

&lt;p&gt;By exposing BigQuery through MCP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access can be tightly scoped and governed&lt;/li&gt;
&lt;li&gt;Queries are executed only through approved interfaces&lt;/li&gt;
&lt;li&gt;Permissions remain centralized and auditable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI agent never becomes a privileged database user. Instead, it behaves as a controlled platform consumer, consistent with how SRE and FinOps systems are expected to operate.&lt;/p&gt;
&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;

&lt;p&gt;At a high level, the PoC architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Agent (Google ADK)
        |
        |  Structured MCP tool calls
        v
MCP Server (BigQuery)
        |
        v
BigQuery datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most important takeaway here is the separation of concerns.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reasoning happens in the agent.&lt;/li&gt;
&lt;li&gt;Execution happens in BigQuery.&lt;/li&gt;
&lt;li&gt;MCP sits cleanly in between.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Try This PoC in Your Local Environment
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Local Config Setup&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clone the GitHub repo
git clone https://github.com/karthidec/google-adk-mcp-bigquery.git

# Authenticate with Google Cloud
gcloud config set project [YOUR-PROJECT-ID]
gcloud auth application-default login

# Enable the BigQuery MCP server in your Google Cloud project
gcloud beta services mcp enable bigquery.googleapis.com --project=[YOUR-PROJECT-ID]

# Create a virtual environment
python3 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Install ADK
pip install google-adk

# Navigate to the app directory
cd bq_mcp/

# Run the ADK web interface
adk web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Defining the Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent is created using Google ADK primitives. It includes a reasoning loop and the ability to discover and invoke MCP tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import dotenv
from pathlib import Path
from . import tools
from google.adk.agents import Agent

# Load environment variables relative to this file
dotenv.load_dotenv(Path(__file__).parent / ".env")

PROJECT_ID = os.getenv('GOOGLE_CLOUD_PROJECT', 'project_not_set')

# Initialize the toolset
bigquery_toolset = tools.get_bigquery_mcp_toolset()

# Define the Agent
root_agent = Agent(
    model='gemini-2.5-flash', # Leveraging a fast, reasoning-capable model
    name='root_agent',
    instruction=f"""
                Help the user answer questions using the following source:

                1.  **BigQuery toolset:** Access demographic (incl. foot traffic index),
                    product pricing, and historical sales data in the mcp_bakery dataset.
                    Do not use any other dataset.

                Run all query jobs from project id: {PROJECT_ID}.
                Return lists of zip codes or answer any general query the user asks.
            """,
    tools=[bigquery_toolset]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure makes the agent predictable and easier to extend over time.&lt;/p&gt;
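&lt;p&gt;Predictability here means the agent's only path to data is a structured tool call. The sketch below shows the kind of payload an agent might emit for a foot-traffic question; the tool name, field names, and the &lt;code&gt;demographics&lt;/code&gt; table are hypothetical, while the &lt;code&gt;mcp_bakery&lt;/code&gt; dataset restriction comes from the agent instruction above:&lt;/p&gt;

```python
# Hypothetical structured tool call -- illustrative shape only, not the
# literal BigQuery MCP contract. The "demographics" table is an assumption.
user_question = "Which zip codes have the highest foot traffic index?"

tool_call = {
    "tool": "execute_sql",
    "arguments": {
        "project_id": "my-project",  # assumption: injected from configuration
        "query": (
            "SELECT zipcode, foot_traffic_index "
            "FROM mcp_bakery.demographics "  # the dataset the instruction pins
            "ORDER BY foot_traffic_index DESC LIMIT 10"
        ),
    },
}

# Because the call is plain data, dataset policy can also be enforced
# out-of-band, independently of whatever the model decided to generate:
assert "mcp_bakery." in tool_call["arguments"]["query"]
```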

&lt;p&gt;&lt;strong&gt;Registering BigQuery as an MCP Tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next, BigQuery is exposed through an MCP server. The server defines exactly what operations are available and how requests should be shaped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.auth&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.auth.transport.requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="c1"&gt;# We use Any here to avoid strict dependency issues if ADK isn't fully typed in your env
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents.readonly_context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ReadonlyContext&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ImportError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ReadonlyContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.mcp_tool.mcp_toolset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPToolset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.mcp_tool.mcp_session_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamableHTTPConnectionParams&lt;/span&gt;

&lt;span class="c1"&gt;# Robustly load .env from the script's directory
&lt;/span&gt;&lt;span class="n"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;BIGQUERY_MCP_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://bigquery.googleapis.com/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_auth_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generates fresh authentication headers. 
    Crucial for long-running agents where tokens might expire.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;scopes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.googleapis.com/auth/bigquery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Fallback if default auth doesn't pick up the project ID
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Force a token refresh to ensure validity
&lt;/span&gt;    &lt;span class="n"&gt;auth_req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth_req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;oauth_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;oauth_token&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-goog-user-project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;auth_header_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ReadonlyContext&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Callback function used by the MCP Client to inject headers.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;get_auth_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_bigquery_mcp_toolset&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Initial headers for the handshake
&lt;/span&gt;    &lt;span class="n"&gt;initial_headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_auth_headers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# We use StreamableHTTPConnectionParams for efficient data transfer
&lt;/span&gt;    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPToolset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StreamableHTTPConnectionParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BIGQUERY_MCP_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;initial_headers&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# This provider is the secret sauce for handling token expiry
&lt;/span&gt;        &lt;span class="n"&gt;header_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;auth_header_provider&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step is where governance really comes into play. The tool definition becomes the contract.&lt;/p&gt;

&lt;p&gt;The response is structured, predictable, and easy for the agent to interpret.&lt;/p&gt;
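&lt;p&gt;For illustration, a structured result might look like the following (the exact shape is defined by the MCP server; this schema-plus-rows layout is an assumption). The agent can turn it into plain records without any knowledge of SQL or BigQuery internals:&lt;/p&gt;

```python
import json

# Illustrative structured tool result: a schema plus rows. This is the kind
# of shape that makes downstream reasoning reliable. Values are made up.
raw_result = json.dumps({
    "schema": [
        {"name": "zipcode", "type": "STRING"},
        {"name": "total_sales", "type": "FLOAT"},
    ],
    "rows": [
        ["94103", 10523.5],
        ["94110", 9876.0],
    ],
})

result = json.loads(raw_result)
columns = [field["name"] for field in result["schema"]]
records = [dict(zip(columns, row)) for row in result["rows"]]
# records is now a list of plain dicts the agent can reason over directly
```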

&lt;h2&gt;
  
  
  Why This PoC Goes Beyond a Demo
&lt;/h2&gt;

&lt;p&gt;The value of this PoC lies not only in functionality, but in architectural discipline.&lt;/p&gt;

&lt;p&gt;It demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agentic reasoning aligned with SRE decision workflows&lt;/li&gt;
&lt;li&gt;Tool-based execution instead of brittle SQL-in-prompt patterns&lt;/li&gt;
&lt;li&gt;Enterprise-grade governance and security&lt;/li&gt;
&lt;li&gt;A realistic path from PoC to production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For SRE, platform engineering, and FinOps leaders, these characteristics are essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;p&gt;This architectural pattern supports multiple high-impact use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FinOps assistants for cost visibility and optimization&lt;/li&gt;
&lt;li&gt;SRE copilots for reliability, incident analysis, and capacity planning&lt;/li&gt;
&lt;li&gt;Platform analytics agents with strict access controls&lt;/li&gt;
&lt;li&gt;Executive decision systems grounded in governed cloud data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same design scales naturally as organizational maturity grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Building this PoC reinforced several core principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP is foundational for governed, enterprise AI&lt;/li&gt;
&lt;li&gt;Google ADK aligns well with platform engineering practices&lt;/li&gt;
&lt;li&gt;BigQuery is a natural backend for SRE and FinOps intelligence&lt;/li&gt;
&lt;li&gt;Separation of reasoning and execution is non-negotiable&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This PoC demonstrates that &lt;strong&gt;AI-native cloud platforms have moved beyond experimentation&lt;/strong&gt;. With Google ADK and MCP, it is now possible to build intelligent agents that support &lt;strong&gt;reliability engineering, platform operations, and financial governance&lt;/strong&gt; in a secure and scalable way.&lt;/p&gt;

&lt;p&gt;For organizations undergoing cloud digital transformation, this approach provides a disciplined foundation for integrating AI into core operational workflows rather than treating it as an isolated experiment.&lt;/p&gt;

&lt;p&gt;Happy building 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  PoC Shot
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6x72zi9427dpaov8ims.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6x72zi9427dpaov8ims.png" alt=" " width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google Agent Development Kit (ADK)&lt;/a&gt;:&lt;/strong&gt; A robust framework for building, testing, and deploying AI agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/use-bigquery-mcp" rel="noopener noreferrer"&gt;Google Big Query Model Context Protocol (MCP)&lt;/a&gt;:&lt;/strong&gt; An open standard that acts like a "USB port" for AI. We specifically use the &lt;strong&gt;BigQuery MCP Server&lt;/strong&gt; to connect our model to data without custom, brittle integrations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Challenge Faced: Authentication &amp;amp; Session Management
&lt;/h2&gt;

&lt;p&gt;During this PoC, the biggest technical hurdle was &lt;strong&gt;authentication&lt;/strong&gt;. Standard HTTP connections often use static headers. However, Google Cloud OAuth tokens are short-lived (usually 1 hour). If your agent runs longer than that, a static token results in a &lt;code&gt;403 Forbidden&lt;/code&gt; error.&lt;/p&gt;

&lt;p&gt;To solve this, I implemented a &lt;strong&gt;Dynamic Header Provider&lt;/strong&gt;. Instead of passing a fixed string, I pass a function that regenerates the OAuth token whenever the MCP session establishes a connection or refreshes its token.&lt;/p&gt;
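&lt;p&gt;Stripped of the Google-specific auth code, the pattern reduces to passing a callable instead of a static dictionary. In this minimal sketch, &lt;code&gt;fetch_fresh_token&lt;/code&gt; is a stand-in for the real &lt;code&gt;credentials.refresh()&lt;/code&gt; call:&lt;/p&gt;

```python
import itertools

# Stand-in for google.auth: each refresh yields a new short-lived token.
_token_counter = itertools.count(1)


def fetch_fresh_token():
    """Simulates credentials.refresh(); returns a new token on every call."""
    return f"token-{next(_token_counter)}"


def auth_header_provider(context=None):
    """Called by the client on each (re)connect -- never caches a token."""
    return {"Authorization": f"Bearer {fetch_fresh_token()}"}


# Each handshake gets headers that are valid *right now*:
first = auth_header_provider()
second = auth_header_provider()
```

&lt;p&gt;Because the provider runs on every handshake, a session that outlives the one-hour token lifetime picks up a fresh token instead of failing with &lt;code&gt;403 Forbidden&lt;/code&gt;.&lt;/p&gt;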

</description>
      <category>cloud</category>
      <category>ai</category>
      <category>architecture</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Google Cloud Model Armor - LLMs Protection</title>
      <dc:creator>Karthigayan Devan</dc:creator>
      <pubDate>Sun, 21 Sep 2025 14:14:32 +0000</pubDate>
      <link>https://forem.com/karthidec/google-cloud-model-armor-llms-protection-3mh1</link>
      <guid>https://forem.com/karthidec/google-cloud-model-armor-llms-protection-3mh1</guid>
      <description>&lt;h2&gt;
  
  
  Cloud Armor:
&lt;/h2&gt;

&lt;p&gt;Google Cloud Armor helps protect your infrastructure and applications from Layer 3/Layer 4 network or protocol-based volumetric distributed denial-of-service (DDoS) attacks, volumetric Layer 7 attacks, and other targeted application attacks. It leverages Google's global network and distributed infrastructure to detect and absorb attacks and filter traffic through user-configurable security policies at the edge of Google's network, far upstream of your workloads.&lt;/p&gt;

&lt;p&gt;Model Armor, a related service focused on protecting LLM workloads rather than network traffic, addresses several significant threats covered in the OWASP Top 10 for LLM Applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Malicious files and unsafe URLs&lt;/li&gt;
&lt;li&gt;Prompt injection and jailbreaks&lt;/li&gt;
&lt;li&gt;Sensitive data&lt;/li&gt;
&lt;li&gt;Offensive material&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Core Features:
&lt;/h2&gt;

&lt;p&gt;Floor settings establish the bare minimum security requirements that all your custom configurations within the template must meet. It's the security bedrock.&lt;/p&gt;

&lt;h4&gt;
  
  
  Organization level:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A floor setting at this level adds minimum requirements to all templates associated with any project and any folder inside the organization&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Folder level:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A floor setting at this level adds a minimum requirement to all templates associated with any project inside the folder.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Project level:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A floor setting at this level adds a minimum requirement to all templates associated with a project.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Template:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A template is your control panel, letting you dial in exactly how Model Armor examines prompts and responses.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Confidence level:
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Low and above:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model Armor screens almost everything. At this level, it's going to identify issues with the smallest hint of alignment to the detection criteria.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Medium and above:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model Armor is a bit more discerning. It flags things that are moderate matches to the detection criteria.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  High and above:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model Armor is pretty darn confident that the information is a strong match to the detection criteria.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to enable Model Armor in Google Cloud?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to Security Command Center -&amp;gt; Model Armor -&amp;gt; Enable API&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Configure floor settings:
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Detections:
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3cptd9c1sj7m5g9s1vl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3cptd9c1sj7m5g9s1vl9.png" alt=" " width="576" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Responsible AI:
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pjotl8gzmtfpayti7uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pjotl8gzmtfpayti7uh.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Saved Floor settings:
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3iown0crje1eriz08g6i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3iown0crje1eriz08g6i.png" alt=" " width="694" height="624"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure template settings:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3syosq1t427vgtyxgsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3syosq1t427vgtyxgsw.png" alt=" " width="524" height="729"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24jwzjjm572odpp33mnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24jwzjjm572odpp33mnk.png" alt=" " width="445" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you create the template, it will be saved as follows. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5nvqo9x0iort6b8t8fr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5nvqo9x0iort6b8t8fr.png" alt=" " width="800" height="69"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Logs:
&lt;/h2&gt;

&lt;p&gt;Model Armor is a multi-tasker. It's screening the text going in and out of the LLM, and it's also taking notes on the activities. These notes are surfaced to you in the form of logs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Admin Activity audit logs&lt;/strong&gt; capture create, read, update, and delete (CRUD) operations on templates and floor settings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data access audit logs&lt;/strong&gt; capture details about screening operations. For example, what template was used to screen a prompt or response, what was the text, and what was the result?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Logs Explorer:
&lt;/h4&gt;

&lt;p&gt;Below are a few filters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protoPayload.serviceName="modelarmor.googleapis.com"

&lt;ul&gt;
&lt;li&gt;This filter shows you audit logs that track template actions like create or update.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;protoPayload.methodName="google.cloud.modelarmor.v1.ModelArmor.SanitizeUserPrompt"

&lt;ul&gt;
&lt;li&gt;This filter shows you the Data Access audit logs that capture prompt and response screening.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
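&lt;p&gt;Since these filters are plain strings, it can help to compose them in code. The helper below is a small convenience sketch (not part of any Google SDK); the resulting string can be pasted into Logs Explorer or passed to &lt;code&gt;gcloud logging read&lt;/code&gt;:&lt;/p&gt;

```python
def model_armor_filter(method=None):
    """Compose a Logs Explorer filter for Model Armor activity.

    With no arguments, it matches all Model Armor audit logs; pass a fully
    qualified method name to narrow the results to a single operation.
    """
    parts = ['protoPayload.serviceName="modelarmor.googleapis.com"']
    if method:
        parts.append(f'protoPayload.methodName="{method}"')
    return " AND ".join(parts)


# Filter for prompt-screening Data Access audit logs only:
prompt_filter = model_armor_filter(
    "google.cloud.modelarmor.v1.ModelArmor.SanitizeUserPrompt"
)
```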

&lt;h2&gt;
  
  
  Sample Python code:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install google-cloud-modelarmor
from google.cloud import modelarmor_v1
import sys

# Create a client
client = modelarmor_v1.ModelArmorClient(transport="rest", client_options = {"api_endpoint" : "modelarmor.us-central1.rep.googleapis.com"})

# Initialize request argument(s)
user_prompt_data = modelarmor_v1.DataItem()

# Get the prompt from command line argument
if len(sys.argv) &amp;gt; 1: # Check if an argument is provided
    prompt = sys.argv[1] # Take the first argument as the prompt
else:
    # Fallback to a default prompt if no argument is provided
    prompt = "Placeholder prompt."

# Set prompt data for model armor call
user_prompt_data.text = prompt
ma_request = modelarmor_v1.SanitizeUserPromptRequest(
    name="projects/xxx-armor-demo-012346/locations/us-central1/templates/pijb-only", # name contains the project and template
    user_prompt_data=user_prompt_data,
)

# Make the MA request
ma_response = client.sanitize_user_prompt(request=ma_request)

# Take action based on Model Armor's result
pijb_result = ma_response.sanitization_result.filter_results["pi_and_jailbreak"].pi_and_jailbreak_filter_result
if pijb_result.match_state == modelarmor_v1.FilterMatchState.MATCH_FOUND: # A PIJB match was found
    print("Query failed security check. Error.")
else:
    print("Query passed security check. Sending prompt to LLM.")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pricing model:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/armor/pricing" rel="noopener noreferrer"&gt;https://cloud.google.com/armor/pricing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/armor/docs/" rel="noopener noreferrer"&gt;https://cloud.google.com/armor/docs/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/security-command-center/docs/reference/model-armor/rest" rel="noopener noreferrer"&gt;https://cloud.google.com/security-command-center/docs/reference/model-armor/rest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=fH62NUGwsyo" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=fH62NUGwsyo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cloudarmor</category>
      <category>llm</category>
      <category>gcp</category>
      <category>security</category>
    </item>
  </channel>
</rss>
