Forem: joshyfruit

From IT Manager to AI Engineer: Build a Cloud Infrastructure Agent with Cloud Run's Managed MCP Server

joshyfruit — Thu, 30 Apr 2026 00:57:46 +0000

Google Cloud NEXT '26 dropped over 260 announcements. Most headlines went to Gemini 3.1, TPU v8, and the Agentic Data Cloud. But buried in the Cloud Run section was something that made me stop scrolling — a fully managed remote MCP server, now generally available. If you manage infrastructure AND build AI systems, this one's for you.

Why This Hit Different for Me

I wear two hats. One day I'm SSH-ing into VMs, reviewing Cloud Run deployments, and making sure services don't fall over at 2am. The next I'm wiring up LLM agents, building tool pipelines, and figuring out why my context window blew up. These two worlds have always felt weirdly disconnected.

MCP (Model Context Protocol) on Cloud Run is the first thing I've seen that genuinely bridges them. Instead of hand-crafting API clients for every infra operation your agent needs to do, you point it at a managed MCP server — and suddenly your AI agent can deploy services, read logs, and inspect health metrics like a junior SRE who never sleeps.

Let's build it.

What We're Building

By the end of this walkthrough you'll have:

The built-in Cloud Run MCP server wired up to Gemini CLI so you can manage deployments via natural language
A custom MCP server running on Cloud Run that exposes infrastructure health tools
An ADK agent that combines both to answer questions like "Which of my services had errors in the last hour?"

Prerequisites

A Google Cloud project with billing enabled
gcloud CLI installed and authenticated
Python 3.10+
Docker (for building the custom server)
Gemini CLI installed

Set your project up front so every command just works:

export PROJECT_ID="my-project-id"
export REGION="us-central1"
gcloud config set project $PROJECT_ID

IAM roles you'll need on your account:

roles/run.admin
roles/iam.serviceAccountUser
roles/artifactregistry.writer

Part 1 — Use the Built-in Cloud Run MCP Server

Google now hosts a fully managed MCP server at https://run.googleapis.com/mcp. It exposes tools like list_services, get_service, deploy_service_from_image, and deploy_service_from_archive — no setup required on your end.

Step 1: Authenticate

The managed endpoint uses your Google Cloud identity. Make sure your ADC (Application Default Credentials) are set:

gcloud auth application-default login

Step 2: Wire it to Gemini CLI

Open (or create) ~/.gemini/settings.json and add:

{
  "mcpServers": {
    "cloud-run": {
      "url": "https://run.googleapis.com/mcp",
      "transport": "http"
    }
  }
}

Step 3: Talk to Your Infrastructure

Fire up Gemini CLI and try this:

gemini

> List all my Cloud Run services in us-central1

You'll see it call list_services under the hood and return a clean summary of every service, its URL, and status. No gcloud run services list --region us-central1 --format=json | jq ... gymnastics required.

Try something bolder:

> Deploy the image us-docker.pkg.dev/cloudrun/container/hello to a new service
  called "hello-from-agent" in us-central1

It calls deploy_service_from_image, fills in the parameters, and your service is live. That's infrastructure-as-conversation, and honestly it feels a little magical the first time.

IT Specialist note: The managed endpoint enforces Cloud IAM on every call. If your credentials don't have run.services.create, the deploy fails cleanly with a permission error — not a hallucinated success. That's the kind of guardrail you need when agents touch production infra.

Part 2 — Build & Deploy Your Own Custom MCP Server

The built-in server covers Cloud Run operations. But what about your custom health checks, log analysis, or cross-service diagnostics? That's where you roll your own.

We'll build an Infra Health MCP Server with three tools:

list_services — wraps Cloud Run's Admin API
get_service_error_rate — queries Cloud Logging for 5xx errors
check_service_health — returns a simple green/yellow/red status

Step 1: Create the Project

mkdir infra-health-mcp && cd infra-health-mcp

Create pyproject.toml:

[project]
name = "infra-health-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "fastmcp>=2.0.0",
    "google-cloud-run>=0.10.0",
    "google-cloud-logging>=3.0.0",
]

Step 2: Write the MCP Server

Create server.py:

import asyncio
import json
import logging
import os
from datetime import datetime, timedelta, timezone

from fastmcp import FastMCP
from google.cloud import run_v2, logging as cloud_logging

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)

mcp = FastMCP("Infra Health MCP Server")
run_client = run_v2.ServicesClient()
log_client = cloud_logging.Client()


@mcp.tool()
def list_services(project_id: str, region: str) -> str:
    """List all Cloud Run services with their status and URLs.

    Args:
        project_id: Google Cloud project ID
        region: GCP region (e.g. us-central1)

    Returns:
        JSON list of services with name, URL, and last deployment time
    """
    logger.info(f"Listing services in {project_id}/{region}")
    parent = f"projects/{project_id}/locations/{region}"
    services = []
    for svc in run_client.list_services(parent=parent):
        services.append({
            "name": svc.name.split("/")[-1],
            "uri": svc.uri,
            "last_deployed": svc.update_time.isoformat() if svc.update_time else "unknown",
            "ready": svc.terminal_condition.state.name if svc.terminal_condition else "unknown",
        })
    return json.dumps(services, indent=2)


@mcp.tool()
def get_service_error_rate(project_id: str, region: str, service_name: str, minutes: int = 60) -> str:
    """Get the 5xx error count for a Cloud Run service over a time window.

    Args:
        project_id: Google Cloud project ID
        region: GCP region
        service_name: Name of the Cloud Run service
        minutes: How many minutes back to look (default 60)

    Returns:
        JSON with the service name, window length, 5xx error count, and check time
    """
    logger.info(f"Checking error rate for {service_name} over last {minutes} minutes")
    since = datetime.now(timezone.utc) - timedelta(minutes=minutes)

    filter_str = (
        f'resource.type="cloud_run_revision" '
        f'resource.labels.service_name="{service_name}" '
        f'resource.labels.location="{region}" '
        f'httpRequest.status>=500 '
        f'timestamp>="{since.isoformat()}"'
    )

    error_count = sum(1 for _ in log_client.list_entries(
        filter_=filter_str,
        projects=[project_id],
    ))

    return json.dumps({
        "service": service_name,
        "window_minutes": minutes,
        "error_count_5xx": error_count,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    })


@mcp.tool()
def check_service_health(project_id: str, region: str, service_name: str) -> str:
    """Return a simple health status for a Cloud Run service.

    Args:
        project_id: Google Cloud project ID
        region: GCP region
        service_name: Name of the Cloud Run service

    Returns:
        JSON with status (green/yellow/red) and a human-readable reason
    """
    logger.info(f"Health check for {service_name}")
    name = f"projects/{project_id}/locations/{region}/services/{service_name}"
    svc = run_client.get_service(name=name)

    state = svc.terminal_condition.state.name if svc.terminal_condition else "UNKNOWN"

    if state == "CONDITION_SUCCEEDED":
        status, reason = "green", "Service is running and healthy"
    elif state == "CONDITION_FAILED":
        status, reason = "red", f"Service is in a failed state: {state}"
    else:
        status, reason = "yellow", f"Service state is uncertain: {state}"

    return json.dumps({"service": service_name, "status": status, "reason": reason})


if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    logger.info(f"Infra Health MCP server starting on port {port}")
    asyncio.run(
        mcp.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=port,
        )
    )
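
A side note on get_service_error_rate: it returns a raw 5xx count, not a percentage. If you also have a total request count for the same window (for example from a second log query without the status filter), a rate and a green/yellow/red bucket could be derived roughly like this. It is only a sketch: the helper names and the 1% / 5% thresholds are my own assumptions, not Cloud Run defaults.

```python
def error_rate_percent(error_count: int, total_requests: int) -> float:
    """Return the 5xx share of all requests as a percentage (0.0 when idle)."""
    if total_requests <= 0:
        return 0.0
    return round(100.0 * error_count / total_requests, 2)


def classify_error_rate(rate_percent: float, warn_at: float = 1.0, fail_at: float = 5.0) -> str:
    """Map an error rate to green/yellow/red using hypothetical thresholds."""
    if rate_percent >= fail_at:
        return "red"
    if rate_percent >= warn_at:
        return "yellow"
    return "green"


if __name__ == "__main__":
    rate = error_rate_percent(error_count=12, total_requests=400)
    print(rate, classify_error_rate(rate))
```

Keeping this as plain functions means you can unit test the thresholds without touching Cloud Logging at all.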

Why Streamable HTTP? Cloud Run is stateless and scales horizontally. The older SSE transport needed persistent connections — a terrible fit for serverless. Streamable HTTP uses plain POST/GET, so every request is independent. Your MCP server scales to zero between calls and you only pay when it's actually doing work.
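
To make that a bit more concrete: an MCP tool call is a JSON-RPC 2.0 message sent as the body of an HTTP POST. The sketch below only builds that body for the check_service_health tool. The project and service values are placeholders, and a real client would also run the initialize handshake and send auth headers before calling a tool.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    body = build_tool_call(
        1,
        "check_service_health",
        {"project_id": "my-project-id", "region": "us-central1", "service_name": "hello"},
    )
    print(body)
```

In practice FastMCP and the ADK toolset build these messages for you, so this is only useful for understanding what goes over the wire.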

Step 3: Containerize It

Create Dockerfile:

FROM python:3.13-slim

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

COPY . /app
WORKDIR /app

ENV PYTHONUNBUFFERED=1

RUN uv sync

EXPOSE $PORT

CMD ["uv", "run", "server.py"]

Step 4: Create a Service Account

Your MCP server needs permission to read Cloud Run and Cloud Logging:

gcloud iam service-accounts create infra-health-sa \
  --display-name="Infra Health MCP Server"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/run.viewer"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/logging.viewer"

Step 5: Build & Deploy

# Create Artifact Registry repo
gcloud artifacts repositories create mcp-servers \
  --repository-format=docker \
  --location=$REGION

# Build and push
gcloud builds submit \
  --tag "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest"

# Deploy
gcloud run deploy infra-health-mcp \
  --image "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest" \
  --region=$REGION \
  --no-allow-unauthenticated \
  --memory=512Mi \
  --cpu=1 \
  --concurrency=80 \
  --timeout=120 \
  --service-account="infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com"

Cloud Run gives you a URL like https://infra-health-mcp-<hash>-uc.a.run.app. Grab it:

export MCP_URL=$(gcloud run services describe infra-health-mcp \
  --region=$REGION \
  --format='value(status.url)')

Step 6: Test It Locally via the Cloud Run Proxy

Don't expose your MCP server to the internet directly. Use the proxy to test with your local credentials:

gcloud run services proxy infra-health-mcp --region=$REGION --port=3000

Now hit it at http://localhost:3000 — your credentials are injected automatically, no token management needed.
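
With the proxy running, a quick smoke test is to list the tools the server exposes. Here is a minimal sketch using the FastMCP client, assuming the server is reachable under the /mcp path (the FastMCP default for Streamable HTTP in the versions I have used; adjust it if your server mounts the endpoint elsewhere). The helper names are mine.

```python
import asyncio


def mcp_endpoint(base_url: str, path: str = "/mcp") -> str:
    """Join the proxy base URL and the MCP path without doubling slashes."""
    return base_url.rstrip("/") + path


async def list_remote_tools(base_url: str) -> list[str]:
    """Connect through the local proxy and return the names of the exposed tools."""
    # Imported here so the helper above works even without fastmcp installed.
    from fastmcp import Client

    async with Client(mcp_endpoint(base_url)) as client:
        tools = await client.list_tools()
        return [tool.name for tool in tools]


if __name__ == "__main__":
    # Requires `gcloud run services proxy ...` to be running in another terminal.
    print(asyncio.run(list_remote_tools("http://localhost:3000")))
```

If everything is wired up, the output should include list_services, get_service_error_rate, and check_service_health.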

Part 3 — Wire Both Servers into an ADK Agent

Now the fun part. We'll build an agent that uses both MCP servers — the built-in Cloud Run one and your custom infra health server — to answer infrastructure questions like a seasoned SRE.

Install ADK

pip install google-adk

Create the Agent

Create agent.py:

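import asyncio
import os

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StreamableHTTPConnectionParams
from google.genai import types

MCP_URL = os.environ["MCP_URL"]  # your infra-health-mcp URL
APP_NAME = "infra_agent_app"
USER_ID = "local-user"


async def main():
    # Connect to both MCP servers (connection_params is passed by keyword)
    cloud_run_tools = MCPToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://run.googleapis.com/mcp")
    )
    infra_health_tools = MCPToolset(
        connection_params=StreamableHTTPConnectionParams(url=MCP_URL)
    )

    agent = LlmAgent(
        name="infra_agent",  # ADK agent names must be valid Python identifiers
        model="gemini-2.0-flash",
        instruction=(
            "You are an infrastructure operations assistant. "
            "You have access to Cloud Run management tools and infrastructure health tools. "
            "When asked about service health or errors, always check both the service status "
            "and recent error rates before answering. Be concise and actionable."
        ),
        tools=[cloud_run_tools, infra_health_tools],
    )

    # ADK agents are driven by a Runner bound to a session, not called directly
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID)
    runner = Runner(agent=agent, app_name=APP_NAME, session_service=session_service)

    # Example queries: swap these for interactive input
    queries = [
        "List all my Cloud Run services in us-central1",
        "Which services had 5xx errors in the last hour?",
        "Give me a health summary for all services",
    ]

    for query in queries:
        print(f"\n>>> {query}")
        message = types.Content(role="user", parts=[types.Part(text=query)])
        async for event in runner.run_async(
            user_id=USER_ID, session_id=session.id, new_message=message
        ):
            # Print only the agent's final answer, not intermediate tool events
            if event.is_final_response() and event.content and event.content.parts:
                print(event.content.parts[0].text)


if __name__ == "__main__":
    asyncio.run(main())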

Run it:

MCP_URL=$MCP_URL python agent.py

You'll see the agent autonomously call list_services, then loop over each service calling get_service_error_rate and check_service_health — building a full infra health picture without you writing a single orchestration loop.

This is the moment it clicks. You didn't write "for service in services: check health". The agent reasoned its way to that pattern. Your job was defining the tools. That's a genuine shift in how we build infra tooling.

Security: Don't Skip This Section

Agents with infrastructure permissions need real guardrails. Here's what I'd put in place before letting this near production:

1. Scope service account permissions tightly. The infra-health-sa has read-only roles. If you want an agent that can also deploy, create a separate service account for write operations and require explicit approval flows before those tools fire.

2. Use IAM deny policies for the write MCP tools. You can explicitly deny run.services.create on specific service accounts at the project level — useful if you only want agents to have deploy access in staging, not prod.

3. Enable Model Armor. Google's Model Armor sits in front of MCP calls and blocks prompt injection attempts, malicious URIs, and unsafe content before they reach your tools. Enable it in the Google Cloud console under AI Safety.

4. Cloud Audit Logs are your friend. Every MCP tool call made through Google-managed servers is logged automatically. Set up a log-based alert for any deploy_service_from_image calls from service accounts that shouldn't be deploying.

# Example: alert on unexpected deploys
gcloud logging metrics create unexpected-agent-deploy \
  --description="MCP deploy calls from unexpected accounts" \
  --log-filter='protoPayload.methodName="google.cloud.run.v2.Services.CreateService"'
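
For point 1, the "explicit approval flow" can start out very small. Below is a rough sketch of a decorator that refuses to run a write tool unless a policy function approves the call. Everything here is hypothetical: the staging_only policy, the deploy_service tool, and its placeholder body are illustrations, not part of Cloud Run or ADK.

```python
from functools import wraps


class ApprovalRequired(Exception):
    """Raised when a write tool is called without explicit approval."""


def requires_approval(approver):
    """Wrap a tool so it only runs when `approver(tool_name, kwargs)` returns True."""
    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            if not approver(func.__name__, kwargs):
                raise ApprovalRequired(f"{func.__name__} was not approved")
            return func(**kwargs)
        return wrapper
    return decorator


def staging_only(tool_name: str, kwargs: dict) -> bool:
    """Hypothetical policy: auto-approve staging, refuse everything else."""
    return kwargs.get("environment") == "staging"


@requires_approval(staging_only)
def deploy_service(service_name: str, image: str, environment: str) -> str:
    # Placeholder body: a real tool would call the Cloud Run API here.
    return f"deployed {image} to {service_name} ({environment})"


if __name__ == "__main__":
    print(deploy_service(service_name="hello", image="hello:v2", environment="staging"))
```

A real approver would more likely page a human or check a ticket than inspect a keyword argument, but the shape stays the same: the tool body never runs unless the policy says yes.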

My Honest Take

What impressed me most at NEXT '26 isn't any single feature — it's that Google is treating MCP as a first-class citizen across the entire platform. BigQuery has a managed MCP server. Cloud Logging has one. Cloud SQL is getting one. This is becoming the standard interface layer between AI agents and cloud services.

For IT specialists and infrastructure engineers, this is actually exciting rather than threatening. The tedious parts of infra ops — writing one-off scripts to list resources, cross-referencing logs with deployment times, checking health across 20 services — are exactly what agents are good at. You shift from doing the repetitive tasks to designing the tools that do them.

The rough edges? Auth setup for remote MCP servers is still fiddly, especially in multi-project setups. The ADK toolset documentation is still catching up to the pace of announcements. And "fully managed" doesn't yet mean "zero config" — you still need to wire up IAM carefully.

But the direction is clear, and the foundation is solid. The infra engineer who learns to build good MCP servers is going to be unreasonably productive over the next few years.

What's Next

Explore the official Cloud Run MCP docs
Check out the ADK MCP codelab
Try adding a deploy_service tool to your custom server — and see how differently it feels when an agent handles rollback logic

Drop a comment if you build something cool with this. I'm especially curious what domain-specific MCP servers people come up with.

Built and tested as part of the Google Cloud NEXT '26 Writing Challenge. All code examples use placeholder project IDs — swap in your own before running.

From Prompts to Action: My Journey Through the Google & Kaggle AI Agents Bootcamp

joshyfruit — Mon, 15 Dec 2025 02:03:16 +0000

This is a submission for the Google AI Agents Writing Challenge: Learning Reflections

As someone who has watched AI evolve from "magic black box" to "everyday tool," I often felt a barrier between using AI and building with it. Beyond a chatbot, what else could AI do, and how could I harness it? I thought building agents required a PhD in Machine Learning. This week, the 5-Day AI Agents Intensive Course by Google and Kaggle completely shattered that illusion.

It turns out, if you can write a Python function, you can build an agent. Here is my deep dive into the code, the concepts, and the tools that made this journey accessible, featuring my capstone project: Jarbest.

The Awakening: Hello, Agent

Coming from a non-developer background, I always imagined software as a "bricklayer" rigidly following a blueprint. Day 1 introduced me to the Agent: a system that acts more like a film director. It doesn't just predict text; it has a "Brain" (the model), "Hands" (tools), and a "Nervous System" (orchestration) to autonomously perceive, reason, and act. We learned that an agent operates in a continuous loop (Mission, Scan, Think, Act, Observe), constantly adapting its plan to solve problems. This framework demystified the magic: I wasn't just coding a chatbot; I was building a system with the agency to execute multi-step missions.

The "Aha!" Moment: It's Just Python & The "USB Port" for AI

Day 2 was a revelation. I learned that Models are just "Brains"—pattern predictors that cannot see or act. To be useful, they need Tools: the "Eyes" and "Hands" that let them fetch data or execute actions.

But connecting every tool to every model is a nightmare (the N x M problem). Enter the Model Context Protocol (MCP).

Think of MCP as the USB port for AI. Before USB, you needed a specific cable for every device. MCP lets you plug any tool into any agent using a standard connection.

The Code: Giving the Agent "Hands"

In my project, Jarbest (an accessible personal companion), I needed an agent that could check bank account balances. Instead of writing a custom connector, I used MCP to "plug in" a secure banking server.

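# Imports for this snippet; BANK_MCP_URL is assumed to be defined elsewhere in the project
from google.adk.agents import Agent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StreamableHTTPConnectionParams

# Finance Agent: manages the bank's transactions
finance_agent = Agent(
    name="finance_agent",
    description="An agent that can help with banking operations like checking balances...",
    # Assumption: this toolset connects to a secure internal banking server
    tools=[
        MCPToolset(
            connection_params=StreamableHTTPConnectionParams(
                url=f"{BANK_MCP_URL.rstrip('/')}/mcp",
            )
        )
    ],
)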

Why this matters (and the danger):
The agent reads these tool definitions and knows exactly when to use them. If a user asks "Can I afford this pizza?", the agent inherently knows it must first call check_balance.

However, the notes warned us: Using MCP is like plugging in a random USB drive found on the street. It could be a legitimate tool, or it could be a "Tool Shadow" (a malicious copy). That's why in Jarbest, I implemented a strict Application-Layer Gateway (via hardcoded allowlists), ensuring the agent can only connect to my specific, internal MCP banking server and never "plugs in" to an untrusted source.
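
To give a feel for what a hardcoded allowlist can look like, here is a minimal sketch. It is not the actual Jarbest code: the host name and function names are made up for illustration, and it only checks the scheme and exact host before any connection is attempted.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only these exact hosts may be used as MCP servers.
ALLOWED_MCP_HOSTS = {"bank-mcp.internal.example"}


def is_allowed_mcp_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the hardcoded allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_MCP_HOSTS


def require_allowed(url: str) -> str:
    """Return the URL unchanged, or raise before any connection is attempted."""
    if not is_allowed_mcp_url(url):
        raise ValueError(f"MCP server not on allowlist: {url}")
    return url


if __name__ == "__main__":
    print(is_allowed_mcp_url("https://bank-mcp.internal.example/mcp"))
    print(is_allowed_mcp_url("https://bank-mcp.internal.example.evil.test/mcp"))
```

Matching the parsed hostname exactly, instead of using a substring or prefix check, is what keeps look-alike domains out.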

Deep Dive: The Brain (Memory)

Day 3 was where things got sophisticated. A chatbot forgets you the moment you close the tab. An agent remembers.

For Jarbest, which is designed for elderly users who value consistency, memory is critical. If "Grandma Jane" asks for her "usual order," the agent shouldn't ask "What is that?"; it should know.

Here is how I implemented the "Brain" in my root agent:

root_agent = Agent(
    name='root_agent',
    instruction="""
    You are Jarbest...
    Memory: Use the load_memory tool to recall past conversations and preferences 
    (e.g., "ordering the usual").
    """,
    tools=[load_memory], # <--- This single line gives the agent a "brain"
    after_agent_callback=auto_save_to_memory # Auto-saves every interaction
)

The Non-Developer Perspective: Think of load_memory like giving the agent a filing cabinet. When Grandma Jane says "Order me some food," the agent thinks: "I need to check if she has a preference," opens the cabinet (load_memory), finds "Likes Large Pepperoni Pizza," and acts on it. Watching this thought process in real-time was mind-blowing.

The "Squeeze": Debugging the Black Box

Day 4 taught us that "it works" isn't enough. You need to know why it works. When building a safety-focused agent like Jarbest, I couldn't afford "hallucinations."

Exploring the Agent Observability labs, I learned to trace the agent's reasoning steps. When my agent refused to order a pizza, I could look at the trace and see:

User: "Order a pizza."
Tool Call: check_balance -> returned $5.00.
Reasoning: "Pizza costs $20. User has $5. Result: Unsafe."
Response: "I cannot complete this order because your balance is too low."

Seeing that raw reasoning log felt like looking into the matrix. It transformed the LLM from a mysterious oracle into a logical, debuggable software component. I realized I wasn't just "prompting" anymore; I was engineering logic.

The Ecosystem: Agents Talking to Agents (A2A)

Day 5 introduced the Agent-to-Agent (A2A) Protocol. This is where I moved from building a single assistant to building a team.

My "Purchaser Agent" doesn't know how to make pizza. Instead, it connects to a completely separate "Pizza Shop Agent" (simulating a 3rd party vendor).

# Creating a client-side proxy for a remote agent
pizza_agent_proxy = RemoteA2aAgent(
    name="pizza_agent",
    # The "Agent Card" acts like a business card for discovery
    agent_card="http://localhost:10000/.well-known/agent-card.json",
    description="Remote pizza agent from external vendor...",
)

purchaser_agent = Agent(
    name="purchaser_agent",
    instruction="Your goal is to help the user find and buy items.",
    tools=[AgentTool(pizza_agent_proxy)], # <--- Treating another agent as a tool
)

The Cool Idea: The "Agent Card" isn't just a technical manifest; it's a completely new way for businesses to interact.

For SMBs (Small to Medium Businesses): Instead of constantly maintaining and documenting complex APIs for developers to read, you simply publish an "Agent Card" (like a digital business card) that describes what your service does (e.g., "I sell pepperoni pizza").
For Developers: It saves massive amounts of time. My support agent just reads this card and instantly knows how to ask for a customized order.
The Future: This allows agents to communicate autonomously and carry out transactions on each person's behalf without human friction. It's like an API that reads itself.
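
For a rough idea of what such a card contains, here is an illustrative one for a made-up pizza vendor, written as a Python dict. The field names follow my reading of the A2A agent card format and cover only a subset, so treat them as assumptions and check them against the spec. The skill_summaries helper is hypothetical too.

```python
import json

# Illustrative agent card for a made-up vendor; verify field names against the A2A spec.
pizza_agent_card = {
    "name": "pizza_agent",
    "description": "Takes pizza orders for a local shop.",
    "url": "http://localhost:10000/",
    "version": "1.0.0",
    "defaultInputModes": ["text"],
    "defaultOutputModes": ["text"],
    "capabilities": {"streaming": False},
    "skills": [
        {
            "id": "order_pizza",
            "name": "Order pizza",
            "description": "Places an order for a pizza with chosen size and toppings.",
            "tags": ["food", "ordering"],
        }
    ],
}


def skill_summaries(card: dict) -> list[str]:
    """List what the remote agent says it can do, one line per skill."""
    return [f"{skill['name']}: {skill['description']}" for skill in card.get("skills", [])]


if __name__ == "__main__":
    print(json.dumps(pizza_agent_card, indent=2))
    print(skill_summaries(pizza_agent_card))
```

The skills list is the part a calling agent leans on most: it is the machine-readable version of "I sell pepperoni pizza".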

Capstone Spotlight: Jarbest - Agents for Good

Applying these concepts, I built Jarbest for the "Agents for Good" track.

The Problem: The digital world is full of dark patterns and complex UIs that exploit vulnerable users, especially the elderly.
The Solution: A unified "Action Space" that replaces app sprawl.
Jarbest eliminates the friction of switching between banking apps, delivery apps, and websites. Instead of forcing Grandma Jane to install and navigate ten different confusing interfaces, Jarbest uses Tools, MCP, and A2A to communicate with these services directly on her behalf.

The Result: Simplicity and Safety. By centralizing these actions into one verified conversation, we inherently protect the user. They no longer need to open browsers or install random apps where they might fall victim to phishing sites or fake download buttons. Jarbest acts as the safe, validated operational layer for their digital life.

Jarbest uses a Hierarchical Architecture:

Guardian (Root Agent): The "Thinking" layer. It never touches money directly. It validates safety.
Auditor (Finance Agent): The only agent with access to the banking server (via MCP).
Doer (Purchaser Agent): The logistics layer that talks to vendors (via A2A).

This separation of concerns ensures that even if the "Doer" gets confused, the "Guardian" prevents any financial mistakes.
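
Stripped of the LLM parts, the Guardian's job comes down to a check like the one below. This is a simplified sketch, not the real Jarbest logic: the PurchaseRequest type and guardian_review function are hypothetical, and the balance is assumed to come from the Auditor.

```python
from dataclasses import dataclass


@dataclass
class PurchaseRequest:
    item: str
    price: float


def guardian_review(request: PurchaseRequest, balance: float) -> tuple[bool, str]:
    """Approve a purchase only if the Auditor-reported balance covers it."""
    if request.price <= 0:
        return False, "Rejected: price must be positive."
    if request.price > balance:
        return False, (
            f"Rejected: {request.item} costs ${request.price:.2f} "
            f"but the balance is ${balance:.2f}."
        )
    return True, f"Approved: {request.item} for ${request.price:.2f}."


if __name__ == "__main__":
    # Mirrors the Day 4 trace: a $20 pizza against a $5 balance.
    print(guardian_review(PurchaseRequest("pizza", 20.0), balance=5.0))
```

Because the check is deterministic code rather than a prompt, the Doer cannot talk its way past it.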

Check out the Code Repository Here

Why Multi-Agent Architecture?

Working with a single "god agent" that handles everything creates a bottleneck. It forces one model to juggle complex reasoning (safety checks, intent parsing) with mundane execution (API calls, order formatting), leading to context overflow and hallucinations.

By breaking the system into specialized sub-agents, we achieve:

Reduced Cognitive Load: The Root Agent focuses purely on orchestration and safety, while the Purchaser Agent focuses solely on logistics.
Efficiency: We can route simple tasks to faster, cheaper models (Gemini 2.5 Flash) and reserve the powerful reasoning models (Gemini 3 Pro) for the Guardian role.
Scalability: New vendors (e.g., a Pharmacy Agent) can be added as new tools for the Purchaser Agent without retraining or complicating the Root Agent's logic.

My Secret Weapon: NotebookLM

The course came with dense whitepapers—goldmines of information on "Context Engineering" and "Agent Quality." But digesting 20-page PDFs can be daunting.

My Workflow:

Feed the Brain: I downloaded the Context Engineering whitepaper and uploaded it directly to NotebookLM.
The Conversation: Instead of reading linearly, I interrogated the text.
- Me: "Explain the trade-offs between vector databases and keyword search for agent memory."
- NotebookLM: It synthesized the answer specifically from the whitepaper, citing the exact page numbers.
The Podcast: I used the "Audio Overview" feature to generate a podcast of the whitepaper. I listened to two AI hosts debate the merits of "Session" vs. "Memory" while I cooked dinner. It turned homework into entertainment.

What I'd Do Differently (The Roadmap)

Building Jarbest in just 5 days was a sprint, and I left plenty of ideas on the cutting room floor. If I had another week, here is what I would tackle:

Dynamic Tool Loading: Instead of hardcoding tools, I'd want the agent to "discover" new MCP servers on the local network automatically.
Voice Interface: Accessibility is key for my target audience (elderly users). Adding a voice layer on top of the text interface would be a game-changer.
Proactive Alerts: Currently, the agent waits for input. I want to build a background loop where it can nudge the user: "Hey, you usually order groceries on Tuesday. Should I do that?"

Conclusion

This bootcamp didn’t just teach me syntax; it fundamentally shifted my mental model of software development. I went from viewing AI as a passive chatbot to seeing it as a dynamic, composable ecosystem of "Doers".

The combination of accessible frameworks like the Google GenAI SDK, standardized protocols like MCP, and powerful reasoning models has truly democratized agency. You don't need a research lab or a PhD to build systems that perceive, reason, and act—you just need a clear mission and the curiosity to prompt it.

If you've been on the fence about diving into AI Agents, now is the time to start. The tools are ready and the barrier to entry has never been lower. I can't wait to see what you build.

Let's Deploy n8n on ec2 instances 🚀🚀🚀

joshyfruit — Wed, 10 Dec 2025 13:05:04 +0000

I really love automating

I’ve always loved automating things in my workflows.

When I stumbled on n8n, I was honestly in awe. Suddenly all the ideas I had before—like updating Google Sheets, posting to social media, hosting my own APIs, syncing Google Drive to an S3 bucket, and a lot more—became realistic without needing to learn a new framework or library for every single task.

n8n makes it much easier to build small automations that actually ship and help you in day‑to‑day work, and that’s what made me fall in love with it.

So we’ll walk through how to deploy n8n on an AWS EC2 instance so you can start running your own automations on your own infrastructure.

Launching the Instance 🚀

Step 1: Launch an Instance

Log in to your AWS Management Console.
Navigate to the EC2 Dashboard.
Click the Launch instance button.

Step 2: Name and OS Selection

Name: Give your instance a recognizable name, such as n8n.
Application and OS Images (AMI): Select Amazon Linux. The default "Amazon Linux 2023 AMI" is a great choice and is eligible for the Free Tier.
Architecture: Select 64-bit (ARM) for better cost efficiency on AWS (Graviton processors offer better performance-per-dollar).

Step 3: Choose an Instance Type

Select an instance type that suits your needs. For this tutorial, we are using t4g.medium (ARM-based Graviton). Here's why t4g.medium is the most cost-efficient choice for n8n:

Better Performance-per-Dollar: AWS says Graviton2 processors offer up to 40% better price performance than comparable x86 instances
Sufficient Resources: 2 vCPUs and 4GB RAM handle n8n with multiple workers smoothly without over-provisioning
Burstable Performance: T4g instances include CPU credits, allowing you to handle traffic spikes without constant high usage
Lower Costs: t4g.medium is roughly 20% cheaper per hour than the equivalent t3.medium at on-demand rates, with comparable or better performance for most workloads
Production-Ready: Unlike t4g.micro (limited burstable capacity), t4g.medium can sustain moderate workflows continuously

For comparison:

t4g.micro: Limited for production (2 vCPUs, 1GB RAM)
t4g.medium: Recommended for small-to-medium deployments (2 vCPUs, 4GB RAM)
t4g.large: For high-volume workflows (2 vCPUs, 8GB RAM)

Step 4: Key Pair

Under Key pair (login), select an existing key pair from the dropdown menu to ensure you can SSH into your server later. If you don't have one, click "Create new key pair".

Step 5: Network Settings (Security Groups)

This is a crucial step to ensure your instance is accessible.

Under Network settings, choose Create security group.
Ensure the following rules are checked/added:

Rule Type	Port	Protocol	Source	Purpose
SSH	22	TCP	Your IP (0.0.0.0/0 for testing only)	Remote terminal access
HTTP	80	TCP	0.0.0.0/0	Web traffic (redirect to HTTPS)
HTTPS	443	TCP	0.0.0.0/0	Secure web traffic for n8n UI

Step 6: Configure Storage

The default 8 GiB of gp3 storage is usually sufficient for a basic installation. You can leave this as is.

Step 7: Advanced Details - User Data (The Installation Script)

This is the most important part of the automation. Instead of manually installing software after the server boots, we will provide a script to do it automatically.

Scroll down to the Advanced details section.
Scroll to the very bottom to find the User data text field.
Paste the following script. It installs Git and Docker, sets up the Docker Compose plugin, and configures permissions.

#!/bin/bash
yum install -y git docker

# Install Docker Compose plugin (system-wide)
mkdir -p /usr/local/lib/docker/cli-plugins
curl -SL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

# Enable and start Docker
systemctl enable docker
systemctl start docker

# Allow ec2-user to run docker without sudo
usermod -aG docker ec2-user

Highlight: The script above automates the installation of Git, Docker, and Docker Compose, saving you several manual command-line steps later.

Step 8: Launch

Review your settings in the Summary panel on the right.
Click Launch instance.
Wait for the success message, then click the Instance ID to view your running server.

Connecting and Configuring 🔌

Now that your instance is running, we need to connect to it and download the necessary n8n configuration files.

Step 1: Connect to your Instance

Select your running instance from the list.
Click the Connect button at the top right of the console.
Select the EC2 Instance Connect tab.
Leave the default username as ec2-user.
Click the orange Connect button. A new browser window will open with a terminal interface.

Step 2: Verify Installation

Once the terminal loads, verify that the installation script from Part 1 ran successfully by checking the Docker Compose version:

docker compose --version

If this command returns a version number, your environment is ready.

Step 3: Download the n8n Setup

We will use a pre-configured setup from GitHub to get n8n running quickly.

Clone the repository:

git clone https://github.com/coozgan/hosting-n8n-aws.git

Navigate into the project directory:
```
cd hosting-n8n-aws
```

Deploying n8n with Workers 🐳

Now we'll set up a production-ready n8n deployment with distributed workers, Redis queuing, PostgreSQL persistence, and Caddy reverse proxy for SSL/TLS.

Step 1: Review Configuration

Review the architecture. This setup includes:
- n8n Main: Serves the editor UI and handles workflow triggers
- n8n Workers: Execute workflows asynchronously from a job queue
- Redis: Manages job distribution and retries
- PostgreSQL: Stores workflows, executions, and user data
- Caddy: Reverse proxy with automatic SSL/TLS certificates

Step 2: Configure Environment Variables

Create and edit the .env file:
```
cp .env-example .env
nano .env
```
a. Create a DuckDNS subdomain: go to https://www.duckdns.org and sign in with GitHub, Google, or another supported provider.

b. On the main page, choose a subdomain name (for example, myn8n) and click add domain.

c. Find your instance's public IP address: go back to the EC2 Instance Connect terminal, run the following command, and copy the IP address it prints:
```
curl ifconfig.me
```
d. Back on the DuckDNS website, paste the IP address into the entry for your subdomain and save it.
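
An EC2 public IP changes when the instance is stopped and started again (unless you attach an Elastic IP), so you may need to repeat step d later. DuckDNS also exposes an HTTP update endpoint for this. The sketch below only builds that URL and does not send it; the helper name `duckdns_update_url` and the placeholder token are assumptions for illustration:

```python
from urllib.parse import urlencode


def duckdns_update_url(subdomain: str, token: str, ip: str) -> str:
    """Build the DuckDNS update URL for one subdomain (token comes from your DuckDNS page)."""
    query = urlencode({"domains": subdomain, "token": token, "ip": ip})
    return f"https://www.duckdns.org/update?{query}"


if __name__ == "__main__":
    # Placeholder values: replace with your own subdomain, token, and IP
    print(duckdns_update_url("myn8n", "YOUR_TOKEN", "203.0.113.10"))
```

Requesting the printed URL (for example with curl) would update the record without opening the website.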

Add the following essential variables:

# Domain Configuration
DOMAIN=your-domain.com # e.g. myn8n.duckdns.org; use localhost if you don't have a domain

# n8n Encryption (generate a strong random key)
N8N_ENCRYPTION_KEY=CHOOSE_YOUR_OWN_KEY

# PostgreSQL Database
POSTGRES_PASSWORD=CHOOSE_YOUR_OWN_PASSWORD

# Timezone
GENERIC_TIMEZONE=America/New_York

Press Ctrl+X to exit, then Y to confirm saving, and Enter to keep the file name.

Important: Keep your N8N_ENCRYPTION_KEY and POSTGRES_PASSWORD safe. If you lose them, you won't be able to recover your workflows or data.
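
If you would rather not invent these values by hand, Python's standard `secrets` module can generate them. This is a minimal sketch, assuming a 64-character hex key and a 32-character URL-safe password are acceptable for your setup; the function name `generate_env_secrets` is hypothetical:

```python
import secrets


def generate_env_secrets() -> dict[str, str]:
    """Generate random values for the two secrets in the .env file."""
    return {
        # 32 random bytes rendered as 64 hex characters
        "N8N_ENCRYPTION_KEY": secrets.token_hex(32),
        # 24 random bytes rendered as 32 URL-safe characters (letters, digits, - and _)
        "POSTGRES_PASSWORD": secrets.token_urlsafe(24),
    }


if __name__ == "__main__":
    for name, value in generate_env_secrets().items():
        print(f"{name}={value}")
```

Run it once, paste the two lines into .env, and store a copy in your password manager.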

Step 3: Start the Services

Start all services in the background with docker compose up -d, then verify that all containers are running:
```
docker compose ps
```
Check the logs to ensure everything started correctly:
```
docker compose logs -f
```
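
If you want to script this check instead of reading the table by eye, `docker compose ps` can emit JSON. The sketch below is a rough parser under two assumptions: that each entry carries `Service` and `State` fields, and that the output is either a JSON array or one JSON object per line (the shape differs between Compose versions). The function name `services_not_running` is hypothetical:

```python
import json


def services_not_running(ps_output: str) -> list[str]:
    """Return the names of services whose State is not 'running'."""
    text = ps_output.strip()
    if not text:
        return []
    if text.startswith("["):
        # Some Compose versions print a single JSON array
        entries = json.loads(text)
    else:
        # Others print one JSON object per line
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    return sorted(
        entry.get("Service", entry.get("Name", "unknown"))
        for entry in entries
        if entry.get("State") != "running"
    )
```

Feed it the output of docker compose ps --all --format json; the --all flag matters because stopped containers are hidden by default. An empty list means every container is up.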

Step 4: Access Your n8n Instance

Once services are healthy, navigate to your domain in a web browser:
- If you configured a domain: https://your-domain.com
- If testing without a domain: http://<instance_public_ipaddress>
You should see the n8n login screen. Create your first user account.
Congratulations! You now have n8n running on a single instance.

Architecture Overview

Scaling and Best Practices ⚡

Scaling Horizontally (within the instance)

To add more worker instances, update your docker-compose.yml:

n8n-worker-2:
  extends: n8n-worker
  container_name: n8n-worker-2
... #copy the rest of the code

n8n-worker-3:
  extends: n8n-worker
  container_name: n8n-worker-3
... #copy the rest of the code

Then restart:

docker compose up -d

For more information follow this... 👈

Performance Optimization Tips

Adjust Worker Count: Start with 2-3 workers and monitor CPU/memory usage.

docker stats

Database Tuning: Prune old execution logs to keep PostgreSQL performant:
- The docker-compose includes EXECUTIONS_DATA_PRUNE=true
- It keeps 7 days of history by default (EXECUTIONS_DATA_MAX_AGE=168)
Redis Configuration: The Redis instance has a 512MB memory limit with LRU eviction policy.
- Monitor with: docker exec redis redis-cli info stats
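
EXECUTIONS_DATA_MAX_AGE is expressed in hours, which is why 168 corresponds to 7 days. If you want a different retention window, the arithmetic looks like this. It is a small illustrative sketch with hypothetical helper names, not n8n's own pruning logic:

```python
from datetime import datetime, timedelta


def max_age_hours(days: int) -> int:
    """Convert a retention window in days to the hours value used in the env variable."""
    return days * 24


def prune_cutoff(now: datetime, max_age: int) -> datetime:
    """Executions older than this timestamp fall outside the retention window."""
    return now - timedelta(hours=max_age)


if __name__ == "__main__":
    print(f"EXECUTIONS_DATA_MAX_AGE={max_age_hours(14)}")  # 14 days of history
```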

Security Best Practices

Enable User Management:
- The setup has N8N_USER_MANAGEMENT_DISABLED=false by default
- Create separate user accounts for team members
Restrict SSH Access:
- Update your security group to only allow SSH from your IP
- In AWS Console: Security Groups > Inbound Rules > Edit SSH rule

Backup Strategy:

Regularly backup your PostgreSQL database:

docker exec postgres pg_dump -U postgres postgres > backup.sql

Backup your n8n data volume:

docker run --rm -v n8n-with-workers_n8n-data:/data \
   -v "$(pwd)":/backup alpine tar czf /backup/n8n-backup.tar.gz \
   -C /data .
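
Both commands overwrite the same file each time, so a regular backup job benefits from timestamped names and a simple retention rule. A minimal sketch of that idea follows; the naming scheme and the helper names `backup_name` and `backups_to_delete` are assumptions, not part of the repository:

```python
from datetime import datetime


def backup_name(prefix: str, when: datetime) -> str:
    """Build a file name such as backup-20250102-030405.sql."""
    return f"{prefix}-{when:%Y%m%d-%H%M%S}.sql"


def backups_to_delete(names: list[str], keep: int) -> list[str]:
    """Return the oldest backups, leaving the newest `keep` files untouched."""
    # Timestamps in this format sort chronologically as plain strings
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered


if __name__ == "__main__":
    print(backup_name("backup", datetime.now()))
```

You could redirect pg_dump into the generated name, then remove whatever `backups_to_delete` returns.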

SSL/TLS Certificates:
- Caddy automatically obtains and renews Let's Encrypt certificates
- Certificates are stored in the caddy-data volume

Troubleshooting

Issue: Cannot Connect to n8n

Solution:

1️⃣ Check if all services are running:

docker compose ps

2️⃣ Check Caddy logs for SSL certificate issues:

docker compose logs caddy

3️⃣ Ensure your security group allows HTTP (port 80) and HTTPS (port 443)

Issue: Workers Not Processing Jobs

Solution:

1️⃣ Verify Redis is healthy:

docker exec redis redis-cli ping

2️⃣ Check worker logs:

docker compose logs n8n-worker

3️⃣ Ensure EXECUTIONS_MODE=queue is set in your environment

Issue: Database Connection Errors

Solution:

1️⃣ Verify PostgreSQL is running and healthy:

docker compose logs postgres

2️⃣ Check database credentials match in .env:

docker exec postgres psql -U postgres -d postgres -c "\dt"

3️⃣ Ensure POSTGRES_PASSWORD in your .env is correct

Issue: Out of Disk Space

Solution:

1️⃣ Check disk usage:

docker system df

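2️⃣ Prune unused Docker data. Run this while your stack is up, so that only images, containers, and volumes not used by your services are removed: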

docker system prune -a
docker volume prune

3️⃣ Clean up old execution logs in n8n UI: Settings → Executions → Delete old executions
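
To catch this before the disk actually fills up, you can check usage from a small script. The sketch below uses the standard library's `shutil.disk_usage`; the 80% threshold and the helper names are assumptions, and the percentage may differ slightly from what df reports because df accounts for reserved blocks:

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Share of the disk that is in use, as a percentage."""
    return 100.0 * used / total


def disk_is_low(path: str = "/", threshold: float = 80.0) -> bool:
    """Return True when usage of the filesystem holding `path` reaches the threshold."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    print("Low on disk space" if disk_is_low() else "Disk space OK")
```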

Monitoring and Maintenance

Monitor Container Health

# Real-time resource usage
docker stats

# View service logs
docker compose logs -f [service-name]

# Check specific service health
docker compose ps

Regular Maintenance Tasks

Weekly:

Monitor disk usage: df -h
Check error logs: docker compose logs --since 168h | grep -i error
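
A raw grep tells you that errors exist but not which container produced them. The following sketch groups error lines per container. It assumes the usual Compose log layout where each line starts with the container name followed by `|`; the function names are hypothetical:

```python
from collections import Counter


def error_lines(log_text: str) -> list[str]:
    """Return lines containing 'error' in any letter case, like grep -i error."""
    return [line for line in log_text.splitlines() if "error" in line.lower()]


def errors_per_container(log_text: str) -> Counter:
    """Count error lines per container, based on the 'name | message' prefix."""
    counts = Counter()
    for line in error_lines(log_text):
        name, separator, _ = line.partition("|")
        # Lines without a prefix are grouped under 'unknown'
        counts[name.strip() if separator else "unknown"] += 1
    return counts
```

Pass it the captured output of docker compose logs and look at the container with the highest count first.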

Monthly:

Backup databases and volumes
Review user access and remove inactive accounts
Update Docker images: docker compose pull && docker compose up -d

Quarterly:

Security audit of your workflows
Review and optimize your AWS security groups
Test disaster recovery (restore from backup)

Cost Optimization on AWS

Instance Type: t4g.medium provides the best balance of cost and performance for n8n deployments
- ARM-based Graviton2 processors (up to 40% better price performance than comparable x86 instances, according to AWS)
- Sufficient resources for multiple workers without over-provisioning
- Monthly cost: ~$9-12 (vs ~$15-18 for equivalent x86 instances)
Data Transfer: Minimize data egress by keeping compute and data in the same region
RDS Alternative: For high-volume deployments, consider AWS RDS for PostgreSQL instead of self-hosted
CloudWatch: Set up alarms for unusual activity
Auto-Scaling: Use EC2 Auto Scaling Groups for multiple instances in production

Next Steps: Make It Production-Ready

This guide focuses on getting n8n running quickly on EC2 so you can start experimenting. For a more production-ready setup, you’ll likely want to:

Use a serverless deployment (ECS Fargate, ElastiCache, and RDS)
Use a custom domain with Route 53 or another DNS provider
Use an Elastic Load Balancer and an Auto Scaling group for your ECS cluster
Configure environment variables for n8n (like WEBHOOK_URL, credentials, etc.)

Those topics can be a follow-up post in this series, where the EC2 instance you just created becomes the base for a more secure, robust n8n deployment.

Conclusion 🎉

You now have a production-grade n8n automation platform running on AWS with:

✅ Cost-effective hosting for small to medium automations
✅ Scalable worker architecture on ARM-based Graviton processors
✅ Persistent data storage with PostgreSQL
✅ Automatic SSL/TLS encryption via Caddy
✅ Job queue system with Redis for reliable executions
✅ Easy horizontal scaling within your instance

Host your first AI Agent on Agentcore

joshyfruit — Tue, 11 Nov 2025 04:04:06 +0000

Exploring AgentCore: Effortless AI Agent Deployment with AWS Strands SDK

I had a good time exploring AgentCore and I see what AWS is trying to do: eliminate the second guessing on how to deploy your AI Agent. With the release of the Strands SDK, AWS has introduced a truly streamlined way to get agents running quickly. In this post, I’ll walk you through how I hosted my own agent on AgentCore, with everything from setup to cloud deployment.

Why AgentCore and Strands SDK?

AWS AgentCore Runtime solves a major pain-point for developers: secure, reliable, and scalable deployment of AI agents without wrangling cloud infrastructure details. The new Strands SDK tightly integrates with AgentCore, letting you focus on your agent logic while AWS handles scaling, session isolation, and production readiness.

Step-by-Step: Deploying Your Agent

Prerequisites:

AWS account and CLI configured (aws configure)
Python 3.10+, pip, Docker
Install the AgentCore toolkit and Strands SDK:

pip install bedrock-agentcore bedrock-agentcore-starter-toolkit strands-agents

Sample Agent (my_agent.py):

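from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

# App wrapper that serves the agent over HTTP
app = BedrockAgentCoreApp()
# Strands agent created with its default settings
agent = Agent()

@app.entrypoint
def invoke(payload):
    # Fall back to a greeting when the request carries no "prompt" field
    user_message = payload.get("prompt", "Hello! How can I help you today?")
    result = agent(user_message)
    return {"result": result.message}

if __name__ == "__main__":
    # Starts the local server used in the test below
    app.run()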

Local Test (optional):

python my_agent.py
# In another terminal:
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello!"}'
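
If you prefer to test from Python rather than curl, the same request can be sent with the standard library. This is a minimal sketch that mirrors the curl call above; the helper names `build_invocation_request` and `invoke_local_agent` are hypothetical, and the agent must already be running locally before you call the second one:

```python
import json
import urllib.request


def build_invocation_request(
    prompt: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_local_agent(prompt: str) -> dict:
    """Send the request to the locally running agent and decode the JSON reply."""
    with urllib.request.urlopen(build_invocation_request(prompt), timeout=30) as resp:
        return json.load(resp)
```

Calling invoke_local_agent("Hello!") should return the dictionary produced by the invoke entrypoint.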

Configure for Cloud Deployment:

agentcore configure --entrypoint my_agent.py

Follow the prompts to choose your runtime options.

Deploy to AgentCore Runtime:

agentcore launch

You’ll get an Agent ARN for your deployed agent.

Invoke the Deployed Agent:

agentcore invoke '{"prompt": "Tell me a joke"}'

Architecture Diagram

Here’s a quick summary of the workflow:

Note: Local testing uses HTTP. Deployment and cloud invocation use AWS endpoints over HTTPS for secure communication.

My Takeaways

AgentCore Runtime and the Strands SDK make deploying AI agents on AWS simple and production-ready. From local prototyping to scalable cloud hosting, you skip the usual cloud headaches and focus on building features.

Let me know if you want a deeper dive into custom agent logic, more advanced deployment options, or troubleshooting advice!

HealthBuddy SG: AI-Powered Wellness

joshyfruit — Mon, 15 Sep 2025 06:22:24 +0000

HealthBuddy SG: AI-Powered Wellness for Singapore’s Tropics

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

HealthBuddy SG is a next-generation health and wellness applet designed specifically for Singaporeans and anyone living in a humid, tropical climate. It empowers users to:

Check their health with natural voice interaction (symptom analyzer and recommendations)
Scan and analyze local food for nutrition and climate-adapted advice
Track personal progress and receive daily, hyper-local, climate-aware guidance

My goal: bring personalized, practical AI health support to the palm of every Singapore resident, helping users thrive amidst heat, humidity, and busy urban life.

Demo

🔗 Live Applet: https://healthbuddy-sg-ver1-192629822894.us-west1.run.app/

Screenshots:

Demo:
🎥 Click here to watch HealthBuddy SG’s main features in action

How I Used Google AI Studio

Google AI Studio was the backbone for:

Rapidly building and refining prompts for symptom descriptions (English/Singapore-style)
Testing and iterating on voice/audio input, food image recognition, and text-based queries
Using the “Get code” feature to scaffold web endpoints and integrate Gemini in my workflow

Gemini’s multimodal APIs made it possible to interconnect speech, text, and image understanding, enabling truly natural, cross-modal user experiences.

Cloud Run allowed frictionless, scalable deployment, making the applet ready for public use and further enhancements.

Multimodal Features

HealthBuddy SG leverages the following Google AI multimodal capabilities:

Voice Symptom Analysis
- Users describe how they feel by speaking naturally; Gemini listens, analyzes, and provides health recommendations with Singapore context: “Is it haze related?”, “Possible dengue?”, “Stay hydrated advice”, etc.
Food Scanner for Local Cuisine
- Upload food photos; Gemini identifies local dishes (laksa, roti prata, etc.), assesses their nutrition, and gives “heat/hydration” tips specific to Singapore’s climate.
Personalized Progress Dashboard
- Visualizes health data, voice check-ins, and food analysis over time with clear cards and charts inspired by leading health apps.
- Delivers actionable, climate-adapted suggestions to boost user wellbeing.
Singapore-Specific Health Guidance
- Every result is enhanced with real tips for local concerns (haze, dengue, humidity, heatstroke).
- Smart recommendations for when to seek help from nearby clinics, polyclinics, or pharmacies.
Professional, Responsive UI
- Consistent card-based design, friendly mascot, easily navigable for mobile or desktop, ensuring a joyful user journey.

Why This Matters

Singaporeans face unique health challenges—tropical illnesses, heat, and rapid urban pace. HealthBuddy SG uses state-of-the-art Google AI multimodal technology to provide support that’s context-aware and actionable, helping every user make smarter daily wellness choices.

Solo submission by Joshua Cymon Gomez (@joshyfruit), using open source and original code/UX: starter voice assistant, new modules, pro-level UI, and code structured for future growth.

#devchallenge #googleaichallenge #ai #gemini #singapore #healthtech