<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rue Matchaba</title>
    <description>The latest articles on Forem by Rue Matchaba (@ruethedev).</description>
    <link>https://forem.com/ruethedev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825208%2F42f0372e-65eb-4b82-9322-ee922f062ea7.png</url>
      <title>Forem: Rue Matchaba</title>
      <link>https://forem.com/ruethedev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ruethedev"/>
    <language>en</language>
    <item>
      <title>How I Cut My Cloud SQL Bill by 70% After Discovering 0.8% CPU Utilization</title>
      <dc:creator>Rue Matchaba</dc:creator>
      <pubDate>Thu, 30 Apr 2026 00:06:38 +0000</pubDate>
      <link>https://forem.com/ruethedev/how-i-cut-my-cloud-sql-bill-by-70-after-discovering-08-cpu-utilization-415g</link>
      <guid>https://forem.com/ruethedev/how-i-cut-my-cloud-sql-bill-by-70-after-discovering-08-cpu-utilization-415g</guid>
      <description>&lt;h2&gt;
  
  
  The Cloud SQL Bill That Taught Me Everything About Over-Provisioning
&lt;/h2&gt;

&lt;p&gt;My database was running at 0.8% CPU utilisation.&lt;/p&gt;

&lt;p&gt;I discovered this three months after going live, while investigating why our GCP bill seemed higher than expected for our traffic volume. The number was so low I thought there was an error in Cloud Monitoring. There wasn't.&lt;/p&gt;

&lt;p&gt;I'd been paying for a machine that could handle roughly 100x more load than we were actually putting on it. Classic over-provisioning, but seeing it in real numbers was genuinely embarrassing.&lt;/p&gt;

&lt;p&gt;Here's everything I learned about right-sizing Cloud SQL instances, with the specific metrics and commands that will save you from making the same mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cost of "Playing It Safe"
&lt;/h2&gt;

&lt;p&gt;When you're spinning up your first production Cloud SQL instance, the console gives you a dropdown of machine types: &lt;code&gt;db-f1-micro&lt;/code&gt;, &lt;code&gt;db-n1-standard-1&lt;/code&gt;, &lt;code&gt;db-n1-standard-2&lt;/code&gt;, and so on. The descriptions are helpful but vague: "1 vCPU, 3.75GB memory" tells you the specs, not whether you need them.&lt;/p&gt;

&lt;p&gt;I picked &lt;code&gt;db-n1-standard-2&lt;/code&gt; because it seemed reasonable for a production database. Not too small, not excessive. The middle option. That decision was based on absolutely no data.&lt;/p&gt;

&lt;p&gt;The problem with "reasonable" is that it's usually wrong. Either you're under-provisioned and your app breaks, or you're over-provisioned and you're burning money. In my case, it was the latter.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Metrics Actually Tell You
&lt;/h2&gt;

&lt;p&gt;The key insight is that Cloud Monitoring shows you exactly what your database is doing. You just have to know where to look.&lt;/p&gt;

&lt;h3&gt;
  
  
  CPU Utilisation
&lt;/h3&gt;

&lt;p&gt;This is the most important metric for right-sizing your instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to find it:&lt;/strong&gt; Cloud Console → SQL → your instance → Monitoring tab → CPU utilization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to look for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average utilisation over the past 30 days&lt;/li&gt;
&lt;li&gt;P95 and P99 peaks (the highest 5% and 1% of usage)&lt;/li&gt;
&lt;li&gt;Time of day patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to interpret it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under 20% average: you can probably downgrade&lt;/li&gt;
&lt;li&gt;20-50%: you're sized appropriately&lt;/li&gt;
&lt;li&gt;50-80%: keep an eye on growth trends&lt;/li&gt;
&lt;li&gt;Over 80% sustained: consider upgrading&lt;/li&gt;
&lt;/ul&gt;
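&lt;p&gt;Those bands are mechanical enough to encode. A throwaway Python sketch of the rules above (my own illustration for this post, not anything gcloud provides):&lt;/p&gt;

```python
def sizing_advice(avg_cpu_pct: float) -> str:
    """Map a 30-day average CPU utilisation (in percent) to a rough action,
    using the thresholds from the list above."""
    if avg_cpu_pct < 20:
        return "consider downgrading"
    if avg_cpu_pct < 50:
        return "sized appropriately"
    if avg_cpu_pct < 80:
        return "watch growth trends"
    return "consider upgrading"

print(sizing_advice(0.8))  # the number that started this article
```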

&lt;p&gt;My average was 0.8%. My P99 was around 3%. I could have run the same workload on a &lt;code&gt;db-f1-micro&lt;/code&gt; instance and saved roughly 70% on compute costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Utilisation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Where to find it:&lt;/strong&gt; Same monitoring tab → Memory utilization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What matters:&lt;/strong&gt; You want to see consistent memory usage without swap. If memory utilisation is consistently above 90% or you're seeing any swap usage, that's a performance problem waiting to happen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I found:&lt;/strong&gt; Memory usage was sitting around 15% with zero swap. Another sign I was massively over-provisioned.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection Count
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Where to find it:&lt;/strong&gt; Monitoring tab → Database connections&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to look for:&lt;/strong&gt; Peak active connections compared to your instance's connection limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection limits by instance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;db-f1-micro&lt;/code&gt;: 25 connections&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db-n1-standard-1&lt;/code&gt;: 100 connections&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db-n1-standard-2&lt;/code&gt;: 200 connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My peak connections were hitting around 11. Even a &lt;code&gt;db-f1-micro&lt;/code&gt; would have been comfortable.&lt;/p&gt;
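&lt;p&gt;You can sanity-check that headroom the same way. A small Python sketch using the limits listed above; the 2x headroom factor is my own rule of thumb, not a GCP recommendation:&lt;/p&gt;

```python
# Connection limits copied from the list above.
CONNECTION_LIMITS = {
    "db-f1-micro": 25,
    "db-n1-standard-1": 100,
    "db-n1-standard-2": 200,
}

def smallest_tier_for(peak_connections: int, headroom: float = 2.0):
    """Return the smallest listed tier whose connection limit covers
    peak * headroom, or None if nothing listed is big enough."""
    needed = peak_connections * headroom
    for tier, limit in sorted(CONNECTION_LIMITS.items(), key=lambda kv: kv[1]):
        if limit >= needed:
            return tier
    return None

print(smallest_tier_for(11))  # my peak of ~11 connections
```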

&lt;h2&gt;
  
  
  The Commands That Actually Matter
&lt;/h2&gt;

&lt;p&gt;Once you know your utilisation is low, here are the specific commands to check what you're currently running and how to change it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Check Your Current Instance Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances describe YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"table(
    name,
    settings.tier,
    settings.dataDiskSizeGb,
    settings.availabilityType,
    settings.backupConfiguration.enabled
)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a clean summary of what you're paying for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;settings.tier&lt;/code&gt;: your machine type (the expensive part)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;settings.dataDiskSizeGb&lt;/code&gt;: disk size&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;settings.availabilityType&lt;/code&gt;: whether HA is enabled&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;settings.backupConfiguration.enabled&lt;/code&gt;: backup settings&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Downgrade Your Instance Tier
&lt;/h3&gt;

&lt;p&gt;If your CPU utilisation is consistently low, this is the biggest cost saving:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances patch YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--tier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;db-f1-micro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; This will restart your instance. Plan for a few minutes of downtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Machine type costs&lt;/strong&gt; (rough monthly estimates for PostgreSQL in us-central1):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;db-f1-micro&lt;/code&gt;: ~$7/month&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db-n1-standard-1&lt;/code&gt;: ~$25/month&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db-n1-standard-2&lt;/code&gt;: ~$50/month&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db-n1-standard-4&lt;/code&gt;: ~$100/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Moving from standard-2 to f1-micro saves around $43/month per instance. That adds up fast if you're running multiple environments.&lt;/p&gt;
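&lt;p&gt;The arithmetic is worth doing per environment. A quick Python sketch using the rough prices above (my estimates, not official GCP pricing):&lt;/p&gt;

```python
# Rough monthly costs (USD) from the list above; treat as estimates only.
MONTHLY_COST = {
    "db-f1-micro": 7,
    "db-n1-standard-1": 25,
    "db-n1-standard-2": 50,
    "db-n1-standard-4": 100,
}

def annual_savings(current: str, target: str, instances: int = 1) -> int:
    """Yearly saving from moving `instances` machines between tiers."""
    per_month = MONTHLY_COST[current] - MONTHLY_COST[target]
    return per_month * 12 * instances

# One instance, then dev + staging + prod:
print(annual_savings("db-n1-standard-2", "db-f1-micro"))
print(annual_savings("db-n1-standard-2", "db-f1-micro", instances=3))
```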

&lt;h3&gt;
  
  
  Turn Off High Availability (Where Appropriate)
&lt;/h3&gt;

&lt;p&gt;High Availability runs a standby replica in a different zone, roughly doubling your instance cost. You want this in production. You probably don't need it in staging or development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check if HA is enabled:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances describe YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(settings.availabilityType)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Turn it off:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances patch YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--availability-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ZONAL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Turn it back on:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances patch YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--availability-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;REGIONAL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This change also requires a restart, so plan accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Storage Problem You Can't Fix Easily
&lt;/h2&gt;

&lt;p&gt;Here's the frustrating part: Cloud SQL storage auto-increases but never auto-decreases. If your data grows to 50GB and then you delete 40GB, you're still paying for 50GB forever.&lt;/p&gt;

&lt;p&gt;I had 100GB provisioned and was using 240MB. That's 0.24% utilisation. Storage isn't the most expensive part of Cloud SQL, but it's still $10/month I didn't need to spend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check your actual storage usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances describe YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(settings.dataDiskSizeGb)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect to your database and check actual usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- For PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_database_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'your_database_name'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;-- For MySQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;table_schema&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_length&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;index_length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Size (MB)"&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tables&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;table_schema&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The only fix for oversized storage:&lt;/strong&gt; export your data, delete the instance, and recreate it with a smaller disk. This is disruptive enough that you probably won't do it unless the over-provisioning is severe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Size your initial disk conservatively. 10GB is the minimum and sufficient for most applications starting out. You can always increase it later without downtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backup Configuration That Actually Makes Sense
&lt;/h2&gt;

&lt;p&gt;Cloud SQL defaults to 7 days of automated backup retention. For production, that makes sense. For staging environments that get refreshed weekly, you're paying to store backups of data you'd never restore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check your backup settings:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances describe YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"table(
    settings.backupConfiguration.enabled,
    settings.backupConfiguration.backupRetentionSettings.retainedBackups,
    settings.backupConfiguration.pointInTimeRecoveryEnabled
)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reduce backup retention for non-critical instances:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances patch YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--backup-retain-count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Turn off point-in-time recovery (PITR) for non-critical instances:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud sql instances patch YOUR_INSTANCE_NAME &lt;span class="nt"&gt;--no-backup-point-in-time-recovery&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PITR keeps transaction logs to allow recovery to any specific timestamp. It's useful for production but adds storage costs and complexity for environments where you'd just restore from the most recent daily backup anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Monitoring Dashboard You Should Actually Use
&lt;/h2&gt;

&lt;p&gt;Instead of checking individual metrics manually, set up a custom dashboard that shows everything relevant at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the dashboard from a config file&lt;/strong&gt; (this assumes your project already has a Cloud Monitoring workspace):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud alpha monitoring dashboards create &lt;span class="nt"&gt;--config-from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dashboard-config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dashboard configuration&lt;/strong&gt; (&lt;code&gt;dashboard-config.yaml&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;displayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cloud&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Cost&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Optimization"&lt;/span&gt;
&lt;span class="na"&gt;mosaicLayout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;
    &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="na"&gt;widget&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Utilization"&lt;/span&gt;
      &lt;span class="na"&gt;xyChart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;dataSets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;timeSeriesQuery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;timeSeriesFilter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource.type="cloudsql_database"'&lt;/span&gt;
              &lt;span class="na"&gt;metricFilter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metric.type="cloudsql.googleapis.com/database/cpu/utilization"'&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;
    &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="na"&gt;widget&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Utilization"&lt;/span&gt;
      &lt;span class="na"&gt;xyChart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;dataSets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;timeSeriesQuery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;timeSeriesFilter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource.type="cloudsql_database"'&lt;/span&gt;
              &lt;span class="na"&gt;metricFilter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metric.type="cloudsql.googleapis.com/database/memory/utilization"'&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;
    &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="na"&gt;widget&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Active&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Connections"&lt;/span&gt;
      &lt;span class="na"&gt;xyChart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;dataSets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;timeSeriesQuery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;timeSeriesFilter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource.type="cloudsql_database"'&lt;/span&gt;
              &lt;span class="na"&gt;metricFilter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metric.type="cloudsql.googleapis.com/database/postgresql/num_backends"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a single view of the three metrics that matter most for cost optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Wish I'd Known Before Clicking "Create"
&lt;/h2&gt;

&lt;p&gt;The real lesson here isn't about any specific setting. It's about the mindset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start smaller than you think you need.&lt;/strong&gt; Scaling up compute is a one-line command and a few minutes of downtime. Shrinking storage means exporting your data and recreating the instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use actual data, not gut feel.&lt;/strong&gt; Cloud Monitoring exists for a reason. If you don't have usage patterns yet, start with the smallest instance that can handle your expected load and scale up based on real metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment-specific configuration matters.&lt;/strong&gt; Production and staging have different availability requirements, different backup needs, and different cost tolerances. Configure them differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GCP defaults optimize for reliability, not cost.&lt;/strong&gt; That's the right choice for a platform, but it means you need to actively optimize for your actual usage patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;My 0.8% CPU utilisation was embarrassing, but it taught me more about cloud cost optimization than months of reading best practices guides. The specific numbers forced me to understand what each metric actually means and how it translates to real money.&lt;/p&gt;

&lt;p&gt;If you're setting up Cloud SQL for the first time, open the monitoring dashboard before you pick your instance tier. The metrics will tell you what you actually need, not what feels reasonable.&lt;/p&gt;

&lt;p&gt;And if you're already running Cloud SQL instances, spend ten minutes checking your utilisation numbers. You might be surprised at what you find.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>softwareengineering</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Was Hand-Writing Every AI Tool. Then I Discovered MCP Servers.</title>
      <dc:creator>Rue Matchaba</dc:creator>
      <pubDate>Fri, 27 Mar 2026 20:58:47 +0000</pubDate>
      <link>https://forem.com/ruethedev/i-was-hand-writing-every-ai-tool-then-i-discovered-mcp-servers-56np</link>
      <guid>https://forem.com/ruethedev/i-was-hand-writing-every-ai-tool-then-i-discovered-mcp-servers-56np</guid>
      <description>&lt;p&gt;&lt;em&gt;What tool calling and MCP actually mean, and how they fit together when you're building real AI products.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've been building Pulse, a voice AI co-pilot for engineering work that talks to Jira and GitHub. The idea is simple: speak a command, Claude figures out what to do, your project management tools respond.&lt;/p&gt;

&lt;p&gt;To make it work, I had to give Claude the ability to interact with Jira and GitHub. So I did what most people do when they start building with LLMs: I wrote the tools by hand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;create_jira_ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, input_schema: { ... } },&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;get_jira_issue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, input_schema: { ... } },&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;update_jira_status&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, input_schema: { ... } },&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tools. Done. It worked fine.&lt;/p&gt;

&lt;p&gt;Then I learned what an MCP server actually is, and I realised I had been building with a teaspoon when a fire hose was sitting right there.&lt;/p&gt;




&lt;h2&gt;
  
  
  First, what is tool calling?
&lt;/h2&gt;

&lt;p&gt;When you build an LLM application, the model lives in a box. It can think, reason, and generate text, but it cannot &lt;em&gt;do&lt;/em&gt; anything in the real world on its own.&lt;/p&gt;

&lt;p&gt;Tool calling is how you fix that. You define a set of functions and tell the model they exist. When the model decides one is needed, it calls it with the right arguments. Your code executes the function and passes the result back to the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User gives instruction
      ↓
Claude decides which tool to call
      ↓
Your code executes it
      ↓
Claude gets the result and responds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's powerful. But you write every tool yourself. You define the schema, you maintain it, you add new ones when you need them.&lt;/p&gt;
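&lt;p&gt;Stripped of any particular SDK, that loop is just a dispatch table. A toy Python sketch where the tool and the "model decision" are both made up for illustration:&lt;/p&gt;

```python
# A hand-rolled tool registry: name -> callable. In a real app the schemas
# go to the model; here the model's structured output is simulated.
def create_jira_ticket(title: str) -> dict:
    return {"key": "PULSE-1", "title": title}  # fake Jira response

TOOLS = {
    "create_jira_ticket": create_jira_ticket,
}

def run_tool_call(decision: dict) -> dict:
    """Execute the tool the model chose; the result gets fed back to it."""
    tool = TOOLS[decision["name"]]        # the model picked the tool...
    return tool(**decision["arguments"])  # ...your code actually runs it

# What the model's structured output might look like:
decision = {"name": "create_jira_ticket",
            "arguments": {"title": "Fix login bug"}}
print(run_tool_call(decision))
```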




&lt;h2&gt;
  
  
  So what is an MCP server?
&lt;/h2&gt;

&lt;p&gt;MCP stands for Model Context Protocol. The simplest way I can explain it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool calling is giving Claude a telephone. MCP is giving Claude a universal remote control.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The telephone analogy works because with hand-rolled tools, you're wiring up each number yourself. You decide what Claude can call, you write the definition, you maintain it forever.&lt;/p&gt;

&lt;p&gt;An MCP server is a running process that speaks a standard protocol. Claude connects to it and asks: &lt;em&gt;"what tools do you have?"&lt;/em&gt; The server responds with a full list. Claude now knows everything it can do without you having written a single tool definition.&lt;/p&gt;

&lt;p&gt;More concretely:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Tool Calling&lt;/th&gt;
&lt;th&gt;MCP Server&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it is&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Functions you define manually&lt;/td&gt;
&lt;td&gt;A running server Claude connects to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You write each function and schema&lt;/td&gt;
&lt;td&gt;Server exposes its tools automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reusability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tied to your app&lt;/td&gt;
&lt;td&gt;Any AI that speaks MCP can use it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yours forever&lt;/td&gt;
&lt;td&gt;The server owner's problem&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Jira has an MCP server. GitHub has an MCP server. Instead of my 3 hand-rolled tools, I can point Pulse at both servers and Claude automatically gets access to the full API surface of each product.&lt;/p&gt;

&lt;p&gt;Sprints. PRs. Worklogs. Reviews. Branches. Comments. Issue history.&lt;/p&gt;

&lt;p&gt;Same voice interface. Dramatically larger capability surface.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part that actually blew my mind
&lt;/h2&gt;

&lt;p&gt;Claude can reason across multiple MCP servers simultaneously.&lt;/p&gt;

&lt;p&gt;So instead of three isolated tool calls, you can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Find all Jira tickets marked Done this sprint, check if the linked PR was actually merged, and flag any that say Done but the PR is still open."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That used to be a custom script someone had to write, test, and maintain. With MCP, it's a voice command.&lt;/p&gt;

&lt;p&gt;And when Jira ships new API features? The MCP server updates. Claude automatically has access. You change nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The broader point
&lt;/h2&gt;

&lt;p&gt;Most people are still thinking about AI as a chatbot. You ask it something, it answers.&lt;/p&gt;

&lt;p&gt;What I've been building toward is composing intelligence over systems. Claude isn't just answering questions. It's reasoning across your Jira board, your GitHub history, and your internal data, then taking action.&lt;/p&gt;

&lt;p&gt;The tools that make that possible are not complicated individually. What takes work is understanding how they fit together and when to use each one.&lt;/p&gt;

&lt;p&gt;That's what I'm figuring out in public, one project at a time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Rue, a Full Stack and AI Engineer building Pulse and writing about what I learn along the way. Follow along on Instagram at &lt;a href="https://instagram.com/rue.on.ai" rel="noopener noreferrer"&gt;@rue.on.ai&lt;/a&gt; or connect with me on LinkedIn.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I built my first AI agent. It was mostly plumbing</title>
      <dc:creator>Rue Matchaba</dc:creator>
      <pubDate>Fri, 27 Mar 2026 20:44:55 +0000</pubDate>
      <link>https://forem.com/ruethedev/i-built-my-first-ai-agent-it-was-mostly-plumbing-foj</link>
      <guid>https://forem.com/ruethedev/i-built-my-first-ai-agent-it-was-mostly-plumbing-foj</guid>
      <description>&lt;p&gt;I spent a weekend trying to understand how AI agents actually work. Not the pitch deck version. The code version. What does function calling look like in practice? What happens when two agents are chained together and one of them fails?&lt;/p&gt;

&lt;p&gt;I built a multi-agent research assistant in TypeScript to find out. Three agents: an orchestrator, a summarizer, and a writer. Each one does one job and hands off to the next.&lt;/p&gt;

&lt;p&gt;Running a model locally is weirder than it sounds.&lt;/p&gt;

&lt;p&gt;I used Ollama, which runs the model directly on your machine. No API key, no remote server. The first time my Express app got a real response back from localhost, I sat there for a second. It’s just an HTTP call. Your code genuinely cannot tell whether there’s a llama3.2 process on your laptop behind it or a data centre somewhere.&lt;/p&gt;

&lt;p&gt;Nobody told me that LLMs are just APIs. Text in, text out. Everything interesting happens inside that call, invisible to your code.&lt;/p&gt;

&lt;p&gt;Function calling is less magic than I expected, which was both a relief and slightly disappointing.&lt;/p&gt;

&lt;p&gt;You describe available tools in the system prompt. The model decides whether to use one and returns structured output with a tool name and arguments. Your code runs the tool and feeds the result back. That’s the whole loop.&lt;/p&gt;

&lt;p&gt;I kept waiting for something more mysterious. It’s just prompt engineering and plumbing.&lt;/p&gt;

&lt;p&gt;The chaining part was mostly plumbing.&lt;/p&gt;

&lt;p&gt;Agent 1 runs, its output becomes input for Agent 2, repeat. What actually took time was writing each agent’s system prompt tight enough that it wouldn’t drift into doing something adjacent to its job, and handling failures mid-chain without the whole thing collapsing silently.&lt;/p&gt;
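&lt;p&gt;That plumbing fits in a few lines. A toy Python version (the agents here are stand-in functions, not LLM calls) showing the fail-loudly handoff:&lt;/p&gt;

```python
def run_chain(agents, initial_input):
    """Run agents in sequence; each output feeds the next. Fail loudly,
    naming the stage, instead of letting a mid-chain error vanish."""
    data = initial_input
    for name, agent in agents:
        try:
            data = agent(data)
        except Exception as exc:
            raise RuntimeError(f"chain failed at stage '{name}'") from exc
    return data

# Stand-ins for orchestrator -> summarizer -> writer
pipeline = [
    ("orchestrator", lambda q: {"query": q, "sources": ["notes.md"]}),
    ("summarizer",   lambda d: {**d, "summary": f"summary of {d['query']}"}),
    ("writer",       lambda d: f"Report: {d['summary']}"),
]
print(run_chain(pipeline, "ai agents"))
```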

&lt;p&gt;I wrote more error handling code than AI code. I think that’s correct.&lt;/p&gt;

&lt;p&gt;If I did it again.&lt;/p&gt;

&lt;p&gt;llama3.2 via Ollama is free and runs locally, but it gets shaky on complex instruction-following. If your pipeline depends on consistent structured output, that inconsistency compounds across three agents fast. A hosted model with real function calling support would have saved me some debugging time.&lt;/p&gt;

&lt;p&gt;I also didn’t document anything while I was building. By the time I was done, I’d forgotten what actually confused me at the start. Writing this post took longer than it should have for that reason. If you want to understand agents without paying for API calls, Ollama gets you there. Just design around what the model can’t reliably do. Super excited to see what I can build.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
