Forem: Hiroshi Toyama

Cursor vs Claude: The Business Models Behind the 10x Price Gap

Hiroshi Toyama — Thu, 07 May 2026 08:25:35 +0000

The previous post covered Composer 2's cache mechanics and the Standard/Fast split. This one goes one level deeper: why the price gap exists structurally, and what it predicts about where AI model markets are heading.

The Two Business Models

The $0.50 vs $5.00 price gap between Composer 2 Standard and Claude Opus isn't primarily about model size. It's about two fundamentally different business models.

Anthropic/OpenAI: Build the most capable general-purpose model possible. License it as an API to anyone who wants to use it—enterprises, startups, individual developers. The general-purpose nature requires maintaining capabilities across every domain: legal reasoning, creative writing, mathematics, programming, ethics, philosophy. Margin on each API call covers model development, infrastructure, and business overhead.

Cursor/Anysphere: Build a model only for one product—Cursor. No external API to sell. No licensing fees to pay. No reason to maintain capabilities outside software development. The specialized training means stripping out everything that isn't code, resulting in a dramatically smaller model that's cheaper to serve.

The math follows directly. Composer 2 is trained exclusively on coding data via continued pre-training and reinforcement learning. Claude Opus maintains the ability to pass bar exams, write poetry, explain quantum mechanics, and argue ethics. You're paying for all of that whether you use it or not.

The Cache Write Tax

This business model difference shows up most concretely in cache write pricing.

Claude's prompt caching has three cost components:

Cache write: 1.25× the base input price
Cache read: ~10% of the base input price
Normal input: base price

That cache write surcharge exists because Anthropic is taking on the cost and risk of maintaining cached data for an external customer. They don't know what you'll cache, how long it'll stay relevant, or whether you'll return in 5 minutes or 5 days. The 1.25× write rate is essentially an infrastructure risk premium embedded in the API pricing.
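The three components above can be combined into a per-request input cost. The following is a minimal sketch, not Anthropic's billing logic: it treats "~10%" as exactly 10%, takes the $5.00/1M Opus base input price quoted in this post, and uses a hypothetical 400K-token context to show the first-turn write versus a later-turn read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    base_input_per_mtok: float      # USD per 1M fresh input tokens
    write_multiplier: float = 1.25  # cache write, per the list above
    read_multiplier: float = 0.10   # assumption: "~10%" taken as exactly 10%


def input_cost_usd(p: CachePricing, fresh: int, cache_write: int, cache_read: int) -> float:
    """Input-side cost of one request, split into the three components."""
    per_token = p.base_input_per_mtok / 1_000_000
    weighted_tokens = (
        fresh
        + cache_write * p.write_multiplier
        + cache_read * p.read_multiplier
    )
    return per_token * weighted_tokens


opus = CachePricing(base_input_per_mtok=5.00)

# Hypothetical session: 400K tokens of context written to cache on turn 1,
# then read back on a later turn alongside 5K tokens of fresh input.
first_turn = input_cost_usd(opus, fresh=0, cache_write=400_000, cache_read=0)
later_turn = input_cost_usd(opus, fresh=5_000, cache_write=0, cache_read=400_000)

print(f"first turn: ${first_turn:.3f}")  # first turn: $2.500
print(f"later turn: ${later_turn:.3f}")  # later turn: $0.225
```

Output tokens are left out on purpose; the point is only how the write and read multipliers shift the input side.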

Composer 2's actual usage data tells a completely different story:

Column	Value
Input (w/ Cache Write)	0
Input (w/o Cache Write)	15,018
Cache Read	391,424
Cost	$0.20

The Input (w/ Cache Write) column is zero across every single Composer 2 request. Anysphere runs Composer 2 on their own servers, optimized for exactly one workload: Cursor's codebase-heavy sessions. There's no external API infrastructure risk to price in. The cache write surcharge simply doesn't exist.

For Claude Opus users on Cursor, the same column is non-zero. Even though Cursor proxies the request, it still hits Anthropic's API and incurs the write premium.

The practical effect: on a new session with a large codebase, Claude Opus users pay an entry fee (cache write at 1.25× rate) that Composer 2 users never encounter.

Luxury Engineering

A useful framing emerges from analyzing actual usage patterns: Luxury Engineering.

Using Claude Opus for routine coding tasks is the AI equivalent of hiring a full professor to write unit tests. The professor is qualified—arguably overqualified. They could do it. But you're paying for decades of expertise in domains completely irrelevant to the task: literature, philosophy, ethics, history. That overhead is embedded in every token.

Composer 2 is more like a developer who has done nothing but code their entire career. No breadth, extraordinary depth in the one domain that matters. Because of that specialization, cost is 1/10th.

Full Model Landscape

Looking at the complete pricing picture (2026, USD per 1M tokens):

Model	Input	Cache Write	Cache Read	Output
Composer 2	$0.50	none	$0.20	$2.50
GPT-5.3 Codex	$1.75	none	$0.175	$14
Grok 4.20	$2.00	none	$0.20	$6
Gemini 3.1 Pro	$2.00	none	$0.20	$12
Claude 4.6 Sonnet	$3.00	$3.75 (1h: $6.00)	$0.30	$15
GPT-5.5	$5.00	none	$0.50	$30
Claude 4.7 Opus	$5.00	$6.25 (1h: $10.00)	$0.50	$25

GPT-5.3 Codex being significantly cheaper than GPT-5.5 follows the same logic: Codex uses continued pre-training on code data to reduce model weight, and the price difference is essentially "the cost of maintaining the ability to write poetry."

Two patterns stand out:

The Claude cache write anomaly. Only Claude models carry an explicit cache write surcharge. Every other model in this list (including Composer 2) absorbs the write cost into the base price or waives it entirely. This isn't a product limitation—it's a reflection of Claude's external API business model.

Composer 2's output price. $2.50/1M output is 10× cheaper than Claude Opus and 12× cheaper than GPT-5.5. Code generation produces significant output token volume. Composer 2's very low output pricing means that long agentic sessions (the exact workloads it's designed for) don't hit a cost ceiling.

Claude Code: The Name Is Misleading

The name "Claude Code" implies a coding-specialized model. It isn't. Claude Code is Claude 4.6 Opus or Sonnet—the same general-purpose models available in Cursor—packaged as a CLI tool. The underlying architecture hasn't been pruned for code; it retains the full weight of a general-purpose frontier model.

The cost implications are direct. Claude Code uses Anthropic's standard Prompt Caching, which means the cache write premium (1.25×) applies. The default cache TTL is 5 minutes, long enough to expire while you're running tests or reading docs between prompts. The ENABLE_PROMPT_CACHING_1H=1 flag extends it to one hour, but raises the write cost from 1.25× to 2× the base input price in exchange.

The "autonomous loop" (run tests → read failure → fix code → rerun) is frequently cited as a Claude Code advantage. It isn't unique to Claude Code. Cursor's agent mode executes the same loop via its sandboxed terminal integration. The practical difference is that Cursor's loop doesn't incur a cache write penalty on session start, and runs cache reads at $0.20/1M rather than Claude Opus's $0.50/1M.

Where Claude Code has a genuine edge: terminal-native workflows for developers using Vim, JetBrains, or any editor outside the Cursor ecosystem. If you're not using Cursor, Claude Code is the most capable CLI agent available. Within Cursor, the economic case for Claude Code over Composer 2 is thin for standard coding tasks.

What This Means for the Future

This structure predicts where AI model markets go.

General-purpose frontier models have a structural cost floor. They have to maintain broad capabilities to justify API pricing across diverse customers. They have to earn margins on external licensing. They have to maintain the "impressive demo" factor that drives enterprise adoption.

Specialized models built for a specific product have none of those constraints. Strip capability, reduce model size, optimize serving infrastructure, eliminate external API margins. The only question is whether sufficient domain quality can be achieved.

Composer 2 answered that question for software development in March 2026. SWE-bench Multilingual score of 73.7, at 1/10th the cost of Claude Opus.

The same economics will play out in other domains: legal AI products trained exclusively on case law and contracts; medical AI running on clinical literature with zero consumer chat capability; financial models stripped of everything except numerical reasoning and accounting standards. None of them need to know how to write a sonnet.

The structural enabler in each case is the same: building a model for one product, not for external licensing. That eliminates the margin layer and enables the infrastructure optimizations that make 5-10× price reduction possible.

The Rational Selection Framework

Given this analysis:

Composer 2 Standard for any multi-turn session against a codebase. Cache compound interest works in your favor: higher turn count → higher cache read ratio → lower effective cost per token. No cache write entry fee on session start.
Composer 2 Fast for interactive sessions where latency matters more than per-token cost.
Claude Opus or Claude 4.7 when you genuinely need cross-domain reasoning—architecture decisions involving organizational and technical trade-offs simultaneously, debugging scenarios requiring external systems understanding outside your loaded context, or when Composer 2 hits an explicit capability ceiling.
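The "cache compound interest" effect can be sketched with a toy session model. This assumes every turn re-sends the whole accumulated context, everything sent on an earlier turn is served as a cache read, and each turn adds a fixed number of new tokens. The 100K starting context and 5K per-turn growth are hypothetical values, not measurements.

```python
def cumulative_cache_read_ratio(turns: int, initial_context: int, added_per_turn: int) -> float:
    """Share of all input tokens served from cache after `turns` turns."""
    fresh_total = 0
    cached_total = 0
    context = initial_context
    for turn in range(turns):
        if turn == 0:
            # First turn: nothing is cached yet, the whole context is fresh.
            fresh_total += context
        else:
            # Later turns: the previous context is a cache read,
            # only the newly added tokens are fresh input.
            cached_total += context
            fresh_total += added_per_turn
            context += added_per_turn
    return cached_total / (fresh_total + cached_total)


ratios = {
    n: cumulative_cache_read_ratio(n, initial_context=100_000, added_per_turn=5_000)
    for n in (1, 2, 5, 10)
}
for n, r in ratios.items():
    print(f"{n:>2} turns: {r:.1%} cache reads")
```

Under these assumptions the ratio starts at zero on the first turn and climbs with every additional turn, which is the behavior the Standard recommendation relies on.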

From actual usage data: 88.3% cache read ratio on Composer 2 Standard, $0.19 average cost per request on ~390K token requests. The same request volume on Claude Opus: $0.90 average. The top Opus request cost $4.25, enough for about 22 average Composer 2 Standard requests.

The price gap isn't a temporary marketing discount. It's structural, rooted in business model differences that won't close without a fundamental change in how Anthropic operates. As long as Claude is an external API product, the cache write premium and the overhead of general-purpose training remain embedded in the price.

Using llms.txt with Cursor and Claude Code: a concrete playbook

Hiroshi Toyama — Sun, 03 May 2026 11:56:30 +0000

llms.txt is a small text file on a documentation site. It usually states what the product is and links to the important Markdown pages. For coding agents, treat it as the canonical URL to open first when upstream behavior is unclear. This post is mostly setup and workflow, not theory.

What goes where

Location	Put this there
Official doc server	`https://example.com/llms.txt` (maintained by the library/vendor)
Your repo	URLs only (and short protocols), in agent rules—not a copy of their docs
Your repo `.cursor/rules/`	Project map, conventions, your architecture—not Next.js’s full manual

If you paste thousands of tokens of upstream docs into rules, every chat pays for them. Keeping pointers in rules and loading docs on demand avoids that.

One-time setup: a dedicated rules file

Create something like .cursor/rules/external-llms-docs.md (name does not matter; keep it scoped). Paste a stable list of llms.txt URLs your stack actually uses, grouped so humans and agents scan quickly.

# External docs — fetch on demand

Use web fetch / browser / search tools to load these when implementing or debugging
third-party behavior. Do not paste full upstream docs into the chat.

## Index URLs (read these first)

| Area | llms.txt |
| --- | --- |
| Next.js | https://nextjs.org/llms.txt |
| Tailwind | https://tailwindcss.com/llms.txt |
| Lucide | https://lucide.dev/llms.txt |
| Google ADK | https://adk.dev/llms.txt |

## Read order

1. Fetch the **llms.txt** for the dependency that owns the question.
2. Follow **only** links from that file (or obvious `/docs/*.md` siblings) for depth.
3. Prefer Markdown sources over scraping marketing HTML.
4. If types exist locally (`node_modules`, stubs), use them **after** you know which API surface applies (avoids guessing wrong symbols).

## Scope

- Questions about **our** repo layout → use `repo-map` rule / codebase search, not llms.txt.
- Questions about **their** API/version/docs → use the table above.

Why a separate file: Cursor injects rules by context; a fat global rule file makes unrelated edits heavier. Split internal vs external pointers.

Agent protocol (copy into the same file or AGENTS.md)

Make the sequence explicit so the model does not default to “grep node_modules for an hour.”

## External SDK protocol

When the user asks for behavior that depends on an external library version or API:

1. Identify which dependency owns the feature (package.json / imports).
2. If this file lists an llms.txt for that dependency, **fetch it before** writing code.
3. Summarize in ≤10 lines: version assumptions, file names, and APIs you will use—then implement.
4. Do not quote entire upstream pages back to the user; cite chapter/section or URL path only.

Concrete workflows

Implement a feature (e.g. App Router auth middleware).

User: “Add middleware-based auth with Next.js App Router.”
Agent: fetch https://nextjs.org/llms.txt, open the linked page that describes middleware.ts / matcher patterns.
Implement using current filenames and signatures from that fetch—not memory.

Debug “works on my machine” / deprecation.

User: “Tailwind v4 class names stopped working after upgrade.”
Agent: fetch Tailwind’s llms.txt first; confirm breaking-change notes and config file names, then open repo tailwind.config.* / CSS entry.

SDK with tiered dumps (example pattern).

Some sites expose a short index and a long bundle (names vary). Rule of thumb: start short, upgrade to full only if the stub did not answer.

# hypothetical layout on a docs host
/llms.txt          → links + overview
/llms-small.txt    → minimal surface (cheap)
/llms-full.txt     → everything (expensive)

Point your rules at the entry (llms.txt); let the fetched content tell the agent whether *-full exists.

Prompts that reinforce good habits

You can nudge behavior per task without editing rules:

“Before editing: fetch Next.js llms.txt and confirm middleware filename and export shape.”
“Use ADK llms.txt; don’t rely on training cutoff for API names.”
“After fetching Tailwind llms.txt, list which doc URLs you used (paths only).”

Minimal internal llms.txt (optional)

If you ship an internal library or architecture handbook on HTTPS, you can publish your own index at https://internal-docs.example.com/llms.txt:

# Internal platform — LLM index

## Auth
- Overview: https://internal-docs.example.com/auth/overview.md
- Breaking changes 2026: https://internal-docs.example.com/auth/changelog.md

## Data layer
- API conventions: https://internal-docs.example.com/db/conventions.md

Then add one line to .cursor/rules/external-llms-docs.md: Internal platform | https://internal-docs.example.com/llms.txt. Same mechanics as vendor docs.
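To make the "fetch index, then follow linked Markdown" step concrete, here is a small sketch that pulls the Markdown URLs out of an index, grouped by section. It is an illustration only: the sample text mirrors the internal index above, the function name is hypothetical, and real llms.txt files may use `[title](url)` links, which the URL pattern below also tolerates.

```python
import re

# Sample index text modeled on the internal example above (illustrative only).
SAMPLE_INDEX = """\
# Internal platform: LLM index

## Auth
- Overview: https://internal-docs.example.com/auth/overview.md
- Breaking changes 2026: https://internal-docs.example.com/auth/changelog.md

## Data layer
- API conventions: https://internal-docs.example.com/db/conventions.md
"""

# Stops at whitespace and at closing ) ] > so Markdown link syntax is handled too.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def markdown_links_by_section(index_text: str) -> dict[str, list[str]]:
    """Group .md URLs under the most recent '## ' heading."""
    sections: dict[str, list[str]] = {}
    current = "(top)"
    for line in index_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            continue
        for url in URL_PATTERN.findall(line):
            if url.endswith(".md"):
                sections.setdefault(current, []).append(url)
    return sections


links = markdown_links_by_section(SAMPLE_INDEX)
for section, urls in links.items():
    print(section, urls)
```

An agent does this implicitly when it reads the file; the sketch is mainly useful for checking that an index you publish exposes the pages you expect.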

Tooling reality check

This pattern assumes the agent can retrieve HTTPS text (built-in fetch, browser tool, MCP fetch, etc.). Air-gapped machines need a fallback (mirror snippets in rules, local static server, or vendor tarball—but accept resident token cost).

Do not put authenticated URLs with secrets in rules; use public docs or internal SSO-aware tooling outside plain markdown.

Anti-patterns

Dumping full upstream Markdown into .cursorrules “so the agent always knows.”
Skipping llms.txt and crawling random marketing pages (noisy HTML, wasted tokens).
Duplicating vendor docs under docs/vendor/ and indexing everything unless you truly need offline access.

SEO note (short)

Search-engine teams have questioned llms.txt as an SEO lever; that is largely orthogonal. For coding agents, the win is predictable Markdown entrypoints and smaller always-on context—not rankings.

Summary

Add .cursor/rules/external-llms-docs.md with a table of llms.txt URLs plus read order and scope (external vs internal repo map).
Teach agents: fetch index → follow linked Markdown → then local types.
Use tiered files shallow-first when the provider offers them.
Optionally host your own llms.txt for internal platforms; still keep rules as pointers only.

Cursor Composer 2: The Cache Economy Behind a 10x Cheaper Coding Agent

Hiroshi Toyama — Sat, 02 May 2026 12:53:01 +0000

Cursor's Composer 2 shipped in March 2026 as the centerpiece of the Cursor 2.0 overhaul. The headline numbers—$0.50/1M input tokens, outperforming frontier models on SWE-bench Multilingual—look like marketing. The cache read mechanism is where the real story is.

Why a Specialized Model at All

Prior Cursor versions proxied Claude or GPT-4. Composer 2 is trained exclusively on coding data via continued pre-training and reinforcement learning. The obvious question is: what's cut?

Everything that isn't code. Composer 2 has no meaningful capability for poetry, history, ethics debates, or anything outside software development. That constraint lets Anysphere run a model that:

Understands intra-repo dependency graphs (if you fix A, B also needs updating)
Navigates hundreds of files in a single long-horizon task
Runs natively in sandboxed terminals and a built-in browser loop
Costs a fraction of what a general-purpose frontier model costs to serve

The pricing reflects this. As of May 2026:

Model	Input (1M tokens)	Output (1M tokens)
Composer 2 Standard	$0.50	$2.50
Composer 2 Fast	$1.50	$7.50
Claude 4.6 Opus	$5.00	$25.00
GPT-5.4	$2.50	$15.00

Standard vs Fast: Same Weights, Different Queue

Anysphere's own language is unambiguous: "Same intelligence." The two variants share identical model weights and parameters. Fast gets priority queue on high-end GPUs (H800/B200 class); Standard runs on lower-priority compute with higher latency tolerance.

This is a deliberate architectural choice. Inference cost scales with compute priority, not model capability. If you can tolerate a 10–30 second response delay, you get the same output for 1/3 the price.

The practical split that Cursor power users have settled on:

Interactive sessions (Fast): You're watching the output in real time. Latency kills flow.
Fire-and-forget tasks (Standard): Refactor 100 test files, generate JSDoc across the repo, migrate an entire API surface. Start it, close the laptop, come back to results.

The Cache Read Economy

This is the mechanism that makes Standard compelling for large codebases.

Every request to Composer 2 sends context: directory structure, recently opened files, conversation history. On the second, fifth, tenth turn of the same session, the majority of that context is identical to what was already sent. That's the cache.

Cache read rates as of May 2026:

Tier	New input	Cache read
Standard	$0.50/1M	$0.20/1M
Fast	$1.50/1M	$0.35/1M

By turn 5 of a non-trivial session, 80%+ of your input tokens are cache reads, not fresh input. Standard's cache read rate ($0.20) is 43% cheaper than Fast's ($0.35), and 60% cheaper than Standard's own new input rate.
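The effect of the cache read ratio on the effective input price can be written as a weighted average. This is a sketch using only the two rates from the table above; the function name is illustrative and output tokens are ignored.

```python
# (new input, cache read) in USD per 1M tokens, from the table above.
RATES = {
    "standard": (0.50, 0.20),
    "fast": (1.50, 0.35),
}


def blended_input_rate(tier: str, cache_read_ratio: float) -> float:
    """Effective USD per 1M input tokens at a given cache read ratio (0.0 to 1.0)."""
    new_rate, cached_rate = RATES[tier]
    return cache_read_ratio * cached_rate + (1.0 - cache_read_ratio) * new_rate


standard_80 = blended_input_rate("standard", 0.80)
fast_80 = blended_input_rate("fast", 0.80)

print(f"Standard at 80% cache reads: ${standard_80:.2f}/1M")  # $0.26/1M
print(f"Fast at 80% cache reads:     ${fast_80:.2f}/1M")      # $0.58/1M
```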

Concrete impact: A refactoring session with 10 back-and-forth turns on a large codebase might consume 10M tokens. At the rates above and an 80–90% cache read ratio, the input side alone lands around $2.30–$2.60 on Standard and $4.65–$5.80 on Fast. On Claude 4.6 Opus: potentially $20+.

The Cache Bug (March–April 2026)

The cache story has a footnote worth documenting.

From late March through early April 2026, a backend bug caused Composer 2 Standard to emit cache read counts of zero—every request treated as fresh input at $0.50/1M even when the context was identical to the previous turn. Users reported credit burn rates 10x higher than expected. The irony: switching to Fast (which costs 3x more per token) actually resulted in lower total cost because cache was functioning there.

Cursor's team (Dean and Mohit on the forum thread) acknowledged the bug and pushed a fix around April 7. As of v2.1.116+, the behavior appears stable.

The diagnostic check: open cursor.com/settings → Usage. If Cache Read tokens are consistently below 40% on a multi-turn session against the same codebase, something is wrong. Expected range is 40–90% depending on how varied your requests are.

If you hit zero cache read consistently, copy the Request ID from the chat header and contact support. Cursor has been issuing credit refunds for the overbilling period.
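The diagnostic check can be scripted against exported usage rows. A minimal sketch, assuming the ratio is cache read tokens divided by all input tokens for the request; the function names and the sample figures are illustrative, and the 40% threshold is the lower bound of the expected range mentioned above.

```python
def cache_read_ratio(input_with_write: int, input_without_write: int, cache_read: int) -> float:
    """Cache read tokens as a share of all input tokens for one request."""
    total = input_with_write + input_without_write + cache_read
    return cache_read / total if total else 0.0


def looks_unhealthy(ratio: float, threshold: float = 0.40) -> bool:
    """True when the ratio falls below the expected 40-90% band."""
    return ratio < threshold


# Illustrative request: mostly cache reads, a little fresh input.
healthy = cache_read_ratio(input_with_write=0, input_without_write=15_018, cache_read=391_424)
# Illustrative request showing the zero-cache-read symptom.
broken = cache_read_ratio(input_with_write=0, input_without_write=406_442, cache_read=0)

print(f"healthy: {healthy:.1%}, flagged={looks_unhealthy(healthy)}")
print(f"broken:  {broken:.1%}, flagged={looks_unhealthy(broken)}")
```

A single low request is not conclusive, since the first turn of any session has nothing to read from cache; the check is meaningful across consecutive turns against the same codebase.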

Comparing with Claude Code's Cache

Claude Code (Anthropic's CLI tool) has its own prompt caching via cache_control markers, but with a key structural difference: TTL.

Setting	Write cost	Read cost	TTL
Default	1.25× input	~10% of input	5 minutes
`ENABLE_PROMPT_CACHING_1H=1`	2.0× input	~10% of input	1 hour

The 5-minute default is brutal for any session where you read documentation, test code, or think between turns. The 1-hour option (available since Claude Code v2.1.108) adds to the write cost but eliminates repeated cache misses across the kind of natural pauses that happen in real work.

To enable it:

# ~/.zshrc or ~/.bashrc
export ENABLE_PROMPT_CACHING_1H=1

Verify with usage output during a session—look for ephemeral_1h_input_tokens in the log. If you only see ephemeral_5m_, the variable isn't being picked up.

Note: there were also TTL-related bugs in this period that forced resets to 5-minute behavior. Keep Claude Code at the latest version.

My Usage Data

I exported my own Cursor usage history and analyzed it. Here's what a month looks like across models (442 requests):

Model	Requests	Avg cost/request	Cache read ratio
Composer 2 Standard	73	$0.19	88.3%
Composer 2 Fast	25	$0.32	78.1%
Claude 4.6 Sonnet	212	$0.37	84.7%
Claude 4.6 Opus	93	$0.90	79.5%

The 88.3% cache read ratio on Standard is the headline. For an average request consuming ~390K tokens, 88% of those are cache reads at $0.20/1M rather than fresh input at $0.50/1M. Without that cache hit rate, the average cost per request would be ~$0.40 instead of $0.19.

The top Opus requests peaked at $4.25/request (3.9M total tokens, 3.8M of which were cache reads). Even with excellent cache ratios, Opus's higher base rates mean the same cache-heavy session costs 4–5× more than Composer 2 Standard.

The Actual Decision

Composer 2 is not "Claude but cheap." It's a purpose-built agent runtime that has traded general intelligence for deep coding capability and cost efficiency at the infrastructure level. The Standard/Fast split exists because long-horizon agentic tasks don't need millisecond response times—and charging for that latency premium on 10-turn refactoring sessions is wasteful.

The model choice that makes sense given this:

Default to Standard for any multi-file task where you'll have more than 3–4 turns
Switch to Fast for interactive chat where you're watching output incrementally
Use frontier models (Opus, Claude 4.7) only when Composer 2 hits a genuine capability ceiling—complex algorithmic reasoning, architecture decisions that span non-code domains

The cache makes Standard not just "slower Fast," but a qualitatively different operational mode: background processing with cost amortized over a long context window that grows cheaper the more you reuse it.

Two Nasty Gotchas When Building Multi-Agent Systems with Google ADK

Hiroshi Toyama — Tue, 28 Apr 2026 09:30:43 +0000

Google's Agent Development Kit (ADK) makes it straightforward to compose LlmAgent instances into multi-agent hierarchies. But two bugs that aren't documented anywhere bit me hard in production. Here's what happened and how to fix them.

The Setup

A root router LlmAgent with two sub-agents. Both sub-agents are module-level singletons — instantiated at import time, referenced from the root agent's constructor.

# Agents/my_app/root_agent.py
from google.adk.agents import LlmAgent

from Agents.my_app.sub_agent_a.agent import sub_agent_a
from Agents.my_app.sub_agent_b.agent import sub_agent_b

def _build_sub_agents() -> list:
    return [sub_agent_a, sub_agent_b]

root_agent = LlmAgent(
    name="my_app",
    sub_agents=_build_sub_agents(),
    ...
)

Worked fine locally with adk web. Blew up on Cloud Run.

Bug 1: `Agent already has a parent agent` on module reload

The error

pydantic_core._pydantic_core.ValidationError: 1 validation error for LlmAgent
  Value error, Agent `SubAgentA` already has a parent agent,
  current parent: `my_app`, trying to add: `my_app`

What's happening

ADK's agent_loader calls importlib.import_module(agent_name) on every request. On the first request, it loads the module fresh and creates root_agent. The LlmAgent constructor sets sub_agent.parent_agent = root_agent for each sub-agent.

On the second request, agent_loader reloads the module. Because sub_agent_a and sub_agent_b are module-level singletons, they're the same Python objects from the previous load — still carrying their parent_agent reference. When the new LlmAgent tries to assign the parent again, pydantic's validator rejects it.

# Inside ADK's LlmAgent.__init__ (simplified)
for sub in sub_agents:
    if sub.parent_agent is not None:
        raise ValueError(f"Agent `{sub.name}` already has a parent agent ...")
    sub.parent_agent = self

This never surfaces locally because adk web loads the module only once per session. Cloud Run's reload-per-request behavior is what triggers it.

The fix

Reset parent_agent to None before passing sub-agents to the constructor:

def _build_sub_agents() -> list:
    agents = [sub_agent_a, sub_agent_b]
    for agent in agents:
        agent.parent_agent = None  # reset before each reload
    return agents

This is safe because the assignment happens synchronously before the new parent is set.
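The collision and the fix can be reproduced without ADK installed. The sketch below uses a hypothetical `StubAgent` class that only copies the parent guard from the simplified snippet above; it is not the real LlmAgent, and `build_root` stands in for re-executing the root module.

```python
class StubAgent:
    """Hypothetical stand-in for an ADK agent; NOT the real LlmAgent."""

    def __init__(self, name, sub_agents=None):
        self.name = name
        self.parent_agent = None
        self.sub_agents = list(sub_agents or [])
        for sub in self.sub_agents:
            # Same guard as the simplified ADK snippet above.
            if sub.parent_agent is not None:
                raise ValueError(
                    f"Agent `{sub.name}` already has a parent agent, "
                    f"current parent: `{sub.parent_agent.name}`, "
                    f"trying to add: `{self.name}`"
                )
            sub.parent_agent = self


# Module-level singleton, as in the setup described earlier.
sub_agent_a = StubAgent("SubAgentA")


def build_root(reset_parents: bool) -> StubAgent:
    """Simulates one execution of the root module."""
    if reset_parents:
        sub_agent_a.parent_agent = None  # the fix: clear stale parent first
    return StubAgent("my_app", [sub_agent_a])


build_root(reset_parents=False)  # first load: works
try:
    build_root(reset_parents=False)  # second load: stale parent still set
    collided = False
except ValueError as exc:
    collided = True
    print(exc)

root = build_root(reset_parents=True)  # second load with the fix: works
print(sub_agent_a.parent_agent is root)  # True
```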

Bug 2: `Context variable not found` in instruction strings

The error

KeyError: 'Context variable not found: `hostname`.'

Traceback points here:

File ".../google/adk/utils/instructions_utils.py", line 124, in inject_session_state
    return await _async_sub(r'{+[^{}]*}+', _replace_match, template)

What's happening

ADK injects session state into agent instructions at runtime. The mechanism scans the instruction string with the regex r'{+[^{}]*}+' and replaces every {var_name} with the corresponding session state value.

If your instruction contains an example URL or any template-like text with curly braces:

The URL format is `https://{hostname}/api/{resource_id}/`

ADK sees {hostname}, looks it up in session state, finds nothing, raises KeyError.

My first instinct was to double-brace escape like Python's .format():

https://{{hostname}}/api/{{resource_id}}/

This does not work. The regex is {+[^{}]*}+ — it matches one or more { characters followed by non-brace characters followed by one or more } characters. {{hostname}} still matches.

The fix

Don't use curly braces for literal placeholder text in instructions:

The URL format is `https://<hostname>/api/<resource_id>/`

More broadly: any {word} pattern in an ADK instruction string is treated as a session state variable, regardless of how many braces you use. Use angle brackets, square brackets, or prose for template-like text in prompts.
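The matching behavior is easy to confirm with Python's `re` module and the pattern from the traceback. This sketch only shows what the regex matches; it does not reproduce ADK's replacement logic.

```python
import re

# Pattern copied from the traceback above.
STATE_VAR_PATTERN = re.compile(r'{+[^{}]*}+')

single = "https://{hostname}/api/{resource_id}/"
double = "https://{{hostname}}/api/{{resource_id}}/"
angle = "https://<hostname>/api/<resource_id>/"

single_hits = STATE_VAR_PATTERN.findall(single)
double_hits = STATE_VAR_PATTERN.findall(double)
angle_hits = STATE_VAR_PATTERN.findall(angle)

print(single_hits)  # ['{hostname}', '{resource_id}']
print(double_hits)  # ['{{hostname}}', '{{resource_id}}']
print(angle_hits)   # []
```

Double braces are matched as a whole, so they reach the same lookup path as single braces; only the brace-free form avoids the pattern entirely.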

Summary

Bug	Trigger	Fix
`parent_agent` collision	Module-level singleton sub-agents + ADK module reload per request	Reset `agent.parent_agent = None` before passing to constructor
`Context variable not found`	`{word}` patterns in instruction strings	Use `<word>` or square brackets instead

Both are easy to fix once you know what's happening, but the error messages don't immediately point to the root cause. The parent_agent one is especially sneaky — it only appears in production where the module is reloaded per request, never in adk web during local development.

Managing AI Agent Skills with `npx skills`: A Practical Guide

Hiroshi Toyama — Sat, 11 Apr 2026 08:04:45 +0000

The Problem

AI agents like Claude Code, Cursor, and GitHub Copilot don't inherently know how to use every tool in your stack. You need a way to teach them. That's what npx skills does — it's a package manager for AI agent behaviors, built by Vercel Labs.

npx skills add microsoft/playwright-cli

This command fetches a SKILL.md from the specified GitHub repository and installs it into your agent's config directory (.agents/skills/ or .claude/skills/ depending on the agent).

How It Works

GitHub as the Registry

Unlike npm, which uses npmjs.com, skills uses GitHub as its registry. The microsoft/playwright-cli argument maps directly to https://github.com/microsoft/playwright-cli. Any public GitHub repo with a SKILL.md at root is a valid skill source.

You can also install by full URL:

npx skills add https://github.com/microsoft/playwright-cli

SKILL.md as the Package Entry Point

Each skill repo contains a SKILL.md — the equivalent of index.js in an npm package. It contains:

Metadata: name and description of the skill
Tool definitions: commands the AI can invoke (e.g. playwright test)
Prompt instructions: when and how the AI should use the tool

.skills.json + skills-lock.json = package.json + package-lock.json

Concept	npm	skills CLI
Dependency manifest	`package.json`	`.skills.json`
Lock file	`package-lock.json`	`skills-lock.json`
Install directory	`node_modules/`	`.agents/skills/`
Registry	npmjs.com	GitHub
Install command	`npm install`	`npx skills experimental_install`

After npx skills add, your .skills.json will look like:

{
  "skills": [
    {
      "name": "playwright-cli",
      "remote": "microsoft/playwright-cli",
      "version": "latest"
    }
  ]
}

Key Commands

# Add a skill
npx skills add vercel-labs/agent-skills

# Add globally (user-level, not project-level)
npx skills add vercel-labs/agent-skills -g

# Target specific agents
npx skills add vercel-labs/agent-skills --agent claude-code cursor

# List installed skills
npx skills list
npx skills ls -g           # global skills
npx skills ls -a cursor    # filter by agent

# Search the registry
npx skills find typescript

# Update all skills
npx skills update

# Restore from lock file (equivalent of npm ci)
npx skills experimental_install

# Sync from node_modules to agent directories
npx skills experimental_sync

# Scaffold a new skill
npx skills init my-skill

Gotchas

`remove` Doesn't Update the Lock File

This is the biggest footgun:

npx skills rm microsoft/playwright-cli --all

This removes the skill files from your agent directories, but leaves the entry in skills-lock.json. The next time someone runs experimental_install, the skill comes back.

Workaround:

Run npx skills remove as usual
Manually edit .skills.json to remove the entry
Delete skills-lock.json
Run npx skills update or add remaining skills to regenerate a clean lock file

`experimental_` Prefix is Real

experimental_install and experimental_sync are genuinely experimental. The current version has no npx skills sync command: restoring from the lock file is npx skills experimental_install, and syncing from node_modules is npx skills experimental_sync.

Cache Behavior with npx

npx skills may run a cached older version. Force latest:

npx skills@latest add <repo>

For projects where everyone needs the same CLI version, add it as a devDependency:

npm install --save-dev skills

CI/CD Integration

Add to your CI setup to restore skills on each run:

- name: Restore AI agent skills
  run: npx skills experimental_install

This ensures every developer and CI environment uses exactly the same skill versions as defined in skills-lock.json.

Creating Your Own Skill

Any GitHub repo with a SKILL.md is installable. Create one with:

npx skills init my-skill

This scaffolds a SKILL.md that you push to GitHub. Anyone can then install it with:

npx skills add yourusername/my-skill

Browse existing skills at skills.sh.

Summary

npx skills is npm for AI agent capabilities. The mental model maps cleanly:

SKILL.md = index.js
.skills.json = package.json
skills-lock.json = package-lock.json
experimental_install = npm ci
GitHub = npm registry

The tooling is still experimental — particularly the lock file management on remove — but it's already useful for ensuring consistent AI behavior across team environments.

Deploying a Google ADK Agent to Vertex AI Agent Engine with Terraform

Hiroshi Toyama — Mon, 30 Mar 2026 12:45:33 +0000

Most documentation for Vertex AI Agent Engine focuses on the Python SDK (vertexai.agent_engines.create). That works fine for one-off deployments, but if you want your agent infrastructure managed declaratively alongside the rest of your GCP resources, Terraform is the right tool.

This post walks through a complete Terraform setup for deploying a Google ADK agent to Vertex AI Agent Engine using google_vertex_ai_reasoning_engine.

Prerequisites

Terraform >= 1.5
google or google-beta provider
aiplatform.googleapis.com enabled
A Google ADK agent wrapped in AdkApp

How Agent Engine Deployment Works

The deployment model is straightforward: tar.gz your source code, base64-encode it, and pass it to the API via inline_source. The runtime handles dependency installation, session management, and streaming — you just provide the entrypoint.

The Agent Entrypoint

The key requirement is an AdkApp instance at the module level. This is what Terraform's entrypoint_object points to.

# src/myagent/agent.py
from google.adk.agents import Agent
from vertexai.agent_engines import AdkApp

def get_weather(city: str) -> dict:
    """Returns weather for a given city."""
    return {"city": city, "weather": "sunny"}

root_agent = Agent(
    name="weather_agent",
    model="gemini-2.5-flash",
    description="Answers weather questions",
    instruction="Answer the user's weather question using the get_weather tool.",
    tools=[get_weather],
)

# This is what entrypoint_object references
agent_engine = AdkApp(agent=root_agent)

Wrapping in AdkApp automatically exposes create_session, stream_query, and other ADK methods as callable endpoints.

Building the Source Archive

Agent Engine expects a base64-encoded tar.gz containing your source files and a requirements.txt. Here's a minimal build script that uses only the Python standard library:

# scripts/build_source.py
import base64
import io
import json
import sys
import tarfile
from pathlib import Path


def main():
    query = json.load(sys.stdin)
    project_root = Path(query["project_root"]).resolve()
    src_dir = project_root / "src"
    requirements = project_root / "requirements.txt"

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(src_dir.rglob("*.py")):
            arcname = path.relative_to(project_root)
            tar.add(path, arcname=str(arcname))
        tar.add(requirements, arcname="requirements.txt")

    b64 = base64.b64encode(buf.getvalue()).decode()
    json.dump({"base64": b64}, sys.stdout)


if __name__ == "__main__":
    main()

The requirements.txt lists PyPI package names — the Agent Engine runtime installs them at deploy time:

google-adk>=1.0.0
google-cloud-aiplatform[agent_engines]>=1.93.0

Terraform Configuration

`provider.tf`

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 6.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 6.0"
    }
  }
}

provider "google-beta" {
  project = var.project_id
  region  = "asia-northeast1"
}

Wiring the Archive Build

Use the external data source to invoke the build script during terraform plan/apply:

data "external" "agent_source" {
  program = ["python3", "${path.module}/scripts/build_source.py"]

  query = {
    project_root = "${path.module}/../.."
  }
}

Every apply picks up the latest source automatically.
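One caveat worth checking in your own setup: Python's `w:gz` tarfile mode stamps the current time into the gzip header, so the base64 string can change between runs even when no source file changed, and Terraform may then report a diff on every plan. The variant below is a minimal sketch of a reproducible build, assuming the same `src/` plus `requirements.txt` layout as the script above. The function name `build_archive_b64` is made up for illustration.

```python
import base64
import gzip
import io
import tarfile
from pathlib import Path


def build_archive_b64(project_root: Path) -> str:
    """Return a base64 tar.gz of src/**/*.py plus requirements.txt.

    Timestamps and ownership are zeroed so identical sources always
    produce identical bytes.
    """
    project_root = project_root.resolve()
    files = sorted((project_root / "src").rglob("*.py"))
    files.append(project_root / "requirements.txt")

    tar_bytes = io.BytesIO()
    with tarfile.open(fileobj=tar_bytes, mode="w") as tar:
        for path in files:
            arcname = path.relative_to(project_root).as_posix()
            info = tar.gettarinfo(str(path), arcname=arcname)
            info.mtime = 0               # drop the file modification time
            info.uid = info.gid = 0      # drop local user and group ids
            info.uname = info.gname = ""
            with path.open("rb") as fh:
                tar.addfile(info, fh)

    gz_bytes = io.BytesIO()
    # mtime=0 keeps the gzip header stable between runs
    with gzip.GzipFile(fileobj=gz_bytes, mode="wb", mtime=0) as gz:
        gz.write(tar_bytes.getvalue())

    return base64.b64encode(gz_bytes.getvalue()).decode()
```

Swapping this into `main()` in place of the inline tarfile code should leave the JSON contract with the external data source unchanged.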

The Agent Engine Resource

resource "google_vertex_ai_reasoning_engine" "my_agent" {
  provider     = google-beta
  display_name = "my-agent-${var.env}"
  description  = "ADK weather agent"
  region       = "asia-northeast1"

  spec {
    agent_framework = "google-adk"
    service_account = var.service_account_email

    class_methods = jsonencode([
      { name = "create_session", api_mode = "" },
      { name = "get_session",    api_mode = "" },
      { name = "list_sessions",  api_mode = "" },
      { name = "delete_session", api_mode = "" },
      { name = "stream_query",   api_mode = "stream" },
    ])

    source_code_spec {
      inline_source {
        source_archive = data.external.agent_source.result.base64
      }

      python_spec {
        entrypoint_module = "src.myagent.agent"
        entrypoint_object = "agent_engine"
        requirements_file = "requirements.txt"
        version           = "3.12"
      }
    }

    deployment_spec {
      env {
        name  = "LOG_LEVEL"
        value = "INFO"
      }

      secret_env {
        name = "API_TOKEN"
        secret_ref {
          secret  = "my-api-token"
          version = "latest"
        }
      }
    }
  }
}

Block Reference

Block	Purpose
`agent_framework`	Must be `"google-adk"` — tells the runtime which framework to use
`class_methods`	Enumerates callable methods; `api_mode = "stream"` enables SSE
`inline_source`	Embeds the base64 tar.gz directly — no GCS bucket needed
`python_spec`	Specifies the entrypoint and Python version
`deployment_spec`	Injects env vars and Secret Manager secrets at runtime

`class_methods` in Detail

Every ADK method you want to expose must be declared explicitly:

class_methods = jsonencode([
  { name = "create_session", api_mode = "" },
  { name = "get_session",    api_mode = "" },
  { name = "list_sessions",  api_mode = "" },
  { name = "delete_session", api_mode = "" },
  { name = "stream_query",   api_mode = "stream" },
])

api_mode = "stream" makes the method return a Server-Sent Events stream. Only stream_query needs this — the rest are standard request/response.

Gotchas

Provider choice matters. google_vertex_ai_reasoning_engine is available in both google and google-beta providers. Make sure the provider attribute matches whichever you configure.

The external data source re-runs on every plan. This is by design, so you always get the latest source. If your build script is slow, consider caching its output between runs.

entrypoint_module uses dot notation, not file paths. src.myagent.agent maps to src/myagent/agent.py in the archive. Match this to your actual directory structure.

Secret Manager secrets must already exist. The secret_env block only references a secret by name; it doesn't create it. Provision secrets separately (in another Terraform resource or by hand) before running apply.
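For the entrypoint_module gotcha above, a quick local check can catch a mismatch before you deploy. This is a minimal sketch with hypothetical helper names. It assumes the entrypoint is a plain module file (not a package `__init__.py`) and that you have the base64 string produced by the build script.

```python
import base64
import io
import tarfile


def module_to_archive_path(entrypoint_module: str) -> str:
    """Translate 'src.myagent.agent' into 'src/myagent/agent.py'."""
    return entrypoint_module.replace(".", "/") + ".py"


def archive_contains_module(archive_b64: str, entrypoint_module: str) -> bool:
    """Check that the module file exists inside a base64 tar.gz archive."""
    raw = base64.b64decode(archive_b64)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        return module_to_archive_path(entrypoint_module) in tar.getnames()
```

Running `archive_contains_module(b64, "src.myagent.agent")` against the build output returns False if the archive was rooted at a different directory than the module path expects.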

Deploy and Verify

terraform init
terraform plan
terraform apply

Export the resource name to call the agent from Python:

output "agent_engine_resource_name" {
  value = google_vertex_ai_reasoning_engine.my_agent.name
}

import vertexai
from vertexai import agent_engines

vertexai.init(project="my-project", location="asia-northeast1")

# Fetch a handle to the already-deployed agent by its full resource name
agent = agent_engines.get(
    "projects/my-project/locations/asia-northeast1/reasoningEngines/<ID>"
)

session = agent.create_session(user_id="user-1")
for chunk in agent.stream_query(
    user_id="user-1",
    session_id=session["id"],
    message="What's the weather in Tokyo?",
):
    print(chunk)

Summary

google_vertex_ai_reasoning_engine is available in both google and google-beta providers
Source code is delivered as a base64 tar.gz via inline_source — no GCS required
A minimal Python script using only stdlib is enough to produce the archive
class_methods must explicitly enumerate every ADK method you want to expose
Secret Manager integration is declarative via secret_env blocks

Terraforming Agent Engine makes cleanup (terraform destroy), environment promotion, and drift detection straightforward. The ADK + Terraform combination has sparse documentation, so hopefully this fills the gap.

AI-Driven Chrome Extension Development with WXT and Chrome DevTools MCP

Hiroshi Toyama — Sun, 29 Mar 2026 12:21:39 +0000

The Problem

Building a Chrome extension that modifies a third-party web app is a unique challenge. The DOM structure is opaque, class names are minified and change between deployments, and there's no official API to hook into. Traditional extension development looks like this:

Inspect the DOM manually in DevTools
Write selectors and content scripts
Reload the extension
Check if it works
Repeat

This cycle is slow. I wanted an AI coding agent that could see the actual browser state and verify its own changes — not just generate code blindly.

That's how I arrived at this stack: WXT for the extension framework, Chrome DevTools MCP for giving the AI agent browser access, and Cursor as the IDE tying it all together.

The Stack

Tool	Role
WXT	Chrome extension framework (TypeScript, hot reload, Manifest V3)
Chrome DevTools MCP	MCP server that exposes Chrome DevTools Protocol to AI agents
Cursor	AI-powered IDE with native MCP support

Step 1: WXT with a Fixed CDP Port

WXT is a framework that wraps Chrome extension development with file-based routing, hot reload, and TypeScript support out of the box. The key insight is that WXT's runner can launch Chrome with custom Chromium args — including --remote-debugging-port.

// wxt.config.ts
import { defineConfig } from 'wxt';

export default defineConfig({
  manifest: {
    name: 'My Extension',
    version: '1.0',
    permissions: ['storage'],
  },
  extensionApi: 'chrome',
  runner: {
    chromiumArgs: [
      '--remote-debugging-port=9222',
      `--user-data-dir=${process.cwd()}/.chrome-debug-profile`,
      '--exclude-switches=enable-automation',
    ],
    startUrls: ['https://example.com'],
  },
});

Three things to note:

--remote-debugging-port=9222 — Exposes the Chrome DevTools Protocol on a fixed port. The MCP server connects here.
--user-data-dir — A dedicated profile directory, separate from your daily Chrome. Login sessions persist across dev restarts. Add this to .gitignore — it contains cookies and session tokens that must not be pushed to a repository.
--exclude-switches=enable-automation — Without this, some sites detect the "automated" browser and block sign-in.

When you run wxt (or npm run dev), WXT launches Chrome with these args, loads your extension, and watches for file changes — all in one command.

Step 2: Chrome DevTools MCP Configuration

MCP (Model Context Protocol) lets AI agents call external tools. Chrome DevTools MCP is an MCP server that wraps the Chrome DevTools Protocol — giving your AI agent the ability to navigate pages, evaluate JavaScript, take screenshots, and inspect the DOM.

Configuration lives in .cursor/mcp.json:

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": [
        "-y",
        "chrome-devtools-mcp@latest",
        "--browserUrl=http://127.0.0.1:9222"
      ]
    }
  }
}

That's it. When Cursor starts, it spins up the MCP server, which connects to Chrome on port 9222. The AI agent can now see and interact with your browser.

Step 3: The Dev Script (Optional but Recommended)

A small shell script wrapping the two main workflows keeps things ergonomic:

./dev.sh dev      # WXT dev server + Chrome (hot reload + MCP)
./dev.sh start    # Load built extension (no hot reload, MCP only)
./dev.sh stop     # Kill debug Chrome
./dev.sh status   # Check CDP connection

dev is the primary mode — WXT handles everything and you get hot reload. start is for testing production builds with MCP inspection. The script handles edge cases like port conflicts, PID management, and connection verification via curl http://localhost:9222/json/version.
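The connection check that the script does with curl can also be sketched in Python, which is handy if you want the same check from a test or a pre-flight hook. The function names here are illustrative. The only thing assumed about Chrome is the `/json/version` endpoint and its `Browser` and `webSocketDebuggerUrl` fields.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def parse_version_payload(payload: bytes) -> dict:
    """Pick out the fields worth logging from a /json/version response."""
    data = json.loads(payload)
    return {
        "browser": data.get("Browser"),
        "websocket_url": data.get("webSocketDebuggerUrl"),
    }


def check_cdp(port: int = 9222, timeout: float = 2.0) -> Optional[dict]:
    """Return version info if Chrome answers on the CDP port, else None."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_version_payload(resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        # Port closed, connection refused, timeout, or non-JSON response
        return None
```

A `None` result means the debug Chrome is not running (or something else holds the port), which is the same signal `./dev.sh status` reports.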

The Workflow in Practice

1. Start the environment

npm run dev   # or ./dev.sh dev

Chrome opens automatically with the extension loaded. WXT watches for file changes. The MCP server connects to port 9222. Cursor's AI agent can now see the browser.

2. AI inspects current state

The AI agent evaluates JavaScript in the browser context to understand the current DOM:

// The agent runs this via MCP
evaluate_script({
  function: `() => ({
    injectedStyle: !!document.getElementById('my-extension-styles'),
    buttonCount: document.querySelectorAll('.my-custom-button').length,
    panelVisible: !!document.getElementById('my-panel'),
  })`
})

The agent can verify its own changes without you switching context. It writes code, WXT hot-reloads, and the agent checks if the DOM updated correctly.

3. AI verifies changes through the browser

Here's the key difference from normal AI-assisted coding. Instead of:

"I've added the panel. Please refresh and check if it works."

The AI does this:

"I've added the panel. Let me verify... [evaluates script via MCP] ... The #my-panel element exists, has 5 child entries, and is positioned correctly. Rendering looks good."

Text-based DOM verification is preferred over screenshots — it's faster, cheaper, and more precise:

// Good: structured verification
evaluate_script(() => {
  const buttons = document.querySelectorAll('.my-button');
  return {
    count: buttons.length,
    firstButton: buttons[0]?.outerHTML.substring(0, 200),
  };
});

// Screenshots only when you need visual layout confirmation
take_screenshot()

Tips and Gotchas

Content Script Isolation

Chrome extension content scripts run in an isolated world. Variables set on window in the content script are invisible to evaluate_script via MCP, because MCP evaluates in the page context.

The workaround: verify through DOM side effects, not global variables.

// Won't work: window globals are in the isolated world
evaluate_script(() => window.myExtensionState) // → undefined

// Works: check the DOM changes the extension made
evaluate_script(() => ({
  styleInjected: !!document.getElementById('my-extension-styles'),
  panelExists: !!document.getElementById('my-panel'),
}))

Selector Strategy for Third-Party UIs

When building extensions for sites you don't control, selectors break frequently. A fallback chain helps:

const el =
  document.querySelector('[aria-label*="Submit"]') ||
  document.querySelector('[data-test-id="submit"]') ||
  document.querySelector('.submit-btn');

Priority:

ARIA attributes (aria-label, role) — most stable across updates
Semantic attributes (data-test-id) — moderately stable
Class names — last resort, always provide as fallback

You can even build a DOM analyzer shortcut (Ctrl+Shift+D) that exports the page structure in a format the AI agent can consume. When selectors break, press the shortcut, paste the output into Cursor, and the agent updates the fallback selectors.

Async DOM Waiting

SPA elements appear asynchronously. Rather than fragile setTimeout chains, use polling with bounded retries:

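// Poll every 500 ms and give up after maxRetries attempts
function waitForElement(selector, callback, maxRetries = 10) {
  let retries = 0;
  const interval = setInterval(() => {
    const el = document.querySelector(selector);
    if (el || ++retries >= maxRetries) {
      clearInterval(interval);
      if (el) callback(el);
    }
  }, 500);
}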

If the element never appears, fail silently — no console spam.

Google Login Gotcha

When Chrome launches with --remote-debugging-port, Google sometimes detects it as an "unsafe browser" and blocks sign-in. The --exclude-switches=enable-automation flag helps, but if it's not enough:

Launch Chrome with the dedicated profile (without WXT)
Sign in manually
Close Chrome
Now run npm run dev — WXT reuses the same profile with the valid session

The dedicated --user-data-dir persists your login across dev sessions.

`--user-data-dir` and Security

The dedicated profile serves two purposes:

Isolation from your daily Chrome: The dev browser doesn't touch your bookmarks, extensions, or sessions — and your personal credentials don't leak into the dev environment.
Minimal credentials: Only log into what you need for development. Don't sign into personal Gmail or other unrelated accounts.

Keep in mind:

Always add the profile directory to .gitignore. It contains cookies, session tokens, and LocalStorage.

.chrome-debug-profile/

CDP port 9222 is accessible from localhost. --remote-debugging-port binds to 127.0.0.1 by default, but any process on your machine can access all open tabs. Only run it during active development.
Don't use this on shared machines. While CDP is open, anyone on the same machine can control the browser session.

Getting Started

If you want to try this workflow:

1. Create a WXT project

npm create wxt@latest my-extension
cd my-extension

2. Add CDP port to `wxt.config.ts`

runner: {
  chromiumArgs: [
    '--remote-debugging-port=9222',
    `--user-data-dir=${process.cwd()}/.chrome-debug-profile`,
    '--exclude-switches=enable-automation',
  ],
},

3. Create `.cursor/mcp.json`

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest", "--browserUrl=http://127.0.0.1:9222"]
    }
  }
}

4. Run

npm run dev

Chrome launches with CDP enabled, and Cursor's agent connects. Your AI agent can now see your browser.

Why This Workflow Matters

The traditional Chrome extension development loop is write → reload → manually check → repeat. With WXT + Chrome DevTools MCP, it becomes write → auto-reload → AI verifies → iterate — and the AI agent can do the first and last steps too.

Debugging goes from "read console logs, set breakpoints, manually reproduce" to "AI evaluates scripts in the live browser and reports what's happening."
Selector maintenance goes from "open DevTools, inspect element, copy selector, paste into code" to "AI reads the DOM and updates fallback selectors."
Feature development goes from "code blind, test manually" to "AI writes code, checks DOM state, fixes issues — all in one turn."

This doesn't replace understanding your own extension. But it dramatically shortens the feedback loop, especially for the tedious parts of third-party DOM manipulation.

BigQuery Global Queries: Join Data Across Regions Without ETL

Hiroshi Toyama — Sun, 22 Mar 2026 12:38:06 +0000

In February 2026, Google released BigQuery Global Queries in Preview. It lets you join tables from completely different geographic regions, say asia-northeast1 (Tokyo) and us-central1 (Iowa), in a single SQL statement. No ETL, no data movement pipelines, no manual copying.

This post covers how it actually works under the hood, what it costs, and the gotchas you need to know before using it in production.

The Old Problem

BigQuery historically required all datasets referenced in a single query to live in the same location. If your sales data was in Tokyo and your user master was in the US, you had two options:

Copy one dataset to the other region (ETL pipeline, operational overhead).
Run two separate queries and join the results in application code.

Global Queries eliminates this constraint.

How It Works: 4-Stage Execution

When you run a global query, BigQuery orchestrates the execution across regions transparently:

1. Distributed Execution

The Query Optimizer analyzes the query, identifies which tables live in which regions, and assigns the querying region as the Primary Region (the "leader"). Workers in each remote region receive their execution assignments in parallel.

2. Data Pushdown

This is the most critical stage — and the one that makes global queries economically viable.

Before any data crosses the network, BigQuery applies three types of pushdown to minimize transfer size:

Predicate Pushdown: WHERE clause filters run in the remote region, before the data moves. A 100M-row table filtered to 100 rows transfers 100 rows — not 100M.
Projection Pushdown: Only the columns named in SELECT are read from remote storage. BigQuery's columnar storage (Capacitor) makes this efficient.
Aggregation Pushdown: GROUP BY/SUM/COUNT operations run as partial aggregations in the remote region. A billion-row transaction table can be summarized to 365 rows (daily totals) before transfer.
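To make the three pushdown types concrete, here is a toy model in plain Python. It is not how BigQuery implements anything; it only counts how many rows or columns would need to leave the "remote" side in each case, using made-up data.

```python
from collections import defaultdict
from datetime import date, timedelta

# Toy "remote table": 3 transactions per day for 365 days, 4 columns each
START = date(2025, 1, 1)
remote_rows = [
    {"day": START + timedelta(days=d), "product_id": p,
     "amount": 10 * (p + 1), "note": "x" * 50}
    for d in range(365)
    for p in range(3)
]

# Predicate pushdown: filter before anything leaves the remote side
target = date(2025, 3, 1)
filtered = [r for r in remote_rows if r["day"] == target]

# Projection pushdown: keep only the columns the query selects
projected = [{"product_id": r["product_id"], "amount": r["amount"]}
             for r in filtered]

# Aggregation pushdown: partial SUM per day, one output row per group
daily_totals = defaultdict(int)
for r in remote_rows:
    daily_totals[r["day"]] += r["amount"]

print(len(remote_rows), "rows in the remote table")
print(len(filtered), "rows after the WHERE filter")
print(sorted(projected[0]), "columns after projection")
print(len(daily_totals), "rows after daily aggregation")
```

The ratio between the first line and the others is the point: whatever survives the remote-side filter, projection, or aggregation is what gets billed as egress.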

3. Data Transfer

Filtered, minimized results travel over Google's internal network to the Primary Region, where they're stored in temporary internal tables for up to 8 hours. This is where cross-region egress charges are incurred.

4. Final Join

The Primary Region merges local data with the temporary remote data, as if everything were in one place. The query result returned to the user looks like any normal BigQuery result.

-- Executed from asia-northeast1 (Tokyo)
SELECT
  t1.product_id,
  t1.sales + t2.sales AS total_global_sales
FROM `project.japan_dataset.sales` AS t1   -- local
JOIN `project.us_dataset.sales` AS t2      -- remote (auto-transferred)
ON t1.product_id = t2.product_id
WHERE t1.date = '2026-03-01'               -- pushed down to both regions

IAM Permissions

Global Queries require two layers of setup.

Project-level opt-in (admin task)

-- Enable execution from the primary region
ALTER PROJECT `your-project-id`
SET OPTIONS (
  `region-asia-northeast1.enable_global_queries_execution` = true
);

-- Enable data access from the remote region
ALTER PROJECT `your-project-id`
SET OPTIONS (
  `region-us-central1.enable_global_queries_data_access` = true
);

User-level permissions

Role or Permission	Description
`bigquery.jobs.createGlobalQuery`	Required to initiate a global query. Currently only included in `roles/bigquery.admin` — create a custom role for regular users.
`roles/bigquery.dataViewer`	Required on every dataset being referenced, in every region.

Cost Structure

Global queries have three billing components instead of the usual one:

Component	Details	Approximate Price (2026)
Compute	Bytes scanned across all regions	$6.25 / 1 TB (on-demand)
Egress	Data transferred from remote to primary region	~$0.08–$0.12 / 1 GB (intercontinental)
Temporary Storage	Intermediate data stored for up to 8 hours	~$0.02/GB-month (prorated)

Cost simulation

Scenario: Query from Tokyo, scanning a 1 TB table in us-central1, with a WHERE clause that reduces the data transferred to 1 GB.

Compute: 1 TB × $6.25 = $6.25
Egress: 1 GB × $0.12 = $0.12
Total: ~$6.37

If you skip the WHERE clause and transfer the full 1 TB, egress alone comes to roughly $120 at $0.12/GB. Pushdown is not optional; it's the entire cost model.
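The simulation can be written as a small estimator. This is a sketch using the approximate prices from the table above. The 730-hour month used to prorate temporary storage and the 1 TB = 1024 GB conversion are my assumptions, not published billing rules.

```python
COMPUTE_USD_PER_TB = 6.25        # on-demand rate from the table above
EGRESS_USD_PER_GB = 0.12         # upper end of the quoted range
STORAGE_USD_PER_GB_MONTH = 0.02  # temporary storage, prorated below
HOURS_PER_MONTH = 730            # assumption used for prorating


def estimate_global_query_cost(scanned_tb: float, transferred_gb: float,
                               retained_hours: float = 8.0) -> dict:
    """Rough cost breakdown for one global query, in USD."""
    compute = scanned_tb * COMPUTE_USD_PER_TB
    egress = transferred_gb * EGRESS_USD_PER_GB
    storage = (transferred_gb * STORAGE_USD_PER_GB_MONTH
               * retained_hours / HOURS_PER_MONTH)
    return {"compute": compute, "egress": egress, "storage": storage,
            "total": compute + egress + storage}


filtered = estimate_global_query_cost(scanned_tb=1, transferred_gb=1)
unfiltered = estimate_global_query_cost(scanned_tb=1, transferred_gb=1024)
print(f"filtered:   ${filtered['total']:.2f}")
print(f"unfiltered: ${unfiltered['total']:.2f}")
```

Temporary storage barely registers in either case; the difference between the two runs is almost entirely egress.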

Dry run before executing

Use the BigQuery Console (it shows estimated bytes scanned before you click Run) or the CLI:

bq query --dry_run --use_legacy_sql=false 'SELECT ...'

Note: In the current preview, dry runs estimate compute bytes only and may not reflect egress. Budget conservatively.

Key Considerations

Latency

Cross-region queries are always slower than single-region queries. Physical distance adds hundreds of milliseconds of network latency, plus multi-region orchestration overhead. Expect a minimum of 5–10 seconds even for modest cross-region joins. Real-time dashboards are not a good fit.

Data Residency

The Primary Region is where remote data lands temporarily. If GDPR or local privacy laws prohibit data from Region A leaving Region A, you must run the query from Region A as the primary — not from a region outside it. VPC Service Controls perimeters are also respected.

Current Limitations (Preview, March 2026)

No Query Cache

Global queries never use the query cache. Since data can change in any remote region at any time, BigQuery always reads fresh data. Every execution incurs full compute and egress costs.

Workaround: For frequently-used cross-region joins, materialize results into a local table using CREATE TABLE AS SELECT and query that instead.

No INFORMATION_SCHEMA from Remote Regions

You cannot query INFORMATION_SCHEMA views from a remote region within a global query. Joining metadata across regions requires first exporting that metadata into regular tables.

Unsupported Table Types

BigLake Apache Iceberg tables in remote regions are not supported as remote sources.
Partition pseudo-columns (_PARTITIONTIME, _PARTITIONDATE) may not push down correctly (more on this below).

No Sandbox Support

Billing Account required. The Sandbox (free tier) does not support Global Queries because egress charges can exceed the free quota.

The Partition Pseudo-Column Trap

This is the most dangerous limitation in production, and deserves its own section.

Background: Pseudo-columns vs. Physical columns

BigQuery offers two partitioning strategies:

Type	Partition Key	Access
Ingestion-time partitioned	Arrival timestamp, managed by BigQuery	Via `_PARTITIONTIME` / `_PARTITIONDATE` (pseudo-columns)
Column-based partitioned	An actual column in your table schema (e.g., `event_date`)	Via the column name directly

Pseudo-columns are not part of the formal table schema. They're metadata-level constructs.

Why pushdown fails for pseudo-columns

When the Query Optimizer sends execution instructions to a remote region, it works from the table's schema definition. Pseudo-columns aren't in that definition, so the optimizer can't reliably communicate partition pruning constraints to the remote worker.

Worst case: A filter like WHERE _PARTITIONDATE = '2026-03-01' is silently ignored in the remote region. The remote worker scans the entire table across all partitions and begins transferring everything to the primary region. Your query either times out or generates a very large bill.

The fix: Migrate to column-based partitioning

-- Create a new table with an explicit physical partition column
CREATE TABLE `project.dataset.new_table`
PARTITION BY event_date
AS
SELECT
  *,
  CAST(_PARTITIONDATE AS DATE) AS event_date  -- materialize the pseudo-column
FROM `project.dataset.old_table`

With a physical column, the optimizer sees it in the schema, understands the partition structure, and confidently applies pushdown in the remote region.

Fallback: Aliasing via Views (use with caution)

If migrating the table isn't possible, you can create a view in the remote region that aliases the pseudo-column:

-- View in us-central1
CREATE VIEW `project.us_dataset.v_sales` AS
SELECT
  *,
  _PARTITIONDATE AS partition_date_col
FROM `project.us_dataset.ingestion_time_partitioned_table`

Then query the view from the primary region:

SELECT * FROM `project.us_dataset.v_sales`
WHERE partition_date_col = '2026-03-01'

This sometimes works for simple queries, but pushdown is not guaranteed. In complex queries with JOINs or aggregations, the optimizer often loses the connection between the aliased column and the underlying partition structure, falls back to full-scan, and transfers everything.

Always verify that pushdown is working by checking the Query Execution Plan and confirming the remote READ stage shows filtered row counts — not the full table row count.
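If you check plans often, the comparison can be scripted. The sketch below assumes you have already pulled stage names and rows-read figures out of the execution plan into plain dicts; the dict shape, the stage names, and the 99% threshold are all hypothetical choices for illustration.

```python
def find_unfiltered_reads(stages: list, table_row_counts: dict) -> list:
    """Return names of stages that read (nearly) every row of a known table.

    `stages` is a list of dicts with 'name', 'table', and 'records_read'.
    `table_row_counts` maps table name to its total row count.
    """
    suspicious = []
    for stage in stages:
        total = table_row_counts.get(stage["table"])
        if total and stage["records_read"] >= 0.99 * total:
            suspicious.append(stage["name"])
    return suspicious


# Hypothetical numbers for a two-table global query
stages = [
    {"name": "S00: Input", "table": "us_dataset.v_sales",
     "records_read": 1_000_000_000},
    {"name": "S01: Input", "table": "japan_dataset.sales",
     "records_read": 120_000},
]
row_counts = {
    "us_dataset.v_sales": 1_000_000_000,
    "japan_dataset.sales": 50_000_000,
}
print(find_unfiltered_reads(stages, row_counts))
```

Any stage this flags read essentially the whole table, which is the signature of a filter that was not pushed down.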

Operational Best Practices

Problem	Recommendation
No query cache	Materialize frequent cross-region joins into local intermediate tables
Need metadata across regions	Export metadata to regular tables on a schedule
Ingestion-time partitioned tables	Migrate to column-based partitioning before using as remote sources
Unclear cost pre-execution	Use dry run + estimate egress separately; add a buffer

Summary

BigQuery Global Queries is a genuinely useful feature that eliminates an entire category of ETL pipelines. The execution model is well-designed — pushdown at the predicate, projection, and aggregation levels means you're typically only transferring the data you actually need.

The key things to internalize:

Pushdown is the cost model. Filter early, select only the columns you need, push aggregations to the remote side.
Ingestion-time partitioned tables are a liability in global queries. Migrate to column-based partitioning.
It's Preview — no query cache, no INFORMATION_SCHEMA cross-region, no BigLake Iceberg remotes. Design your architecture around these constraints.

Check the official documentation for the latest changes as this feature moves toward GA.

The Cloud is No Longer Virtual: The Harsh Physical Reality of AI Infra in 2026

Hiroshi Toyama — Mon, 23 Feb 2026 08:42:02 +0000

TL;DR

The "Virtual" in Cloud is fading. In 2026, AI infrastructure is dominated by three physical constraints: power grid capacity, tax legislation, and liquid cooling. If you are still picking regions based solely on latency, you may be overpaying by 20% or more.

1. The Death of the "Sales Tax Holiday"

For a decade, states like Virginia attracted data centers with massive sales tax exemptions. That era ended in February 2026 with Virginia HB 897.

Why this matters for your bill:

In the US, "Sales Tax" works differently from Japan's consumption tax or Europe's VAT. It is a sunk cost with no input tax credit for businesses. When a state removes a 6-10% tax exemption on hardware:

An NVIDIA B200 cluster worth $100M suddenly costs $106M to $110M.
This extra Capex is directly passed to you as higher hourly instance rates.
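The arithmetic behind that claim is simple enough to sketch. The function name is illustrative, and the only inputs are the $100M figure and the 6-10% range quoted above.

```python
def capex_with_sales_tax(hardware_usd: float, tax_rate: float) -> int:
    """Hardware cost once a non-recoverable sales tax applies, in whole dollars."""
    return round(hardware_usd * (1 + tax_rate))


for rate in (0.06, 0.10):
    print(f"{rate:.0%}: ${capex_with_sales_tax(100_000_000, rate):,}")
```

Because the tax cannot be credited back, the whole increase lands in the cost base that hourly instance pricing has to recover.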

The Move: We are seeing a "Great Migration" to the Midwest AI Belt (Indiana, Ohio, Iowa), where 20-30 year tax holidays are still guaranteed.

2. Why "Power" is the New "Latency"

We used to care about milliseconds. Now, we care about Megawatts.

The Virginia Gridlock

In Northern Virginia (us-east-1), data centers now consume over 25% of the state's total power. The grid is saturated. To build new AI capacity, AWS and Google are now forced to become Energy Producers.

Nuclear is the New "Default Gateway"

SMRs (Small Modular Reactors): AWS is deploying SMRs as "Microservices for Energy"—factory-built reactors that can be dropped next to a data center.
Direct-to-Plant: Microsoft is backing the restart of a shut-down reactor (Three Mile Island Unit 1) just to keep its Azure GPUs humming.

3. The "Jevons Paradox" of NVIDIA GPUs

People often ask: "Why doesn't NVIDIA make low-power GPUs?"

The answer is Tokens per Watt. NVIDIA's Blackwell (B200) draws up to roughly 1,200W per GPU, but NVIDIA claims it is up to 25x more energy-efficient at generating tokens than the previous generation.

The Thermal Wall

Because one rack now pulls 120kW+, traditional air cooling is dead. 2026 is the year of Liquid Cooling. If your DC doesn't have pipes, it can't run the latest AI models. This creates a "Performance Gap" between old regions and new AI-native regions.

4. The Tokyo Context: Why so expensive?

Many Japanese developers wonder why ap-northeast-1 costs more than us-east-1 despite Japan's "cheaper" cost of living.

Imported Energy: Japan's industrial electricity is 2-3x more expensive than the US.
Dollar-Denominated Silicon: Everything from the GPU to the fuel for the power plant is priced in USD. The weak Yen makes these "imported" cloud resources luxury items.
Humidity: Tokyo’s humid summers make PUE (Power Usage Effectiveness) worse than the dry, flat plains of Ohio.

5. FinOps 2026: Actions for Engineers

"Turning off idle instances" is FinOps 101. To be a Senior Infrastructure Engineer in 2026, you need Regional Arbitrage.

Move Training Out of Virginia: Shift non-latency-sensitive training jobs from us-east-1 to us-west-2 (Oregon) or the new Indiana regions to save 10-15% on tax and power alone.
Use Token-Specific Hardware: Evaluate TPU v7 (Google Cloud) or Trainium 2 (AWS). In 2026, specialized ASICs are often 3x more cost-effective than general-purpose GPUs for specific LLM workloads.
Infrastructure as Code (IaC) for Regions: Don't hardcode regions. Use variables that allow you to follow the "Tax-Free Energy" across the globe.

Final Thoughts

The cloud is no longer an invisible layer of abstraction. It is a physical plant that breathes energy and exhales heat. The best engineers in 2026 will be those who understand the physics and economics behind the API call.

What are your thoughts? Are you planning to migrate your workloads out of Virginia? Let's discuss in the comments!

Track Your Azure OpenAI Costs in Seconds, Not Minutes

Hiroshi Toyama — Mon, 26 Jan 2026 12:32:19 +0000

If you're building AI applications with Azure OpenAI, you know the drill: costs can spiral fast. One experimental feature using o1-preview, a few hundred test runs, and suddenly your bill looks very different from last month.

The Azure portal shows you the numbers eventually, but when you're iterating quickly on AI features, you need real-time visibility. That's exactly what azurecost delivers - instant Azure OpenAI cost tracking from your terminal.

The Azure OpenAI Cost Challenge

Unlike traditional cloud services with predictable pricing, Azure OpenAI costs vary wildly based on:

Model choice (GPT-4o-mini vs o1-preview is roughly a 100x difference in per-token list price)
Token usage (both prompt and completion tokens)
Deployment scaling and throughput
Testing and development cycles

The questions you need answered daily:

"How much did my o1-preview deployment cost yesterday?"
"Which resource group is burning through credits?"
"Did that new feature spike my OpenAI spend?"
"How does dev environment cost compare to production?"

Checking this through the Azure portal means multiple clicks, page loads, and waiting. When you're checking costs multiple times a day during active development, this friction adds up.

The Solution

azurecost is a Python CLI tool built specifically for developers who need fast answers about their Azure spending. For Azure OpenAI users, it's the fastest way to track Cognitive Services costs without touching the Azure portal.

Why It Works for Azure OpenAI

Instant visibility: See your OpenAI costs in 2 seconds, not 2 minutes
Daily granularity: Catch cost spikes the day they happen, not at month-end
Resource-level tracking: Monitor individual Azure OpenAI accounts separately
Resource group isolation: Separate dev, staging, and production costs effortlessly
Multi-dimensional views: Break down by service, location, resource group, or resource ID
Automation-ready: Python API for integrating into your CI/CD or daily reports
Auto-currency: Works with any billing currency

Getting Started

Installation takes one line:

pip install azurecost

Then make sure you're logged in with the Azure CLI:

az login

Check your Azure OpenAI costs:

azurecost -s your-subscription-name

Output looks like this:

(USD)                 2025-11    2025-12
------------------  ---------  ---------
total                 1247.83    2891.45
Cognitive Services    1247.83    2891.45

That's your Azure OpenAI spend right there - Cognitive Services is the service name Azure OpenAI usage is billed under (other Azure AI services in the same subscription land there too). Notice the spike in December? Now you can investigate what changed.

Real-World Azure OpenAI Use Cases

Scenario 1: Daily Cost Monitoring During Development

You're building a new reasoning agent with o1-preview. Check costs every morning:

azurecost -s prod-subscription -g DAILY -a 7

Output:

(USD)                 2025-12-15    2025-12-16    2025-12-17    2025-12-18
------------------  ------------  ------------  ------------  ------------
total                      45.23         52.18        178.45         51.20
Cognitive Services         45.23         52.18        178.45         51.20

Whoa, December 17th spiked to $178. That's the day you started load testing with o1-preview. Now you know exactly when the spike happened and how much it cost.
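
If you'd rather not eyeball the table, a few lines of Python can flag the outlier for you. This is a rough sketch, not part of azurecost: the `find_spikes` helper and the 2x-median rule are my own assumptions, and the dict is typed in by hand from the output above.

```python
from statistics import median


def find_spikes(daily_costs: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return the dates whose cost exceeds `factor` times the median day."""
    baseline = median(daily_costs.values())
    return [day for day, cost in daily_costs.items() if cost > factor * baseline]


# Daily totals copied from the sample output above.
daily = {
    "2025-12-15": 45.23,
    "2025-12-16": 52.18,
    "2025-12-17": 178.45,
    "2025-12-18": 51.20,
}
print(find_spikes(daily))  # ['2025-12-17']
```

Using the median as the baseline keeps one expensive day from dragging the threshold up with it.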

Scenario 2: Environment-Based Cost Breakdown

You have separate resource groups for dev, staging, and production. See costs side by side:

azurecost -s ai-subscription -d ResourceGroup -d ServiceName

Output:

(USD)                                        2025-11    2025-12
-----------------------------------------  ---------  ---------
total                                        1247.83    2891.45
ai-dev-rg/Cognitive Services                  342.15     456.32
ai-staging-rg/Cognitive Services              198.42     287.89
ai-prod-rg/Cognitive Services                 707.26    2147.24

Production jumped from $707 to $2147. Time to optimize those prompts or consider GPT-4o-mini for some use cases.
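
To put a number on "jumped", here is a small sketch that computes the month-over-month change per resource group. The `month_over_month` helper is hypothetical and the figures are copied by hand from the table above.

```python
def month_over_month(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Return the percentage change per key, rounded to one decimal place."""
    return {
        name: round((current[name] - prev) / prev * 100, 1)
        for name, prev in previous.items()
        if prev > 0 and name in current  # skip groups with no baseline to compare against
    }


november = {"ai-dev-rg": 342.15, "ai-staging-rg": 198.42, "ai-prod-rg": 707.26}
december = {"ai-dev-rg": 456.32, "ai-staging-rg": 287.89, "ai-prod-rg": 2147.24}

changes = month_over_month(november, december)
for group, pct in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: {pct:+.1f}%")
```

Sorting by percentage puts production at the top: it roughly tripled while dev and staging grew by less than half.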

Scenario 3: Focused Investigation on Production

Something's wrong with production costs. Drill down to just that resource group:

azurecost -s ai-subscription -r ai-prod-rg -g DAILY -a 14

See two weeks of daily costs for production only. Spot the pattern, correlate with deployments or feature releases.

Scenario 4: Multi-Region Cost Analysis

Running Azure OpenAI deployments in multiple regions? Group by location:

azurecost -s global-ai-sub -d Location -d ServiceName

Output:

(USD)                                   2025-11    2025-12
------------------------------------  ---------  ---------
total                                   1247.83    2891.45
East US/Cognitive Services               823.14    1923.87
West Europe/Cognitive Services           424.69     967.58

East US is handling most of the load. Maybe redistribute traffic or consider regional pricing differences.

Scenario 5: Resource-Level Cost Analysis

Running multiple Azure OpenAI accounts for different teams or use cases? Track costs at the individual resource level:

azurecost -s ai-subscription -d ResourceId

Output:

(USD)                                                                                     2025-12    2026-01
--------------------------------------------------------------------------------------  ---------  ---------
total                                                                                     5741.44   16571.60
/resourcegroups/ai/providers/microsoft.cognitiveservices/accounts/chatbot-prod           3401.80   16390.44
/resourcegroups/ai/providers/microsoft.cognitiveservices/accounts/analytics-engine       2194.17     131.82
/resourcegroups/ai/providers/microsoft.cognitiveservices/accounts/internal-tools          145.47      49.34

This shows exactly which Azure OpenAI account is consuming credits. Perfect for:

Cost attribution: Charge back costs to specific teams or projects
Identifying cost anomalies: Spot which deployment suddenly increased spend
Multi-tenant environments: Track costs per customer or tenant
Budget allocation: Distribute budget based on actual usage patterns
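
For the chargeback case, something like the sketch below rolls resource-level costs up to teams. The `chargeback` helper and the account-to-team mapping are made up for illustration; the costs are the 2026-01 column from the output above.

```python
from collections import defaultdict


def chargeback(costs_by_resource: dict[str, float], owners: dict[str, str]) -> dict[str, float]:
    """Sum costs per owning team, keyed on the account name at the end of the resource ID."""
    totals = defaultdict(float)
    for resource_id, cost in costs_by_resource.items():
        account = resource_id.rsplit("/", 1)[-1]
        totals[owners.get(account, "unassigned")] += cost
    return {team: round(total, 2) for team, total in totals.items()}


prefix = "/resourcegroups/ai/providers/microsoft.cognitiveservices/accounts/"
january = {
    prefix + "chatbot-prod": 16390.44,
    prefix + "analytics-engine": 131.82,
    prefix + "internal-tools": 49.34,
}
# Hypothetical ownership map; anything missing falls into "unassigned".
owners = {"chatbot-prod": "support", "analytics-engine": "data"}

print(chargeback(january, owners))
```

Keeping an explicit "unassigned" bucket makes it obvious when a new account shows up that nobody has claimed yet.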

Combine with daily granularity to investigate when a specific resource started costing more:

azurecost -s ai-subscription -d ResourceId -g DAILY -a 7

Why This Matters for Azure OpenAI Users

Building with Azure OpenAI is different from traditional cloud infrastructure. With VMs or databases, costs are fairly predictable. But with modern language models like GPT-4o and o1-preview, costs depend on how users interact with your application:

Long conversations = more tokens = higher costs
Complex reasoning tasks = more reasoning tokens, billed as completion tokens
Detailed responses = more completion tokens
Testing and iteration = multiplied costs
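
The first point is easy to underestimate. Here is a simplified sketch of why: it assumes your app resends the full conversation history on every request and that every message is the same size. Both are assumptions for illustration, and `cumulative_prompt_tokens` is a hypothetical helper.

```python
def cumulative_prompt_tokens(turns: int, tokens_per_message: int) -> int:
    """Total prompt tokens billed across a chat when each request resends the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # the new user message joins the history
        total += history               # the whole history is sent as the prompt
        history += tokens_per_message  # the assistant reply joins the history
    return total


print(cumulative_prompt_tokens(10, 100))  # 10000
print(cumulative_prompt_tokens(20, 100))  # 40000
```

Doubling the number of turns quadruples the prompt tokens under these assumptions, so long conversations cost far more than their length suggests.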

During active development, you need to check costs frequently. Not once a month when the bill arrives, but daily or even multiple times a day.

The Azure portal workflow kills this feedback loop:

Open browser
Navigate to Azure portal (wait for load)
Find the right subscription
Click through to Cost Management
Configure time range and filters
Wait for data to render

By the time you see the numbers, you've burned 2-3 minutes. When you're doing this multiple times daily, the friction discourages you from checking at all.

Then you're surprised at month-end when the bill is 3x what you expected.

azurecost fixes this. Checking costs becomes as fast as checking git status:

azurecost -s ai-subscription -g DAILY -a 7

Two seconds. Real-time feedback. No context switching.

This speed changes behavior. When checking costs is instant, you actually do it. You catch issues early. You experiment with confidence because you're monitoring the impact.

The tool emerged from my own need while building AI features. I was spending too much time in the portal doing the same query repeatedly. I wanted something terminal-based that integrated into my development workflow.

What started as a personal script evolved into a proper tool that my team adopted, then others in the community found useful. The philosophy is simple: do one thing well - show Azure costs fast and clearly.

Automate Azure OpenAI Cost Monitoring

The Python API lets you integrate cost tracking into your workflows. Send daily Azure OpenAI cost reports to Slack:

from azurecost import Azurecost
import requests
import os

# Get the last 7 days of costs, grouped by resource group and service
core = Azurecost(
    debug=False,
    granularity="DAILY",
    dimensions=["ResourceGroup", "ServiceName"],
    subscription_name="ai-production"
)

total_results, results = core.get_usage(ago=7)
text = core.convert_tabulate(total_results, results)

# Send to Slack, wrapping the table in a code block so the columns stay aligned
webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # fail fast if the webhook is not configured
message = f"🤖 Azure OpenAI Cost Report (Last 7 Days)\n```\n{text}\n```"
response = requests.post(webhook_url, json={"text": message}, timeout=10)
response.raise_for_status()

Run this as a daily cron job or GitHub Action. Your team gets automatic cost visibility without anyone checking the portal.

Other use cases:

Budget alerts: Trigger warnings when daily costs exceed thresholds
Cost attribution: Track which team or feature is using OpenAI credits
Pre-deployment checks: Validate costs before promoting to production
Finance automation: Generate reports for accounting without manual exports
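
For budget alerts and pre-deployment checks, the core logic is just a threshold comparison that returns an exit code. The sketch below works on a plain dict of daily totals; how you build that dict from the `get_usage` results depends on their structure, which I'm not assuming here. `check_budget` is a hypothetical helper.

```python
def check_budget(daily_costs: dict[str, float], daily_limit: float) -> tuple[int, str]:
    """Return (exit_code, message): 1 if any day exceeds the limit, else 0."""
    breaches = {day: cost for day, cost in daily_costs.items() if cost > daily_limit}
    if not breaches:
        return 0, f"OK: no day exceeded {daily_limit:.2f}"
    lines = [f"{day}: {cost:.2f}" for day, cost in sorted(breaches.items())]
    return 1, "Over budget:\n" + "\n".join(lines)


code, message = check_budget(
    {"2025-12-16": 52.18, "2025-12-17": 178.45},
    daily_limit=100.0,
)
print(message)
```

In a CI job you could pass `code` to `sys.exit()` so the pipeline fails when a day goes over the limit.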

Configuration Tips for AI Workloads

Set environment variables to streamline your daily checks:

# Add to your ~/.zshrc or ~/.bashrc
export AZURE_SUBSCRIPTION_ID=your-ai-subscription-id
export AZURE_RESOURCE_GROUP=ai-prod-rg

Now run azurecost without the subscription and resource group flags:

azurecost -g DAILY -a 7

Create shell aliases for common queries:

# Daily OpenAI costs
alias aicosts='azurecost -s ai-subscription -g DAILY -a 7'

# Production environment only
alias prodcosts='azurecost -s ai-subscription -r ai-prod-rg -g DAILY -a 7'

# Compare all environments
alias envcosts='azurecost -s ai-subscription -d ResourceGroup'

# Resource-level breakdown
alias resourcecosts='azurecost -s ai-subscription -d ResourceId'

Type aicosts each morning as part of your routine. Takes 2 seconds, keeps you informed. Use resourcecosts when you need to drill down to individual Azure OpenAI accounts.

Best Practices for Azure OpenAI Cost Management

Based on using azurecost with AI workloads, here are patterns that work:

1. Daily morning check

azurecost -s ai-sub -g DAILY -a 7

Catch anomalies before they compound. One day of unexpected costs is manageable. A month is not.

2. Before and after feature releases

# Before deployment
azurecost -s ai-sub -r ai-prod-rg -g DAILY -a 3

# Deploy new feature

# After 24 hours
azurecost -s ai-sub -r ai-prod-rg -g DAILY -a 3

Measure the cost impact of new features. Make data-driven decisions about o1-preview vs GPT-4o-mini.

3. Set up automated alerts
Use the Python API to send daily reports. Don't rely on remembering to check manually.

4. Isolate environments
Use separate resource groups for dev, staging, production. Makes cost attribution trivial.

5. Monitor during load testing
Run azurecost before and after load tests. Understand your cost-per-request at scale before going live.
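
Turning those before and after numbers into a cost-per-request figure is simple arithmetic. In this sketch the baseline and load-test totals are borrowed from the daily output earlier in the post, and the request count of 5,000 is made up; `cost_per_request` is a hypothetical helper.

```python
def cost_per_request(baseline_cost: float, load_test_cost: float, requests_sent: int) -> float:
    """Return the extra cost attributable to each load-test request."""
    if requests_sent <= 0:
        raise ValueError("requests_sent must be positive")
    return (load_test_cost - baseline_cost) / requests_sent


# 52.18 = a normal day, 178.45 = the load-test day, 5_000 = hypothetical request count.
per_request = cost_per_request(52.18, 178.45, requests_sent=5_000)
print(f"{per_request:.4f} per request")
```

Multiply that figure by your expected production traffic to get a first estimate of the monthly bill.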

Start Tracking Your Azure OpenAI Costs Now

Install the tool:

pip install azurecost

Check your costs:

azurecost -s your-subscription -g DAILY -a 7

That's it. You now have instant visibility into your Azure OpenAI spending.

Check the GitHub repository for full documentation, API examples, and to report issues or contribute.

Building AI features is expensive enough. Don't let invisible costs surprise you. Make cost visibility effortless.

Quick Links

GitHub: toyama0919/azurecost
PyPI: azurecost
Python 3.8+ required

Building with Azure OpenAI? Drop a comment on how you're managing costs or share your use case. Always looking for feedback and ideas.

ty: The Blazingly Fast Python Type Checker from Astral (Ruff & uv Creators)

Hiroshi Toyama — Wed, 07 Jan 2026 04:44:52 +0000

ty: The Blazingly Fast Python Type Checker from Astral

Astral, the company behind popular Python tools like Ruff and uv, has released a new type checker called ty. Built in Rust, it promises to revolutionize Python type checking with incredible speed and powerful features.

Key Features

⚡ Blazingly Fast: 10-100x faster than mypy and Pyright
🔍 Rich Diagnostics: Detailed error messages with multi-file context
🧠 Smart Type Inference: Catches bugs even without type annotations
🎯 Advanced Type System: Intersection types, advanced narrowing, reachability analysis
🛠️ Language Server: Built-in support for IDE integration (VS Code extension available)

Installation

# Using uv (recommended)
uv tool install ty@latest

# Using pip
pip install ty

Basic Usage

# Check a file
ty check file.py

# Check directories
ty check src tests

# Verbose mode
ty check src --verbose

What Makes ty Special?

1. Speed That Matters

ty can type-check the entire Home Assistant project in ~2.19 seconds, compared to mypy's 45.66 seconds. That's over 20x faster!

For large codebases, this speed difference transforms the developer experience:

No more waiting for CI to finish
Instant feedback in your IDE
Practical to run on every save

2. Powerful Type Inference

Here's the magic: ty can find bugs even without type annotations. Let's look at some examples.

Real-World Bug Detection

Example 1: Type Mismatches

def add(x: int, y: int) -> int:
    return x + y

result: str = add(1, 2)  # Oops! Assigning int to str

ty's output:

error[invalid-assignment]: Object of type `int` is not assignable to `str`
 --> example.py:4:9
  |
4 | result: str = add(1, 2)
  |         ---   ^^^^^^^^^ Incompatible value of type `int`
  |         |
  |         Declared type

Example 2: Missing Attributes

class User:
    def __init__(self, name: str):
        self.name = name

    def get_email(self):
        return self.email  # email was never defined!

ty catches this:

error[unresolved-attribute]: Object of type `Self@get_email` has no attribute `email`
 --> example.py:6:16
  |
6 |         return self.email
  |                ^^^^^^^^^^

This works even for complex cases:

class DatabaseClient:
    def __init__(self, connection_string: str):
        self.connection = connect(connection_string)

    def query(self, sql: str):
        # Typo: should be .execute(), not .exec()
        return self.connection.exec(sql)

As long as ty can resolve the return type of connect(), it will warn you that .exec() doesn't exist on the connection object!

Example 3: Incorrect Function Arguments

numbers: list[int] = [1, 2, 3]
numbers.append("four")  # String in int list!

ty's error:

error[invalid-argument-type]: Argument to bound method `append` is incorrect
 --> example.py:2:16
  |
2 | numbers.append("four")
  |                ^^^^^^ Expected `int`, found `Literal["four"]`

Example 4: Null Safety

def process(value: str | None) -> int:
    return len(value)  # What if value is None?

ty spots the issue:

error[invalid-argument-type]: Expected `Sized`, found `str | None`
 --> example.py:2:16
  |
2 |     return len(value)
  |                ^^^^^
  |
info: Element `None` of this union is not assignable to `Sized`

Example 5: Implicit None Returns

def get_name(user_id: int) -> str:
    if user_id > 0:
        return "User"
    # Missing return statement - implicit None!

ty catches this too:

error[invalid-return-type]: Function can implicitly return `None`,
which is not assignable to return type `str`
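
One way to fix it is to make every path either return a str or raise. Raising ValueError here is my assumption for the sketch; you could equally change the return type to `str | None` and make callers handle it.

```python
def get_name(user_id: int) -> str:
    if user_id > 0:
        return "User"
    # Every path now returns a str or raises, so no implicit None remains.
    raise ValueError(f"invalid user_id: {user_id}")


print(get_name(1))  # User
```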

Configuration

ty reads its configuration from pyproject.toml (a standalone ty.toml also works):

[tool.ty]
[tool.ty.environment]
python-version = "3.10"

[tool.ty.src]
include = ["src/**/*.py", "tests/**/*.py"]
exclude = [".tox/**", "build/**", "dist/**"]

# Per-directory overrides
[[tool.ty.overrides]]
include = ["tests/**/*.py"]
[tool.ty.overrides.rules]
invalid-assignment = "ignore"  # More lenient for tests

CI/CD Integration

GitHub Actions

name: Type Check
on: [push, pull_request]

jobs:
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install ty
      - name: Run type checker
        run: ty check src tests

tox Integration

[testenv:ty]
deps =
    ty
    -e .
commands =
    ty check src tests

pre-commit Hook

repos:
  - repo: local
    hooks:
      - id: ty
        name: ty type checker
        entry: ty check src tests
        language: system
        pass_filenames: false
        types: [python]

Editor Integration

VS Code

Install the "ty" extension from VS Code Marketplace
Enjoy automatic:
- Type checking as you type
- Inlay hints for inferred types
- Code actions and quick fixes
- Jump to definition with type awareness

ty vs mypy vs Pyright

Feature	ty	mypy	Pyright
Speed	⚡⚡⚡ Ultra-fast	🐢 Moderate	🚀 Fast
Language	Rust	Python	TypeScript
Error Messages	📝 Very detailed	📄 Standard	📝 Detailed
Type Inference	🧠 Powerful	📚 Standard	🧠 Powerful
Maturity	🆕 Beta	✅ Stable	✅ Stable
Diagnostics	🔍 Multi-file context	📊 Single-file	🔍 Cross-file
Language Server	✅ Built-in	⚠️ Via dmypy	✅ Built-in

Current Limitations

As of v0.0.9 (Beta):

Limited rule set: Main rules implemented, more coming in 2026
No strict mode yet: Won't complain about missing type annotations
Beta status: API may change before stable release

However, even in beta, ty is already catching real bugs and providing value in production codebases.

Getting Started with ty

Here's a practical workflow for adopting ty in your project:

Step 1: Install and Run

uv tool install ty@latest
ty check src

Step 2: Review Results

ty will show you actual bugs in your code, even without type annotations!

Step 3: Add to CI

# Add to your test script
tox -e ty

# Or run directly
ty check src tests

Step 4: Gradual Type Annotation

As you maintain your code, gradually add type annotations where it makes sense:

# Before
def calculate_total(items):
    return sum(item.price for item in items)

# After
def calculate_total(items: list[Item]) -> Decimal:
    # Start from Decimal(0) so the result is always a Decimal, even for an empty list
    return sum((item.price for item in items), Decimal(0))

With annotations, ty becomes even more powerful!

Real-World Example: FastAPI Application

from fastapi import FastAPI, HTTPException

app = FastAPI()

class UserService:
    def __init__(self, db_url: str):
        self.db = connect_database(db_url)

    def get_user(self, user_id: int):
        user = self.db.query(User).filter(User.id == user_id).first()
        if user:
            return user
        # Bug: FastAPI expects HTTPException, but we're returning None!
        return None

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    service = UserService(DB_URL)
    user = service.get_user(user_id)
    # Bug: user could be None, but we're accessing .dict()
    return user.dict()

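ty can surface both bugs once the types involved are resolvable:

get_user can return None: annotate it as -> User and ty reports the None return, which is your cue to raise HTTPException instead
user.dict() fails when user is None: ty flags the attribute access on a possibly-None value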

Performance Tips

Use file exclusion: Skip generated files and vendor directories
Enable caching: ty caches results for unchanged files
Parallel processing: ty automatically uses multiple cores
Incremental mode: In your editor, only changed files are re-checked

Roadmap

According to Astral, ty is targeting a stable 1.0 release in 2026, with plans for:

✅ More comprehensive rule coverage
✅ Strict mode (enforce type annotations)
✅ Better integration with popular frameworks
✅ Plugin system for custom rules
✅ Performance optimizations

Conclusion

ty represents the next evolution in Python type checking. Its combination of blazing speed, powerful inference, and excellent diagnostics makes it a compelling choice for Python developers.

While still in beta, ty is already proving valuable for catching real bugs and improving code quality. If you're using mypy or Pyright and frustrated by slow check times, ty is worth trying.

The Astral team has an excellent track record with Ruff and uv, and ty looks set to be another game-changer in the Python tooling ecosystem.

Resources

🌐 Official Website: astral.sh/ty
📚 Documentation: docs.astral.sh/ty
💻 GitHub: github.com/astral-sh/ty
💬 Discord: Astral Discord

Have you tried ty yet? What's your experience with Python type checkers? Let me know in the comments! 👇

Introducing azurecost: Azure Cost Management Made Simple

Hiroshi Toyama — Tue, 16 Dec 2025 12:35:15 +0000

If you've ever struggled to understand your Azure costs through the portal's cluttered interface, you're not alone. That's why I created azurecost - a straightforward command-line tool that gives you instant visibility into your Azure spending.

The Problem

Azure's Cost Management portal is powerful, but sometimes you just want quick answers:

"How much did I spend last month?"
"Which services are costing me the most?"
"What's the cost breakdown by resource group?"

Navigating through the Azure portal for these simple questions feels like overkill. You need something faster.

The Solution

azurecost is a Python CLI tool that brings Azure cost data directly to your terminal. No clicking through menus, no waiting for the portal to load - just type a command and get your answer.

Key Features

Simple by default: Just run azurecost and see your monthly costs by service
Flexible dimensions: Group costs by resource group, service name, location, or any combination
Multiple time ranges: View daily or monthly costs for any time period
Auto-currency detection: Displays costs in your subscription's billing currency
Python API: Use it programmatically in your scripts and automation

Getting Started

Installation is one line:

pip install azurecost

Make sure you're logged in with Azure CLI:

az login

Then run it:

azurecost -s my-subscription

You'll see something like this:

(JPY)                 2023-08    2023-09
------------------  ---------  ---------
total                  492.77      80.28
Cognitive Services     492.77      80.28
Bandwidth                0          0
Storage                  0          0

Clean, readable, instant.

Real-World Examples

Multi-dimensional Analysis

Want to see costs broken down by both resource group and service? Easy:

azurecost -s my-subscription -d ResourceGroup -d ServiceName

Daily Monitoring

Tracking costs during a migration or deployment? Switch to daily granularity:

azurecost -s my-subscription -g DAILY -a 7

This shows you the last 7 days of costs, broken down by day.

Focused Investigation

Need to check costs for a specific resource group?

azurecost -s my-subscription -r production-rg -a 3

Shows 3 months of costs for just that resource group.

Why I Built This

The idea for azurecost came from a recurring frustration in my daily work as a cloud engineer. Every morning, I wanted to do a quick cost check - nothing fancy, just a sanity check to make sure nothing was unexpectedly spiking. But this simple task meant:

Opening a browser
Navigating to the Azure portal (and waiting for it to load)
Finding the right subscription
Clicking through to Cost Management
Waiting for the cost analysis page to render
Configuring the time range and dimensions I wanted
Waiting again for the data to load

By the time I got my answer, 2-3 minutes had passed. Multiply that by every time someone on the team needed to check costs, and we're talking about hours of wasted time per week.

I realized I was doing the same exact query every time. I didn't need the portal's full power - I needed something as simple as running git status or docker ps. Something that gave me the information I needed in 2 seconds, not 2 minutes.

Then there were the times when I was SSH'd into a server or working in a restricted environment where opening a browser wasn't convenient. Or when I wanted to pipe cost data into a script for automated reporting. The portal simply wasn't designed for these workflows.

So I built azurecost. The goal was simple: make checking Azure costs feel as natural as any other command-line operation. No ceremony, no waiting, no clicking.

What started as a personal productivity hack turned into something my team adopted, then colleagues at other companies started using it, and eventually it made sense to release it as an open-source tool.

The philosophy behind azurecost is that most of the time, you don't need 100 different options and visualizations. You need a fast, reliable answer to a simple question. The tool does one thing well: show you your Azure costs quickly and clearly.

Using It in Your Scripts

The Python API makes it easy to integrate cost monitoring into your automation:

from azurecost import Azurecost

core = Azurecost(
    debug=False,
    granularity="DAILY",
    dimensions=["ServiceName", "ResourceGroup"],
    subscription_name="my-subscription"
)

total_results, results = core.get_usage(ago=7)
text = core.convert_tabulate(total_results, results)
print(text)

Perfect for:

Daily cost reports sent to Slack
Budget alerts in your CI/CD pipeline
Cost attribution in team dashboards
Custom reporting for finance teams

Configuration Tips

Set environment variables to avoid typing the same options repeatedly:

export AZURE_SUBSCRIPTION_ID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AZURE_RESOURCE_GROUP=production-rg

Now just run azurecost with no arguments. It'll use your configured subscription and resource group automatically.

What's Next

The tool is actively maintained and open to contributions. Some ideas I'm exploring:

Cost anomaly detection
Budget tracking and alerts
Cost forecasting based on historical trends
Integration with other cloud providers

Try It Out

The tool is open source and available on PyPI. Give it a try:

pip install azurecost
azurecost -s your-subscription

Check out the GitHub repository for full documentation, examples, and to report issues or contribute.

Azure cost management doesn't have to be complicated. Sometimes the best tool is the simplest one.

Links

🔗 GitHub: toyama0919/azurecost
📦 PyPI: azurecost
🐍 Python: 3.8+

Have questions or feedback? Open an issue on GitHub or drop a comment below!
